A Unicode reflexion

Jeff Blakeney · April 21, 2011, 12:41:14 AM

Quote from: Frederick J. Harris on April 20, 2011, 04:34:49 PM
I fought the good fight against wide character strings too, Jeff. It was a hard fight, and it lasted for many years (since around 2000, I'd say). A year or two ago I finally decided to surrender. It's a fight that can't be won. Of course, my battles were all waged on C and C++ battlegrounds, mostly coding for Windows CE. But I've surrendered, and life is better now.

Well, you may have less conflict with supporters of Unicode, but that doesn't mean things actually got better.

Quote

The main issue for me isn't international support. Switching to Unicode (I actually prefer the term 'wide character string', but it's longer) has merit simply because, since Windows NT in the early '90s, Windows operating systems have internally worked with two-byte characters. I hear that in the Linux world four bytes are used per character. In any case, I've always suspected what Jose described above: using ANSI would likely increase memory usage and 'heat' at the OS level rather than minimize it.
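The per-character sizes mentioned in the quote can be checked directly from C. The sketch below (the function names `ansi_bytes` and `wide_bytes` are made up for illustration) measures the storage needed for the same text as an 8-bit string and as a `wchar_t` string. It only shows the cost of the strings themselves; it says nothing about what Windows does with them internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Bytes taken by an 8-bit ("ANSI") string, NUL terminator included. */
size_t ansi_bytes(const char *s)
{
    assert(s != NULL);
    return (strlen(s) + 1) * sizeof(char);
}

/* Bytes taken by a wide string, terminator included.
   sizeof(wchar_t) is commonly 2 with Windows compilers and 4 with GCC on Linux. */
size_t wide_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

For `"Hello, World!"` the 8-bit copy takes 14 bytes, while `L"Hello, World!"` takes 28 bytes on a typical Windows build and 56 on a typical Linux build.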

Like I said earlier, I have no control over what Windows does. An API call could sit and do nothing for 2 seconds before actually doing something, it could make 10 copies of the data I give it, or it could translate my English strings into Latin. I have no idea what it does and no control over it. As long as I get back what I need in the format I need it, I'm fine with that. Microsoft could change things again tomorrow so that Windows no longer uses Unicode, which could wipe out the potential benefit of calling the API with Unicode as well. I say "potential" because, as I said, I don't know the internal workings of Windows and can't say for sure that it does things using Unicode, or that the ANSI API functions are just wrappers for the Unicode versions. It is probably documented somewhere, but I've never looked it up.
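Whether or not Windows actually works this way internally, the wrapper pattern Jeff mentions is easy to picture. Here is a minimal C sketch with hypothetical names: `text_length_w` stands in for an API's wide ("W") worker and `text_length_a` for its 8-bit ("A") entry point, which widens its argument and forwards it. Each call through the 8-bit entry point pays for an extra allocation and copy, the kind of overhead the quote above calls 'heat'. The conversion assumes Latin-1 input (each byte value equals its Unicode code point); real conversion depends on the code page in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t wchar16; /* one UTF-16 code unit */

/* Hypothetical "W" worker: the real logic, written once for wide text.
   Here it simply counts code units up to the terminator. */
size_t text_length_w(const wchar16 *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical "A" wrapper: widen the 8-bit string, call the W worker,
   then free the temporary copy. Assumes Latin-1 input. */
size_t text_length_a(const char *s)
{
    size_t len = 0, i, result;
    wchar16 *wide;

    assert(s != NULL);
    while (s[len] != '\0')
        len++;

    wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return (size_t)-1;              /* allocation failure */

    for (i = 0; i <= len; i++)          /* copies the terminator too */
        wide[i] = (unsigned char)s[i];

    result = text_length_w(wide);
    free(wide);
    return result;
}
```

The caller of `text_length_a` gets the same answer as a caller of `text_length_w`; the only difference is the hidden conversion work.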

Quote

I know you are a C coder too, and I have to say the use of Unicode in PowerBASIC seems to be considerably cleaner than in C/C++. For example, I'm sure that you, like me, have a good many of the C runtime functions memorized, such as strcpy(), printf(), strcat(), etc. It became horrendous to use the tchar.h macros such as _tcscpy(), _ftprintf(), _T("Hello, World!"), TEXT("Hello, World!"), L"Hello, World", etc. Some of them are so ugly that one has to constantly look them up. The only way it was solved was for Microsoft to create a new language (C#) and thereby eliminate compatibility issues with legacy code. In other words, just start out fresh with every string as a wide char string.
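The ugliness Frederick describes comes from a single preprocessor switch deciding whether "text" means `char` or `wchar_t`, so every literal and every string function needs a macro. The sketch below imitates that scheme with hypothetical names (`MY_UNICODE`, `MY_T`, `my_tchar`, `my_tcscpy`, `my_tcslen`); it is not the real tchar.h, although `_T` and `_tcscpy` work along these lines under the `_UNICODE` switch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the tchar.h idea: one switch picks the char type. */
#define MY_UNICODE 1

#if MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s)    L##s      /* "abc" becomes L"abc" */
#define my_tcslen  wcslen
#define my_tcscpy  wcscpy
#else
typedef char my_tchar;
#define MY_T(s)    s
#define my_tcslen  strlen
#define my_tcscpy  strcpy
#endif

/* Written once against the macros; compiles for either setting.
   Returns the number of characters copied into dst. */
size_t copy_greeting(my_tchar *dst, size_t cap)
{
    static const my_tchar greeting[] = MY_T("Hello, World!");
    assert(cap >= sizeof greeting / sizeof greeting[0]);
    my_tcscpy(dst, greeting);
    return my_tcslen(dst);
}
```

Setting `MY_UNICODE` to 0 turns the same source into an 8-bit build. The code works either way, but every literal and call has to go through the macro layer, which is exactly the readability cost complained about above.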

Actually, I'm not a C programmer. I learned C in college after they taught us 8086 assembly language, because they felt it was easier for people to learn assembly than C, and I tend to agree with them.

I used BASIC and assembly for all my Apple II programming and didn't really start programming PCs until my brother got PowerBASIC for DOS and contracted me to do some work for him. I can translate C code if needed, but only with the help of Google searches to remind me what all those cryptic symbols mean. I certainly don't program anything from scratch in C.

Quote

I really think PowerBASIC's implementation is absolutely as good as it can be without totally abandoning the language and starting out with a new one.

I agree. I think PB has added support for Unicode pretty much seamlessly and, as I said earlier, I'm glad it's there for when I might need it. At present, I'm a hobby programmer who writes stuff for myself and has no need for more than 7-bit ASCII, so 8-bit characters are fine for me. I'm hoping to write some code to share or sell at some point, and I'll most likely need to add Unicode support then, so it is nice to know it will be easy to add.

Theo Gottwald · April 27, 2011, 07:03:01 AM

Quote

Visual Basic: The designers had to make some tough decisions about how they would represent strings internally. They might have chosen ANSI, because it's the common subset of Windows 95 and Windows NT, and converted to Unicode whenever they needed to deal with OLE. But since Visual Basic 4.0 is OLE inside and out, they chose Unicode as the internal format, despite potential incompatibilities with Windows 95. The Unicode choice caused many problems and inefficiencies both for the developers of Visual Basic and for Visual Basic developers, but the alternative would have been worse.

Seen this way, the new AS WSTRING type should make it even easier to call a PB DLL from VB, because no conversion is needed any more. Are there any VB experts here?
