A98 changes to Binary!
The A98 release will begin finalizing changes to the binary datatype. As you know, this datatype was significantly disrupted due to the addition of Unicode.
Essentially in R3 you must always remember one thing: bytes are not characters.
In other words, if you convert a string of length 5 to binary, the resulting binary could be 5, 6, 7, 8, or more bytes long. That's because the string is encoded as UTF-8, where a single character can take more than one byte.
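To make the length difference concrete, here is a small sketch in Python rather than REBOL. It only illustrates how UTF-8 itself behaves; the sample strings are made up for this example, and nothing here shows R3 syntax.

```python
# Python analogy only: shows UTF-8 byte counts, not REBOL 3 behavior.
# Every sample string is exactly 5 characters long.
samples = [
    "hello",            # ASCII only: 1 byte per character
    "h\u00e9llo",       # contains U+00E9, which takes 2 bytes in UTF-8
    "h\u20acllo",       # contains U+20AC, which takes 3 bytes in UTF-8
    "h\U0001F600llo",   # contains U+1F600, which takes 4 bytes in UTF-8
]

char_lengths = [len(text) for text in samples]
byte_lengths = [len(text.encode("utf-8")) for text in samples]

for text, chars, size in zip(samples, char_lengths, byte_lengths):
    print(chars, "characters ->", size, "bytes:", text.encode("utf-8").hex())
```

The character count stays at 5 for every sample while the byte count grows, which is the whole point of "bytes are not characters."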
As we've discussed in several other articles here, this change has caused us to re-examine the meaning of binary! Perhaps most importantly, we've removed most of the "magic" that was being done in binary-related operations.
For example, if you insert the integer 8 into a binary, it's inserted as a single byte of value 8. Unlike R2, it is not converted to its ASCII character. The same is true of picking a single element from a binary: you just get an integer.
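The distinction between a byte of value 8 and the ASCII character "8" can be sketched as follows. This is again a Python analogy using `bytearray`, not REBOL code, and the buffer contents are invented for the example.

```python
# Python analogy only: a byte value versus its ASCII text form.
buf = bytearray(b"\x01\x02")

# Inserting the integer 8 stores one byte with value 8 (hex 08).
buf.insert(1, 8)

# Picking a single element gives back a plain integer.
picked = buf[1]

# For contrast, the ASCII text form of 8 is the byte 0x38 (decimal 56).
ascii_form = str(8).encode("ascii")

print(buf.hex())         # hex dump of the buffer
print(picked)            # the integer read back
print(ascii_form.hex())  # hex of the ASCII digit
```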
Ok, with that being said, we do allow several special conversions. For example, if you attempt to insert a string into a binary, by default it will be UTF-8 encoded. If you want some other encoding, then you'll need to add an extra step to do that conversion first. (We will be providing some standard codecs too.) That is a reasonable approach.
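As a rough illustration of why the extra conversion step matters, the sketch below encodes the same two-character string in UTF-8 and in Latin-1 before adding it to a byte buffer. It is a Python analogy with an arbitrarily chosen string and encoding; it does not show R3's codec interface, which is not described here.

```python
# Python analogy only: the same text yields different bytes per encoding.
text = "n\u00e9"  # "n" followed by U+00E9

# UTF-8 encoding, analogous to the default described above.
utf8_buf = bytearray()
utf8_buf += text.encode("utf-8")

# A different encoding needs an explicit conversion step first.
latin1_buf = bytearray()
latin1_buf += text.encode("latin-1")

print(utf8_buf.hex(), len(utf8_buf))
print(latin1_buf.hex(), len(latin1_buf))
```

The two buffers hold different bytes and have different lengths, so the encoding has to be chosen before the string goes into the binary.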
Anyway, in A98 we'll be extending what you can do with binary, including fixing CureCode #1452 (limited binary usage). But, of course, we need you to test it really well. Unicode doesn't just make the R3 internal code for string handling twice as complicated; it makes it ten times more complicated, mainly because UTF-8 encoding is variable length. So we need you to do extensive testing on it.
We will begin updating the REBOL 3 Binary Datatype documentation page to cover the definitions and details. (Be sure to reload the page to get the newest changes.)