This article is important and worth reading: we need to finalize the read and write functions this month.
Update: please read my additional notes posted in the comments section.
Quick review
In R3, read and write are entirely different from their R2 counterparts. These functions no longer use the series-based access model; they use a more traditional stream model, as you'd find in most other languages. As a result, functions like read-io and write-io are no longer needed. You can do those actions with read and write.
One way of thinking of these changes is that read and write are lower level. They basically transfer raw bytes to and from I/O ports.
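For example, here is a rough sketch of that low-level usage (the file names are just placeholders):
data: read %photo.jpg    ; the raw bytes of the file, untouched
write %copy.jpg data     ; the same bytes written back out, byte for byte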
However, there are some very common usage patterns, especially for file I/O, that we want to support. For example, because we often read text files, we allow the read function to be a little higher-level:
text: read/string %document.txt
This is higher-level because it includes decoding the bytes into Unicode characters and converting CRLF to just LF. It also examines the data to determine if a BOM (byte order mark) is present, in order to properly handle both UTF-16 encodings (little-endian and big-endian).
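In code, the two levels look like this (same file, different results):
data: read %document.txt         ; binary! -- the raw, undecoded bytes
text: read/string %document.txt  ; string! -- decoded Unicode text, CRLF converted to LF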
In addition, we support:
write %document.txt text
And, when text is a string, the file data will automatically be encoded as standard UTF-8. In addition, the line terminators will be expanded to the local format, such as CRLF on Windows (but not on Linux and other platforms).
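For example (just a sketch; the file names are placeholders, and the binary case simply falls back to the raw-byte behavior described above):
write %document.txt "first line^/second line^/"  ; string: encoded as UTF-8, LF expanded to the local format
write %data.bin #{CAFE00FF}                      ; binary: the bytes are written exactly as given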
Finishing up
With that said, we need to decide what other ease-of-use actions read and write should support.
For example, I previously suggested the /as refinement that would allow:
text: read/as %doc.txt 'utf-16be
or the alternate form:
text: read/as %doc.txt 16
and:
write/as %doc.txt text 'utf-16le
or the alternate form:
write/as %doc.txt text 16
The /as refinement lets you specify the encoding of the string.
In addition, we must decide if the Unicode BOM should be written, and what line terminations are needed.
In summary, we must be able to specify:
UTF encoding
BOM present
line termination
That can be done with a function spec like:
write ... /as utf /bom /with eol
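Under that spec, a fully specified call might look like this (hypothetical, since these refinements are only proposals):
write/as/bom/with %doc.txt text 'utf-16le "^m^j"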
But, of course, we're adding two more refinements... to a function that we will often want to use at a low level with high performance. It's probably not going to make much difference, but it's something we want to recognize.
If we don't want to add these refinements, the alternative isn't that pretty either. We'd need to accept a micro-dialect (a non-reduced block) that specifies the options:
write/as file text [utf-16be bom crlf]
and we'd probably also allow the variation where the UTF encoding is given as an integer and the line terminator as a string or character:
write/as file text [16 bom "^m^j"]
And, I should probably mention that the write defaults for /as would be UTF-8, a BOM, and local-style line terminators.
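To make the dialect idea concrete, here is a minimal sketch of how such an options block might be interpreted internally. The function name and the option object are purely hypothetical, not part of the actual proposal:
decode-write-options: func [
    "Sketch only: interpret a write options block (hypothetical helper)"
    opts [block!] "e.g. [utf-16be bom crlf] or [16 bom {^m^j}]"
    /local spec
][
    ; start from the proposed defaults: UTF-8, BOM present, local-style terminators
    spec: make object! [encoding: 'utf-8 bom: true eol: 'local]
    foreach item opts [
        case [
            item = 'bom [spec/bom: true]                         ; make the default explicit
            find [crlf cr lf] item [spec/eol: item]              ; terminator given as a word
            any [string? item char? item] [spec/eol: item]       ; terminator given literally, e.g. "^m^j"
            any [word? item integer? item] [spec/encoding: item] ; utf-16be, utf-16le, 8, 16, ...
        ]
    ]
    spec
]
probe decode-write-options [utf-16be bom crlf]
probe decode-write-options [16 bom "^m^j"]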
Think about it
Again... none of this matters for custom I/O where you're handling the bytes yourself. This is only for the cases where the entire I/O is handled in a single call to read or write -- the high-level ease-of-use action.