Unicode

From NARS2000
Revision as of 02:15, 24 January 2015 by Robert Wallick (talk | contribs) (link to ⎕AV page)

Character Array and Name Storage

All character arrays and names (variable, function, and operator) are stored as one 16-bit word per character. This fixed-length encoding is called UCS-2 and is a subset of the more general UTF-16 encoding, which is variable-length and uses one or two 16-bit words per character.
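The difference between the two encodings can be illustrated with a short Python sketch. It is not NARS2000 code; the helper names utf16_units and fits_ucs2 are invented for this example, and the sketch only assumes Python's standard "utf-16-le" codec.

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit words UTF-16 needs to encode one character."""
    # Each 16-bit word occupies two bytes in the little-endian encoding.
    return len(ch.encode("utf-16-le")) // 2


def fits_ucs2(text: str) -> bool:
    """True if every character of text fits in a single 16-bit word."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(utf16_units("A"))           # 1
print(utf16_units("\u2373"))      # 1  (APL iota, U+2373)
print(utf16_units("\U0001D538"))  # 2  (a character above U+FFFF)
print(fits_ucs2("\u2395AV"))      # True  (quad, A, V)
```

Under this view, a string is storable as one 16-bit word per character exactly when fits_ucs2 returns True for it.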

UCS-2 represents characters in the range U+0000 through U+FFFF; UTF-16 represents characters in the range U+0000 through U+10FFFF. However, because UTF-16 reserves the values U+D800 through U+DFFF to represent characters above U+FFFF, the range of code points for both encodings excludes U+D800 through U+DFFF (2,048 code points).
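The reason for the excluded range becomes clearer from the standard UTF-16 surrogate-pair arithmetic, sketched below in Python. The function names are chosen for this example only; the constants come from the UTF-16 definition, not from NARS2000.

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its two 16-bit UTF-16 words."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits    -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")               # D83D DE00
print(hex(from_surrogate_pair(high, low)))   # 0x1f600
```

Because every 16-bit word in U+D800 through U+DFFF is one half of such a pair, none of those values can stand for a character on its own.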

Thus, UCS-2 encodes 63,488 (=65,536 - 2,048) different characters, and UTF-16 encodes 1,112,064 (=1,114,112 - 2,048) different characters.
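As a quick check of these counts, the same arithmetic can be written out in Python. The constant names are illustrative only.

```python
SURROGATES = 0xDFFF - 0xD800 + 1   # reserved code points U+D800..U+DFFF
UCS2_TOTAL = 0xFFFF + 1            # code points U+0000..U+FFFF
UTF16_TOTAL = 0x10FFFF + 1         # code points U+0000..U+10FFFF

print(SURROGATES)                  # 2048
print(UCS2_TOTAL - SURROGATES)     # 63488
print(UTF16_TOTAL - SURROGATES)    # 1112064
```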

Alphabet for Names

A name consists of an initial character followed by zero or more subsequent characters.

  • An initial character is one of a through z, A through Z, delta (∆), or delta underbar (⍙).
  • A subsequent character is an initial character, 0 through 9, overbar (¯), or underbar (_).
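The two rules above can be sketched as a regular expression in Python. This is an illustration, not the NARS2000 tokenizer. It assumes that delta is U+2206, delta underbar is U+2359, and overbar is U+00AF, and that a single initial character on its own is a valid name. It ignores the underbarred alphabet and system names. The identifiers NAME_RE and is_valid_name are invented for this example.

```python
import re

# Assumed code points: delta U+2206, delta underbar U+2359, overbar U+00AF.
INITIAL = "a-zA-Z\u2206\u2359"
SUBSEQUENT = INITIAL + "0-9\u00AF_"

# One initial character, then any number of subsequent characters.
NAME_RE = re.compile(f"[{INITIAL}][{SUBSEQUENT}]*")


def is_valid_name(s: str) -> bool:
    """True if s follows the initial/subsequent character rules."""
    return NAME_RE.fullmatch(s) is not None


print(is_valid_name("x"))            # True
print(is_valid_name("\u2206x_1"))    # True   (starts with delta)
print(is_valid_name("a\u00AFb"))     # True   (overbar inside the name)
print(is_valid_name("1abc"))         # False  (digit cannot come first)
print(is_valid_name("_x"))           # False  (underbar cannot come first)
```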

One other set of characters, the underbarred alphabet, may be pasted into a session or function editor window. There is no way to enter these characters directly from the keyboard. Depending on a User Option setting, when these characters are pasted into a session or function editor window, they are either treated as themselves or mapped to the lowercase alphabet. When used in a name, they are always equivalent to the corresponding lowercase letter, although they display as themselves. Because of this translation, they may be used as either an initial or a subsequent character in a name. Thus the name alpha written with underbarred letters and the name alpha written in lowercase display differently, but they both refer to the same object; a value assigned to one is reflected in the other.
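The equivalence between underbarred and lowercase letters in names can be pictured as a normalization step applied before any name lookup. The Python sketch below is hypothetical throughout. The code points used for the underbarred letters (26 values starting at U+E000) are stand-ins chosen for this example and are not claimed to be the ones NARS2000 uses. The workspace dictionary and the functions canonical_name, assign, and lookup are also invented for illustration.

```python
# HYPOTHETICAL stand-ins: 26 private-use code points for the underbarred
# letters. The real code points used by NARS2000 are not assumed here.
UNDERBAR_BASE = 0xE000
UNDERBAR_TO_LOWER = {UNDERBAR_BASE + i: ord("a") + i for i in range(26)}


def canonical_name(name: str) -> str:
    """Map each underbarred letter to its lowercase equivalent."""
    return name.translate(UNDERBAR_TO_LOWER)


workspace = {}  # toy symbol table keyed by canonical name


def assign(name: str, value) -> None:
    workspace[canonical_name(name)] = value


def lookup(name: str):
    return workspace[canonical_name(name)]


# "alpha" with its a, p, and a replaced by the stand-in underbarred letters.
mixed = chr(0xE000) + "l" + chr(0xE000 + 15) + "h" + chr(0xE000)

assign(mixed, 42)
print(lookup("alpha"))   # 42
```

The displayed spelling is kept as typed; only the key used for lookup is normalized, which is why both spellings reach the same value.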

See also: ⎕AV (Quad AV, the Atomic Vector), a niladic system function.