Unicode: Difference between revisions
Latest revision as of 21:15, 23 January 2015
Character Array and Name Storage
All character arrays and names (variable, function, and operator) are stored as one 16-bit word per character. This fixed-length encoding is called UCS-2 and is a subset of a more general encoding called UTF-16. The latter is a variable-length encoding that uses one or two 16-bit words per character.
UCS-2 represents characters in the range U+0000 through U+FFFF; UTF-16 represents characters in the range U+0000 through U+10FFFF. However, because of the way UTF-16 represents characters above U+FFFF, the range of code points for both encodings excludes U+D800 through U+DFFF (2,048 code points).
Thus, UCS-2 encodes 63,488 (=65,536 - 2,048) different characters, and UTF-16 encodes 1,112,064 (=1,114,112 - 2,048) different characters.
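The counts above can be checked with a short Python sketch. It is only an illustration of the arithmetic and of the one-word versus two-word distinction; the helper names `utf16_words` and `fits_in_ucs2` are hypothetical and are not part of the interpreter.

```python
# Code points U+D800 through U+DFFF are reserved for UTF-16 surrogates.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
surrogate_count = SURROGATE_LAST - SURROGATE_FIRST + 1

# Code points available to each encoding once the surrogates are excluded.
ucs2_count = (0xFFFF + 1) - surrogate_count
utf16_count = (0x10FFFF + 1) - surrogate_count


def utf16_words(ch: str) -> int:
    """Return how many 16-bit words UTF-16 needs for one character."""
    return len(ch.encode("utf-16-le")) // 2


def fits_in_ucs2(ch: str) -> bool:
    """Return True if the character can be stored as a single 16-bit word."""
    cp = ord(ch)
    return cp <= 0xFFFF and not (SURROGATE_FIRST <= cp <= SURROGATE_LAST)
```

A character such as delta underbar (U+2359) needs one word and fits in UCS-2, whereas a character above U+FFFF, such as U+1D538, needs two words in UTF-16 and has no UCS-2 representation.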
Alphabet for Names
A name consists of an initial character followed by zero or more subsequent characters.
- An initial character is one of a through z, A through Z, delta (∆), or delta underbar (⍙).
- A subsequent character is an initial character, 0 through 9, overbar (¯), or underbar (_).
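The two rules above can be sketched as a small validator. This is a minimal illustration rather than the interpreter's own check: the function name `is_valid_name` is hypothetical, the code points chosen for delta (U+2206), delta underbar (U+2359), and overbar (U+00AF) are assumed to match the characters shown in the list, and the sketch assumes a lone initial character is an acceptable name.

```python
import string

# Initial characters: a-z, A-Z, delta, and delta underbar.
# The code points below are assumptions based on the glyphs in the text.
INITIAL_CHARS = set(string.ascii_letters) | {"\u2206", "\u2359"}

# Subsequent characters: any initial character, 0-9, overbar, or underbar.
SUBSEQUENT_CHARS = INITIAL_CHARS | set(string.digits) | {"\u00af", "_"}


def is_valid_name(name: str) -> bool:
    """Check a candidate name against the initial/subsequent character rules."""
    if not name:
        return False
    if name[0] not in INITIAL_CHARS:
        return False
    return all(ch in SUBSEQUENT_CHARS for ch in name[1:])
```

Under these assumptions, `alpha` and `∆x_1` are accepted, while `1abc` and `_x` are rejected because a digit or an underbar cannot start a name. The underbarred alphabet and system names are outside the scope of this sketch.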
One other set of characters, the underbarred alphabet (A̲ through Z̲), may be pasted into a session or function editor window. There is no way to enter these characters directly from the keyboard. Depending on a User Option setting, when these characters are pasted into a session or function editor window, they are treated as themselves or are mapped to the lowercase alphabet. When used in a name, they are always equivalent to the corresponding lowercase letter, although they display as themselves. Because of this translation, they may be used as either an initial or a subsequent character in a name. Thus the names A̲lP̲hA̲ and alpha display differently, but they both refer to the same object; a value assigned to one is reflected in the other.
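The name equivalence described above can be sketched as a lookup keyed on a canonical form. How the interpreter actually stores underbarred letters is not stated here, so this sketch makes a loud assumption: an underbarred letter is modeled as an uppercase letter followed by COMBINING LOW LINE (U+0332). The function `canonical_name` and the dictionary `workspace` are hypothetical names used only for illustration.

```python
# ASSUMPTION: an underbarred letter is modeled as an uppercase letter
# followed by U+0332; the real internal representation may differ.
COMBINING_LOW_LINE = "\u0332"


def canonical_name(name: str) -> str:
    """Map each underbarred letter to the corresponding lowercase letter."""
    out = []
    for ch in name:
        if ch == COMBINING_LOW_LINE and out:
            # Fold the preceding letter to lowercase and drop the underbar.
            out[-1] = out[-1].lower()
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical symbol table keyed on the canonical form of each name.
workspace = {}
workspace[canonical_name("A\u0332lP\u0332hA\u0332")] = 42
value_via_plain_name = workspace[canonical_name("alpha")]
```

Because both spellings reduce to the same key, a value stored under the underbarred spelling is found through `alpha` as well. Plain uppercase letters are left alone, so `Alpha` remains a different name in this model.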
See also: the Atomic Vector page, which describes the niladic system function ⎕AV (Quad AV).