Unicode: Difference between revisions

From NARS2000
(New page: All character arrays and names (variable, function, and operator) are stored as one 16-bit word per character. This fixed length encoding is called UCS-2 and is a precursor to a more gene...)
 
No edit summary
Line 1: Line 1:
All character arrays and names (variable, function, and operator) are stored as one 16-bit word per character.  This fixed length encoding is called UCS-2 and is a precursor to a more general encoding called [http://en.wikipedia.org/wiki/UTF-16 UTF-16].  The latter is a variable length encoding using one or two 16-bit words per character.
==Character Array and Name Storage==
All character arrays and names (variable, function, and operator) are stored as one 16-bit word per character.  This fixed length encoding is called UCS-2 and is a subset of a more general encoding called [http://en.wikipedia.org/wiki/UTF-16 UTF-16].  The latter is a variable length encoding using one or two 16-bit words per character.


UCS-2 can represent characters from U+0000 through U+FFFF; UTF-16 can represent characters from U+0000 through U+10FFFF — both encodings exclude the surrogate pair range of U+D800 through U+DFFF.
UCS-2 represents characters from U+0000 through U+FFFF; UTF-16 represents characters from U+0000 through U+10FFFF — both encodings exclude the surrogate pair range of U+D800 through U+DFFF.


Thus UCS-2 encodes 63,488 (=65,536 - 2,048) different characters, and UTF-16 encodes 1,112,064 (=1,114,112 - 2,048) different characters.  UTF-16 is needed mostly for Far Eastern languages.
Thus, UCS-2 encodes 63,488 (=65,536 - 2,048) different characters, and UTF-16 encodes 1,112,064 (=1,114,112 - 2,048) different characters.  UTF-16 is needed mostly for Far Eastern languages.
 
==Alphabet for Names==
The alphabet used for names consists of an initial character followed by zero or more subsequent characters.
 
* A leading character is one of <apll>a</apll> through <apll>z</apll>, <apll>A</apll> through <apll>Z</apll>, delta (<apll>∆</apll>), or delta underbar (<apll>⍙</apll>).
* A subsequent character is a leading character, <apll>0</apll> through <apll>9</apll>, overbar (<apll>{overbar}</apll>), or underbar (<apll>_</apll>).
 
One other set of characters, the underbarred alphabet (<apll>{A_}</apll> through <apll>{Z_}</apll>), may be pasted into a session or function editor window. There is no way to enter these characters directly from the keyboard.  Depending on a User Option setting, when these characters are pasted into a session or function editor window, they are treated as themselves or are mapped to the lowercase alphabet.  When used in a name, they are always equivalent to the corresponding lowercase letter, although they display as themselves.  Because of this latter translation, they may be used as either a leading or subsequent character in a name.  Thus the names <apll>{A_}l{P_}h{A_}</apll> and <apll>alpha</apll> display differently, but they both refer to the same object; a value assigned to one is reflected in the other.

Revision as of 19:04, 17 June 2008

Character Array and Name Storage

All character arrays and names (variable, function, and operator) are stored as one 16-bit word per character. This fixed length encoding is called UCS-2 and is a subset of a more general encoding called UTF-16. The latter is a variable length encoding using one or two 16-bit words per character.

UCS-2 represents characters from U+0000 through U+FFFF; UTF-16 represents characters from U+0000 through U+10FFFF — both encodings exclude the surrogate pair range of U+D800 through U+DFFF.
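
As an illustration of the difference between the two encodings, the following minimal sketch computes the 16-bit words UTF-16 uses for one character. The function name `utf16_units` is hypothetical and is not part of NARS2000. For any character UCS-2 can represent, the result is the same single word.

```python
def utf16_units(code_point):
    """Return the list of 16-bit words that encode one character in UTF-16."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point <= 0xFFFF:
        # One word, identical to the UCS-2 representation.
        return [code_point]
    # Two words: split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]
```

For example, `utf16_units(0x2206)` yields the single word `0x2206`, while `utf16_units(0x1D11E)` yields the pair `0xD834`, `0xDD1E`.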

Thus, UCS-2 encodes 63,488 (=65,536 - 2,048) different characters, and UTF-16 encodes 1,112,064 (=1,114,112 - 2,048) different characters. UTF-16 is needed mostly for Far Eastern languages.
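
The counts above can be checked with a few lines of arithmetic. This is only a sketch; the constant names are chosen for illustration.

```python
# Size of the surrogate range U+D800 through U+DFFF.
SURROGATES = 0xDFFF - 0xD800 + 1

# U+0000 through U+FFFF, minus the surrogates.
UCS2_CHARACTERS = 0x10000 - SURROGATES

# U+0000 through U+10FFFF, minus the surrogates.
UTF16_CHARACTERS = 0x110000 - SURROGATES

print(SURROGATES, UCS2_CHARACTERS, UTF16_CHARACTERS)
```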

Alphabet for Names

The alphabet used for names consists of an initial character followed by zero or more subsequent characters.

  • A leading character is one of a through z, A through Z, delta (∆), or delta underbar (⍙).
  • A subsequent character is a leading character, 0 through 9, overbar (¯), or underbar (_).
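
The two rules above can be sketched as a regular expression. This is an illustrative check, not the interpreter's actual tokenizer: the name `is_valid_name` is hypothetical, the code points U+2206 (delta), U+2359 (delta underbar), and U+00AF (overbar) are assumed from the glyphs shown, and a lone leading character is assumed to be an acceptable name.

```python
import re

# Leading characters: a-z, A-Z, delta, delta underbar (assumed code points).
LEADING = "a-zA-Z\u2206\u2359"

# Subsequent characters: any leading character, 0-9, overbar, underbar.
SUBSEQUENT = LEADING + "0-9\u00AF_"

NAME_PATTERN = re.compile(f"[{LEADING}][{SUBSEQUENT}]*")

def is_valid_name(text):
    """Return True if the whole string follows the naming rules above."""
    return NAME_PATTERN.fullmatch(text) is not None
```

Under these assumptions, `is_valid_name("\u2206x_1")` is true, while `is_valid_name("1abc")` and `is_valid_name("_a")` are false because a digit or underbar cannot start a name.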

One other set of characters, the underbarred alphabet (A̲ through Z̲), may be pasted into a session or function editor window. There is no way to enter these characters directly from the keyboard. Depending on a User Option setting, when these characters are pasted into a session or function editor window, they are treated as themselves or are mapped to the lowercase alphabet. When used in a name, they are always equivalent to the corresponding lowercase letter, although they display as themselves. Because of this latter translation, they may be used as either a leading or subsequent character in a name. Thus the names A̲lP̲hA̲ and alpha display differently, but they both refer to the same object; a value assigned to one is reflected in the other.
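
The equivalence described above can be sketched as a lookup that folds underbarred letters to lowercase before consulting the symbol table. Everything named here is hypothetical. In particular, this page does not give the code points of the underbarred letters, so the sketch assumes private-use placeholders starting at U+E000 purely for illustration.

```python
# ASSUMPTION: placeholder code points for the 26 underbarred letters.
UNDERBARRED_BASE = 0xE000
UNDERBAR_TO_LOWER = {
    chr(UNDERBARRED_BASE + i): chr(ord("a") + i) for i in range(26)
}

def canonical_name(name):
    """Map each underbarred letter to its lowercase equivalent."""
    return "".join(UNDERBAR_TO_LOWER.get(ch, ch) for ch in name)

# A toy symbol table keyed by the canonical form of each name.
symbols = {}

def assign(name, value):
    symbols[canonical_name(name)] = value

def lookup(name):
    return symbols[canonical_name(name)]
```

With this arrangement, a value assigned through the underbarred spelling is found again under `alpha`, and the reverse also holds, because both spellings share one table entry. The displayed form of the name is left untouched; only the lookup key is folded.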