Unicode

From NARS2000
Revision as of 18:17, 17 June 2008 by WikiSysop

All character arrays and names (of variables, functions, and operators) are stored as one 16-bit word per character. This fixed-length encoding is called UCS-2. It is the precursor of a more general encoding called UTF-16, a variable-length encoding that uses one or two 16-bit words per character.

UCS-2 can represent characters from U+0000 through U+FFFF, and UTF-16 can represent characters from U+0000 through U+10FFFF. Both encodings exclude the surrogate range of U+D800 through U+DFFF, which UTF-16 reserves for the pairs of 16-bit words that encode characters above U+FFFF.
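As an illustration of the difference between the two encodings, here is a minimal Python sketch of how a single code point maps to 16-bit words in UTF-16. The function name `utf16_units` is hypothetical and is not part of NARS2000; the arithmetic is the standard UTF-16 surrogate-pair calculation.

```python
def utf16_units(code_point: int) -> list:
    """Return the 16-bit words that encode one code point in UTF-16.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable characters")
    if code_point <= 0xFFFF:
        # One word: the same representation UCS-2 uses.
        return [code_point]
    # Two words: split the 20-bit offset above U+FFFF into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high surrogate, U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # low surrogate, U+DC00..U+DFFF
    return [high, low]


if __name__ == "__main__":
    for cp in (0x0041, 0x235D, 0x1D11E):
        print(f"U+{cp:04X} ->", [f"{w:04X}" for w in utf16_units(cp)])
```

Any code point up to U+FFFF yields a single word, so a UCS-2 string is also valid UTF-16; only code points above U+FFFF need the second word.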

Thus UCS-2 can encode 63,488 (= 65,536 - 2,048) distinct code points, and UTF-16 can encode 1,112,064 (= 1,114,112 - 2,048). The characters above U+FFFF, which require the two-word form of UTF-16, are needed mostly for less common Far Eastern ideographs.
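The two counts can be checked with a few lines of Python. This is only a sketch of the arithmetic; the names `SURROGATES` and `encodable_count` are made up for the example.

```python
# The surrogate range U+D800..U+DFFF holds 2,048 code points.
SURROGATES = range(0xD800, 0xE000)


def encodable_count(last_code_point: int) -> int:
    """Count code points in U+0000..last_code_point, minus the surrogates."""
    total = last_code_point + 1
    return total - len(SURROGATES)


ucs2_count = encodable_count(0xFFFF)      # 65,536 - 2,048
utf16_count = encodable_count(0x10FFFF)   # 1,114,112 - 2,048

if __name__ == "__main__":
    print(f"{ucs2_count:,}")
    print(f"{utf16_count:,}")
```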