Talk:Unicode: Difference between revisions

From NARS2000

Latest revision as of 21:22, 2 February 2010

Character Array and Name Storage

It is true that "UCS-2 represents characters from U+0000 through U+FFFF; UTF-16 represents characters from U+0000 through U+10FFFF", but only UCS-2 excludes the surrogate pair range of U+D800 through U+DFFF (2,048 code points). UTF-16 uses precisely these surrogate pairs to represent the characters outside the Basic Multilingual Plane (BMP).

[Bob Smith: I was trying to convey the idea that because of the need for surrogate pairs in UTF-16, the Unicode code points U+0000 through U+10FFFF have a hole at U+D800 through U+DFFF.]
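To make the surrogate mechanism concrete, here is a minimal sketch (a hypothetical helper, not NARS2000 code) of how one code point becomes one or two UTF-16 code units. It also shows the "hole": values in U+D800 through U+DFFF are reserved for the pairs themselves and are rejected as characters, and the 20-bit payload of a pair is why the code space ends at U+10FFFF.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # The surrogate range is reserved for pairs; it holds no characters.
        raise ValueError("surrogate code points cannot be encoded")
    if cp <= 0xFFFF:
        return [cp]  # BMP: a single unit, identical to the UCS-2 value
    v = cp - 0x10000  # 20-bit offset into planes 1..16
    high = 0xD800 | (v >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 | (v & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def from_utf16_units(units: list[int]) -> int:
    """Decode one or two UTF-16 code units back to a code point."""
    first = units[0]
    if 0xD800 <= first <= 0xDBFF:
        if len(units) < 2 or not 0xDC00 <= units[1] <= 0xDFFF:
            raise ValueError("high surrogate not followed by low surrogate")
        return 0x10000 + ((first - 0xD800) << 10) + (units[1] - 0xDC00)
    if 0xDC00 <= first <= 0xDFFF:
        raise ValueError("unpaired low surrogate")
    return first
```

For example, `to_utf16_units(0x1F600)` gives `[0xD83D, 0xDE00]`, and the largest code point, U+10FFFF, maps to `[0xDBFF, 0xDFFF]`.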

It's a bit reductive to say that "UTF-16 is needed mostly for Far Eastern languages", since the BMP already contains the basic characters for Far Eastern languages. The other planes are concerned with ancient scripts (Egyptian hieroglyphs, Phoenician, cuneiform) but also Western and Byzantine musical notation and extensions of Chinese characters.

[Bob Smith: Quite right -- that phrase is unnecessary and has been removed.]
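As a small illustration of the point about planes, this sketch (hypothetical helper names) computes a code point's plane as `cp >> 16`. Common CJK ideographs such as U+4E2D sit in plane 0 (the BMP), while the first characters of the Phoenician, Cuneiform, Egyptian Hieroglyphs, Byzantine Musical Symbols and Musical Symbols blocks fall in plane 1, and CJK Unified Ideographs Extension B starts plane 2. Only code points outside plane 0 need a surrogate pair. The character names come from Python's `unicodedata` module.

```python
import unicodedata

PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane",
    2: "Supplementary Ideographic Plane",
}


def plane_of(cp: int) -> int:
    """Return the Unicode plane (0..16) containing a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return cp >> 16  # each plane spans 0x10000 code points


def needs_surrogate_pair(cp: int) -> bool:
    """True if UTF-16 must encode this code point with two code units."""
    return plane_of(cp) > 0


if __name__ == "__main__":
    for cp in (0x4E2D, 0x10900, 0x12000, 0x13000, 0x1D000, 0x1D100, 0x20000):
        plane = plane_of(cp)
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: plane {plane} "
              f"({PLANE_NAMES.get(plane, 'other')}), "
              f"surrogate pair needed: {needs_surrogate_pair(cp)}")
```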

Alphabet for Names

Historically, the first implementation of APL used a Selectric typeball as its input device. There was only one alphabet, made of the uppercase letters [A-Z]. I don't remember exactly who had the idea of underlining the letters (perhaps Adin Falkoff), but the intention was clearly to have an emphatic set of characters as opposed to the standard one.

So, in my humble opinion, the mapping is non-emphatic to lowercase and emphatic to uppercase, because uppercase letters seem stronger than lowercase ones.