Talk:Unicode: Difference between revisions
From NARS2000
Revision as of 14:05, 20 March 2009
Character Array and Name Storage
It is true that "UCS-2 represents characters from U+0000 through U+FFFF; UTF-16 represents characters from U+0000 through U+10FFFF", but UCS-2 simply excludes the surrogate range of U+D800 through U+DFFF (2,048 code points), whereas UTF-16 uses precisely these values, in pairs of one high surrogate (U+D800 through U+DBFF) followed by one low surrogate (U+DC00 through U+DFFF), to represent the characters outside the Basic Multilingual Plane (BMP).
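To make the surrogate mechanism concrete, here is a minimal sketch in Python of how a code point maps to UTF-16 code units. The function name `to_utf16_units` is made up for illustration and has nothing to do with how NARS2000 stores characters internally.

```python
def to_utf16_units(cp):
    """Return the UTF-16 code units (as a list of ints) for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # The surrogate range is reserved; it never encodes a character itself.
        raise ValueError("surrogate code points are not characters")
    if cp <= 0xFFFF:
        # BMP character: one 16-bit unit, the same value UCS-2 would use.
        return [cp]
    # Supplementary character: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 | (offset >> 10)    # high surrogate, U+D800..U+DBFF
    low = 0xDC00 | (offset & 0x3FF)   # low surrogate, U+DC00..U+DFFF
    return [high, low]


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP and needs a pair.
print([hex(u) for u in to_utf16_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```

A BMP character such as U+0041 comes back as a single unit, which is why UCS-2 and UTF-16 agree everywhere except on the surrogate range.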
It's a bit reductive to say that "UTF-16 is needed mostly for Far Eastern languages", since the BMP already contains the basic characters for Far Eastern languages. The other planes are mostly concerned with ancient scripts (Egyptian hieroglyphs, Phoenician, cuneiform), but also with Western and Byzantine musical notation and with extensions of the Chinese character set.
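As a rough check of where these characters live, the plane of a code point is just its value shifted right by 16 bits. The sketch below is only an illustration; the helper name `plane` and the sample table are my own choices, not anything from NARS2000.

```python
def plane(cp):
    """Return the Unicode plane number (0 = BMP) of a code point."""
    return cp >> 16


# One sample code point per script mentioned above (chosen for illustration).
samples = {
    "CJK ideograph U+4E2D": 0x4E2D,
    "Phoenician letter U+10900": 0x10900,
    "Cuneiform sign U+12000": 0x12000,
    "Egyptian hieroglyph U+13000": 0x13000,
    "Byzantine musical symbol U+1D000": 0x1D000,
    "Western musical symbol U+1D11E": 0x1D11E,
    "CJK Extension B ideograph U+20000": 0x20000,
}

for name, cp in samples.items():
    print(f"{name}: plane {plane(cp)}")
```

The basic CJK ideograph falls in plane 0, the ancient scripts and musical symbols in plane 1, and the CJK extension in plane 2.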