System Function UCS: Difference between revisions
From NARS2000
Latest revision as of 15:29, 18 July 2019
Universal Character Set - ⎕UCS (System Function):
Monadic Function
Z←⎕UCS R
R is a Character or Integer array to convert.
Z is an array of results (Integer or Character, the reverse of the array storage type of R).
For example:
      ⎕UCS 'ABC'                 ⍝ Type 1 (character string): return the code points of A, B and C in the Universal Character Set.
65 66 67                         ⍝ A, B and C are at code points 65, 66 and 67.
      ⍴⎕UCS 'ABC'                ⍝ Shape of the result.
3
      ⎕AV⍳'ABC'                  ⍝ Comparable lookup in ⎕AV using Iota (⍳) on the same letters.
66 67 68                         ⍝ ⎕AV gives 66 67 68 where ⎕UCS gives 65 66 67: similar, but NOT identical results.
      ⎕UCS 65 66 67 68 69        ⍝ Type 2 (integer vector): return the characters at those code points.
ABCDE                            ⍝ The letters A to E, returned as a character string.
      ⍴⎕UCS 65 66 67 68 69       ⍝ Shape of the result.
5
      ⎕UCS "012345⍴⍳6789"        ⍝ Another character string, containing digits, rho (⍴) and iota (⍳).
48 49 50 51 52 53 9076 9075 54 55 56 57   ⍝ Rho and iota have much higher code points than letters and digits.
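As a cross-check outside APL, the same character-to-integer and integer-to-character mapping can be sketched in Python. This is only an analogue, not NARS2000 code: the helper name `ucs` is hypothetical, and it assumes that monadic ⎕UCS returns plain Unicode code points, which is consistent with the values shown above.

```python
# Python analogue of monadic ⎕UCS (illustrative only, not NARS2000 code).
# Assumption: ⎕UCS maps characters to Unicode code points and back.

def ucs(arg):
    """Return code points for a string, or a string for a sequence of integers."""
    if isinstance(arg, str):
        return [ord(ch) for ch in arg]      # characters -> code points
    return "".join(chr(n) for n in arg)     # code points -> characters

print(ucs("ABC"))                  # code points of A, B and C
print(ucs([65, 66, 67, 68, 69]))   # characters at those code points
print(ucs("012345⍴⍳6789"))         # digits plus the APL symbols rho and iota
```

The built-ins `ord` and `chr` do the work; the wrapper only picks the direction from the argument type, mirroring the "reverse of the storage type" rule described for `Z`.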
Dyadic Function
Z←L ⎕UCS R
L is a character vector 'UTF-8', 'UTF-16' or 'UTF-32'.
R is a Character or Integer scalar or vector to convert.
Z is a scalar or vector of results (Integer or Character, the reverse of the array storage type of R).
According to Wikipedia, "UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes".
For example:
      'UTF-8' ⎕UCS '⎕IO'
226 142 149 73 79
where 226 142 149 is the three-byte UTF-8 encoding of '⎕' and 73 79 are the one-byte encodings of 'I' and 'O'.
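The byte values in this example can be checked independently with a short Python sketch. It is not NARS2000 code; the helper name `utf8_bytes` is hypothetical and simply wraps Python's standard `str.encode`.

```python
# Illustrative check of the UTF-8 byte values (not NARS2000 code).

def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of integers 0..255."""
    return list(text.encode("utf-8"))

print(utf8_bytes("⎕IO"))                     # all five bytes
print([len(utf8_bytes(ch)) for ch in "⎕IO"])  # bytes used per character
```

The quad symbol (code point 9109, U+2395) lies above 2047, so UTF-8 needs three bytes for it, while the ASCII letters take one byte each.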
Correspondingly, a numeric right argument applies the inverse function, as in:
      'UTF-8' ⎕UCS 226 142 149 73 79
⎕IO
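The inverse direction can be sketched the same way in Python, again only as an analogue of the dyadic call rather than NARS2000 code. The helper name `utf8_text` is hypothetical; it relies on the standard `bytes(...)` constructor and `bytes.decode`.

```python
# Illustrative inverse: UTF-8 byte values back to characters (not NARS2000 code).

def utf8_text(byte_values):
    """Decode a sequence of integers 0..255 as UTF-8 and return the string."""
    return bytes(byte_values).decode("utf-8")

decoded = utf8_text([226, 142, 149, 73, 79])
print(decoded)

# Round trip: encoding the decoded text gives back the original byte values.
print(list(decoded.encode("utf-8")))
```

Note that Python raises `UnicodeDecodeError` for byte sequences that are not valid UTF-8; how NARS2000 treats such input is not stated here.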
Because NARS2000 does not yet support either UTF-16 or UTF-32, those left arguments to ⎕UCS pass the right argument through to the result, filtering out numbers too large for the corresponding format.
See also: ⎕AV