System Function UCS
From NARS2000
Universal Character Set - ⎕UCS (System Function):
Monadic Function
Z←⎕UCS R
R is a Character or Integer array to convert.
Z is an array of results (Integer or Character, reverse of the array storage type of R).
For example:
      ⎕UCS 'ABC'                ⍝ Type 1 (CHAR string) - Locate index positions for letters A, B and C in the Universal Character Set.
65 66 67                        ⍝ Letters A, B and C FOUND at index positions 65, 66 and 67 in ⎕UCS.
      ⍴⎕UCS 'ABC'               ⍝ Determine the shape of the result of the call to UCS.
3
      ⎕AV⍳'ABC'                 ⍝ Comparable call to ⎕AV using Iota, but the same letters ('ABC').
66 67 68                        ⍝ >> 66 67 68 << using ⎕AV versus >> 65 66 67 << using ⎕UCS - similar, but NOT identical results.
      ⎕UCS 65 66 67 68 69       ⍝ Type 2 (INTEGERS vector) - Find characters associated with integer positions in Universal Character Set.
ABCDE                           ⍝ Letters A, B, C, D and E ('ABCDE') returned, as a string.
      ⍴⎕UCS 65 66 67 68 69      ⍝ Determine the shape of the result.
5
      ⎕UCS "012345⍴⍳6789"       ⍝ Another character string - with digits, rho (⍴) and iota (⍳) - also enumerated.
48 49 50 51 52 53 9076 9075 54 55 56 57
                                ⍝ Note how Rho and Iota are much higher in the UCS than letters and digit-characters.
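Because the monadic function maps characters to integers and integers back to characters, applying it twice should return the original argument. The following is a minimal sketch of that round trip; the variable name N is purely illustrative, and the outputs shown follow from the standard code points of A (65), P (80) and L (76).

```apl
      N←⎕UCS 'APL'       ⍝ Characters to code points
      N
65 80 76
      ⎕UCS N             ⍝ Code points back to characters
APL
      'APL'≡⎕UCS N       ⍝ The round trip reproduces the original string
1
```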
Dyadic Function
Z←L ⎕UCS R
L is a character vector: 'UTF-8', 'UTF-16' or 'UTF-32'.
R is a Character or Integer scalar or vector to convert.
Z is a scalar or vector of results (Integer or Character, reverse of the array storage type of R).
According to Wikipedia, "UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes".
For example:
      'UTF-8' ⎕UCS '⎕IO'
226 142 149 73 79
where 226 142 149 is the three-byte UTF-8 encoding of '⎕', and 73 and 79 are the one-byte encodings of 'I' and 'O', respectively.
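To see where the three bytes for '⎕' come from, the arithmetic can be worked by hand. The sketch below assumes only that '⎕' is code point 9109 (U+2395) and uses the UTF-8 three-byte layout 1110xxxx 10xxxxxx 10xxxxxx, which applies to code points from 2048 to 65535. The name C is illustrative and not part of the original text.

```apl
      C←⎕UCS '⎕'         ⍝ Code point of quad
      C
9109
      ⍝ Lead byte: 224 plus the top 4 bits; continuation bytes: 128 plus 6 bits each
      (224+⌊C÷4096),(128+64|⌊C÷64),128+64|C
226 142 149
```

Code points below 128, such as 'I' (73) and 'O' (79), are stored unchanged in a single byte, which is why they appear as-is in the result.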
Correspondingly, a numeric right argument supplies the inverse function, as in
      'UTF-8' ⎕UCS 226 142 149 73 79
⎕IO
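The two directions can be combined into a round-trip check. This is a small sketch based on the examples above; the name B is illustrative. It also shows that the UTF-8 result counts bytes rather than characters, so its length can exceed that of the original string.

```apl
      B←'UTF-8' ⎕UCS '⎕IO'      ⍝ Encode three characters as UTF-8 bytes
      ⍴B                         ⍝ Five bytes: three for ⎕, one each for I and O
5
      ⍴⎕UCS '⎕IO'                ⍝ Monadic form yields one code point per character
3
      '⎕IO'≡'UTF-8' ⎕UCS B       ⍝ Decoding the bytes restores the original string
1
```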
Because NARS2000 does not yet support either UTF-16 or UTF-32, those left arguments to ⎕UCS pass the right argument through to the result, filtering out numbers too large for the corresponding format.
See also: ⎕AV
Note that quad functions and variables (except for the ⎕A family of functions) are case insensitive.