System Function UCS
From NARS2000
Universal Character Set - ⎕UCS (System Function):
Monadic Function
R is a Character or Integer array to convert.
Z is an array of results whose storage type is the reverse of that of R: Integer if R is Character, Character if R is Integer.
For example:
⎕UCS 'ABC' ⍝ Case 1 (character vector): return the Universal Character Set code points of the letters A, B and C.
65 66 67 ⍝ The letters A, B and C have code points 65, 66 and 67.
⍴⎕UCS 'ABC' ⍝ Determine the shape of the result of the call to UCS.
3
⎕AV⍳'ABC' ⍝ Comparable lookup of the same letters ('ABC') in ⎕AV, using iota.
66 67 68 ⍝ 66 67 68 from ⎕AV versus 65 66 67 from ⎕UCS: similar, but NOT identical results.
⎕UCS 65 66 67 68 69 ⍝ Case 2 (integer vector): return the characters at these code points in the Universal Character Set.
ABCDE ⍝ The letters A, B, C, D and E are returned as the character vector 'ABCDE'.
⍴⎕UCS 65 66 67 68 69 ⍝ Determine the shape of the result.
5
⎕UCS "012345⍴⍳6789" ⍝ Another character string, this time containing digits, rho (⍴) and iota (⍳).
48 49 50 51 52 53 9076 9075 54 55 56 57 ⍝ Note that rho and iota have much higher code points than the letters and digit characters.
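As a cross-check outside APL, the monadic conversions above are ordinary Unicode code point lookups. The following is a minimal Python sketch using only the built-ins `ord` and `chr`; the helper names `to_code_points` and `from_code_points` are hypothetical and are not part of NARS2000.

```python
def to_code_points(text):
    """Return the Unicode code point of each character,
    analogous to monadic UCS applied to a character vector."""
    return [ord(ch) for ch in text]


def from_code_points(points):
    """Return the string built from the given code points,
    analogous to monadic UCS applied to an integer vector."""
    return "".join(chr(p) for p in points)


# The same inputs as the examples above.
print(to_code_points("ABC"))
print(from_code_points([65, 66, 67, 68, 69]))
print(to_code_points("012345⍴⍳6789"))
```

The two helpers are inverses of each other, which mirrors how the result type of ⎕UCS is the reverse of the type of its argument.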
Dyadic Function
L is a character vector: 'UTF-8', 'UTF-16' or 'UTF-32'.
R is a Character or Integer scalar or vector to convert.
Z is a scalar or vector of results (Integer or Character, reverse of the array storage type of R).
According to Wikipedia, "UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes".
For example:
'UTF-8' ⎕UCS '⎕IO'
226 142 149 73 79
where 226 142 149 is the three-byte UTF-8 encoding of '⎕', and 73 and 79 are the one-byte encodings of 'I' and 'O'.
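To see where the three bytes for '⎕' come from, the UTF-8 bit layout for code points in the range U+0800 to U+FFFF can be worked through by hand. The sketch below is illustrative Python, not NARS2000 code; the function name `utf8_three_bytes` is hypothetical, and the sketch does not reject surrogate code points.

```python
def utf8_three_bytes(code_point):
    """Encode a code point in U+0800..U+FFFF as three UTF-8 byte values.

    Layout: 1110xxxx 10xxxxxx 10xxxxxx, filled from the 16 bits of the
    code point (4 + 6 + 6). Surrogates are not excluded in this sketch.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("the three-byte form covers U+0800..U+FFFF only")
    return [
        0xE0 | (code_point >> 12),          # lead byte: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # continuation: middle 6 bits
        0x80 | (code_point & 0x3F),         # continuation: low 6 bits
    ]


quad = ord("⎕")  # U+2395
print(quad, utf8_three_bytes(quad))
```

The letters 'I' and 'O' lie below U+0080, so each is encoded as a single byte equal to its code point.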
Correspondingly, a numeric right argument performs the inverse conversion, as in
'UTF-8' ⎕UCS 226 142 149 73 79
⎕IO
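The same round trip can be reproduced outside APL as a sanity check. This is a small Python sketch that relies on the standard `str.encode` and `bytes.decode` methods; the wrapper names `utf8_encode` and `utf8_decode` are hypothetical and only mirror the two directions of the dyadic call.

```python
def utf8_encode(text):
    """Return the UTF-8 byte values of text as a list of integers,
    analogous to 'UTF-8' UCS with a character right argument."""
    return list(text.encode("utf-8"))


def utf8_decode(byte_values):
    """Return the text represented by a sequence of UTF-8 byte values,
    analogous to 'UTF-8' UCS with a numeric right argument."""
    return bytes(byte_values).decode("utf-8")


encoded = utf8_encode("⎕IO")
print(encoded)               # five byte values for three characters
print(utf8_decode(encoded))  # back to the original text
```

Note that the encoded vector is longer than the character vector whenever the text contains characters outside the one-byte range.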
Because NARS2000 does not yet support either UTF-16 or UTF-32, those left arguments to ⎕UCS pass the right argument through to the result, filtering out numbers too large for the corresponding format.
See also: ⎕AV
| System Variables (A value may be assigned to these except for ⎕DM) | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| ⎕ALX | ⎕CT | ⎕DM | ⎕DT | ⎕ELX | ⎕FC | ⎕FEATURE | ⎕FPC | ⎕IC | ⎕IO |
| ⎕LR | ⎕LX | ⎕PP | ⎕PR | ⎕PW | ⎕RL | ⎕SA | ⎕WSID | ||
| Niladic System Functions (a value cannot be assigned to these) | |||||||||
| ⎕A | ⎕AV | ⎕EM | ⎕ET | ⎕LC | ⎕NNAMES | ⎕NNUMS | ⎕SI | ⎕SYSID | ⎕SYSVER |
| ⎕T | ⎕TC | ⎕TCBEL | ⎕TCBS | ⎕TCESC | ⎕TCFF | ⎕TCHT | ⎕TCLF | ⎕TCNL | ⎕TCNUL |
| ⎕TS | ⎕WA | ||||||||
| Monadic or dyadic system functions (a value cannot be assigned to these) | |||||||||
| ⎕AT | ⎕CR | ⎕DC | ⎕DFT | ⎕DL | ⎕DR | ⎕EA | ⎕EC | ⎕ERROR | ⎕ES |
| ⎕EX | ⎕FMT | ⎕FX | ⎕MF | ⎕NAPPEND | ⎕NC | ⎕NCREATE | ⎕NERASE | ⎕NINFO | ⎕NL |
| ⎕NLOCK | ⎕NREAD | ⎕NRENAME | ⎕NREPLACE | ⎕NRESIZE | ⎕NSIZE | ⎕NTIE | ⎕NUNTIE | ⎕STOP | ⎕TF |
| ⎕TRACE | ⎕UCS | ⎕VR | |||||||
| Note that quad functions and variables (except for the ⎕A family of functions) are case insensitive | |||||||||