System Function UCS

From NARS2000

Universal Character Set - ⎕UCS (System Function):

Monadic Function

Z←⎕UCS R
R is a Character or Integer array to convert.
Z is an array of results whose storage type is the opposite of R's: Integer code points for a Character R, Characters for an Integer R.

For example:

        ⎕UCS 'ABC'        ⍝ Character argument: return the code points of A, B and C in the Universal Character Set.
65 66 67                  ⍝ A, B and C are at code points 65, 66 and 67.
        ⍴⎕UCS 'ABC'       ⍝ Shape of the result: one integer per character.
3

        ⎕AV⍳'ABC'         ⍝ For comparison: look up the same letters in ⎕AV with dyadic iota.
66 67 68                  ⍝ Each index is one higher than the ⎕UCS result: similar, but NOT identical.

        ⎕UCS 65 66 67 68 69   ⍝ Integer argument: return the characters at those code points.
ABCDE                     ⍝ The result is the character vector 'ABCDE'.
        ⍴⎕UCS 65 66 67 68 69  ⍝ Shape of the result: one character per integer.
5

        ⎕UCS "012345⍴⍳6789"   ⍝ A character string mixing digits with rho (⍴) and iota (⍳).
48 49 50 51 52 53 9076 9075 54 55 56 57      ⍝ Rho and iota have much higher code points than the letters and digits.
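
For readers without a NARS2000 session at hand, the values above can be cross-checked in Python, whose built-in ord and chr map between characters and Unicode code points. This is only a sketch of the same mapping, not NARS2000 code: the helper name ucs is hypothetical, and it handles flat strings and integer lists only, not arbitrary arrays.

```python
def ucs(r):
    """Hypothetical stand-in for monadic ⎕UCS, restricted to flat data.

    A string yields a list of code points; a list of integers yields a string.
    """
    if isinstance(r, str):
        return [ord(ch) for ch in r]
    return "".join(chr(n) for n in r)


print(ucs("ABC"))                 # [65, 66, 67]
print(ucs([65, 66, 67, 68, 69]))  # ABCDE
print(ucs("012345⍴⍳6789"))        # digits are 48..57, ⍴ is 9076, ⍳ is 9075
```

The last call should reproduce the integer vector shown in the final APL example, since both rely on the same Unicode code point assignments.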


Dyadic Function

Z←L ⎕UCS R
L is a character vector: 'UTF-8', 'UTF-16' or 'UTF-32'.
R is a Character or Integer scalar or vector to convert.
Z is a scalar or vector of results whose storage type is the opposite of R's: Integer for a Character R, Character for an Integer R.

According to Wikipedia, "UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes".

For example:

      'UTF-8' ⎕UCS '⎕IO'
226 142 149 73 79

where 226 142 149 is the three-byte UTF-8 encoding of '⎕', and 73 and 79 are the one-byte encodings of 'I' and 'O'.
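
The three bytes for '⎕' can be derived by hand from its code point, 9109 (U+2395). The Python sketch below applies the standard UTF-8 bit layout for the three-byte range (1110xxxx 10xxxxxx 10xxxxxx). The function name utf8_three_bytes is hypothetical, and the sketch deliberately ignores the other byte lengths and does not reject surrogate code points.

```python
def utf8_three_bytes(code_point):
    """Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch covers only the three-byte range")
    return [
        0b11100000 | (code_point >> 12),          # lead byte: 1110xxxx
        0b10000000 | ((code_point >> 6) & 0x3F),  # continuation byte: 10xxxxxx
        0b10000000 | (code_point & 0x3F),         # continuation byte: 10xxxxxx
    ]


quad = ord("⎕")                # 9109, i.e. U+2395
print(utf8_three_bytes(quad))  # [226, 142, 149]
```

'I' (73) and 'O' (79) are below 128, so UTF-8 stores each of them unchanged in a single byte.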

Correspondingly, a numeric right argument performs the inverse conversion, as in:

      'UTF-8' ⎕UCS 226 142 149 73 79
⎕IO
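
The same round trip can be reproduced outside APL as a sanity check. The sketch below uses Python's built-in str.encode and bytes.decode. It illustrates the UTF-8 conversion itself and makes no claim about how NARS2000 implements it.

```python
text = "⎕IO"

# Character to UTF-8 byte values, like a character right argument.
encoded = list(text.encode("utf-8"))
print(encoded)  # [226, 142, 149, 73, 79]

# UTF-8 byte values back to characters, like a numeric right argument.
decoded = bytes(encoded).decode("utf-8")
print(decoded)  # ⎕IO
```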

Because NARS2000 does not yet support either UTF-16 or UTF-32, those left arguments to ⎕UCS pass the right argument through to the result, filtering out numbers too large for the corresponding format.

See also: ⎕AV

System Variables (A value may be assigned to these except for ⎕DM)
ALX CT DM DT ELX FC FEATURE FPC IC IO
LR LX PP PR PW RL SA WSID
Niladic System Functions (a value cannot be assigned to these)
A AV EM ET LC NNAMES NNUMS SI SYSID SYSVER
T TC TCBEL TCBS TCESC TCFF TCHT TCLF TCNL TCNUL
TS WA
Monadic or dyadic system functions (a value cannot be assigned to these)
AT CR DC DFT DL DR EA EC ERROR ES
EX FMT FX MF NAPPEND NC NCREATE NERASE NINFO NL
NLOCK NREAD NRENAME NREPLACE NRESIZE NSIZE NTIE NUNTIE STOP TF
TRACE UCS VR
Note that quad functions and variables (except for the ⎕A family of functions) are case-insensitive.