System Function UCS: Difference between revisions

From NARS2000
Latest revision as of 15:29, 18 July 2019

Universal Character Set - ⎕UCS (System Function):

Monadic Function

Z←⎕UCS R
R is a Character or Integer array to convert.
Z is an array of results whose storage type is the opposite of R's: Integer code points when R is Character, Characters when R is Integer.

For example:

        ⎕UCS 'ABC'        ⍝ Character argument: return the Unicode code point of each character.
65 66 67                  ⍝ A, B and C are code points 65, 66 and 67.
        ⍴⎕UCS 'ABC'       ⍝ Shape of the result: one integer per character.
3

        ⎕AV⍳'ABC'         ⍝ For comparison, look up the same letters in ⎕AV with dyadic iota.
66 67 68                  ⍝ One higher than the ⎕UCS result (as expected if ⎕IO is 1): similar, but NOT identical.

        ⎕UCS 65 66 67 68 69   ⍝ Integer argument: return the character at each code point.
ABCDE                     ⍝ The five characters A, B, C, D and E come back as a character vector.
        ⍴⎕UCS 65 66 67 68 69  ⍝ Shape of the result: one character per integer.
5

        ⎕UCS "012345⍴⍳6789"   ⍝ A character vector mixing digits with rho (⍴) and iota (⍳).
48 49 50 51 52 53 9076 9075 54 55 56 57      ⍝ Rho and iota have far higher code points than letters and digits.
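
Because the two argument types are inverses of each other, applying ⎕UCS twice should return the original argument, and arithmetic can be done on the code points in between. The following is a small sketch of that idea rather than a captured session; the outputs shown are what the behaviour described above implies.

```apl
      ⎕UCS ⎕UCS 'ABC'           ⍝ characters → code points → characters
ABC
      'ABC'≡⎕UCS ⎕UCS 'ABC'     ⍝ the round trip matches the original
1
      ⎕UCS 1+⎕UCS 'HAL'         ⍝ shift every code point up by one
IBM
```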


Dyadic Function

Z←L ⎕UCS R
L is one of the character vectors 'UTF-8', 'UTF-16' or 'UTF-32', naming the encoding to use.
R is a Character or Integer scalar or vector to convert.
Z is a scalar or vector of results whose storage type is the opposite of R's (Integer when R is Character, Character when R is Integer).

According to Wikipedia, "UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes".

For example:

      'UTF-8' ⎕UCS '⎕IO'
226 142 149 73 79

where 226 142 149 is the three-byte UTF-8 encoding of '⎕', and 73 and 79 are the one-byte encodings of 'I' and 'O' respectively.
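
The three bytes for '⎕' can be reproduced by hand from its code point, 9109 (U+2395). The sketch below uses only the encode primitive (⊤) and the standard UTF-8 three-byte layout 1110xxxx 10xxxxxx 10xxxxxx; it illustrates the arithmetic and is not a description of how NARS2000 performs the conversion internally. It applies only to code points from 2048 to 65535, which are the ones that need exactly three bytes.

```apl
      ⎕UCS '⎕'                     ⍝ code point of quad
9109
      16 64 64⊤9109                ⍝ split into 4-bit, 6-bit and 6-bit groups
2 14 21
      224 128 128+16 64 64⊤9109    ⍝ add the lead-byte and continuation-byte prefixes
226 142 149
```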

Correspondingly, a numeric right argument selects the inverse function, decoding UTF-8 bytes back to characters, as in:

      'UTF-8' ⎕UCS 226 142 149 73 79
⎕IO
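
Putting the two directions together shows that the encoding is longer than the text it represents but loses nothing. This is a sketch whose outputs follow from the two examples above, not a captured session.

```apl
      ⍴'⎕IO'                                   ⍝ three characters
3
      ⍴'UTF-8' ⎕UCS '⎕IO'                      ⍝ five bytes: three for ⎕, one each for I and O
5
      '⎕IO'≡'UTF-8' ⎕UCS 'UTF-8' ⎕UCS '⎕IO'    ⍝ decoding the encoding restores the text
1
```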

Because NARS2000 does not yet support either UTF-16 or UTF-32, those left arguments to ⎕UCS pass the right argument through to the result, filtering out numbers too large for the corresponding format.

See also: ⎕AV
