System Function UCS: Difference between revisions

Latest revision as of 16:29, 18 July 2019

Universal Character Set - ⎕UCS (System Function):

Monadic Function

Z←⎕UCS R

R is a Character or Integer array to convert.

Z is an array of results (Integer or Character, reverse of the array storage type of R).

For example:

        ⎕UCS 'ABC'        ⍝ Type 1(CHAR string) - Locate index positions for letters A, B and C in the Universal Character Set.
65 66 67                  ⍝ Letters A, B and C FOUND at index positions 65, 66 and 67 in ⎕UCS.
        ⍴⎕UCS 'ABC'       ⍝ Determine the shape of the result of the call to UCS.
3

        ⎕AV⍳'ABC'         ⍝ Comparable call to ⎕AV using Iota, but the same letters('ABC').
66 67 68                  ⍝ >> 66 67 68 << using ⎕AV versus >> 65 66 67 << using ⎕UCS - similar, but NOT identical results.

        ⎕UCS 65 66 67 68 69   ⍝ Type 2(INTEGERS vector) - Find characters associated with integer positions in Universal Character Set.
ABCDE                     ⍝ Letters A, B, C, D and E 'ABCDE' returned, as a string.
        ⍴⎕UCS 65 66 67 68 69  ⍝ Determine the shape of the result.
5

        ⎕UCS "012345⍴⍳6789"   ⍝ Another character string - with digits, rho(⍴) and iota(⍳) - also enumerated.
48 49 50 51 52 53 9076 9075 54 55 56 57      ⍝ Note how Rho and Iota are much higher in the UCS than letters and digit-characters.
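
As a cross-check outside APL, the same mapping between characters and code points can be reproduced in Python, whose built-in `ord` and `chr` operate on Unicode code points. This is only an illustrative sketch: the helper name `ucs` is hypothetical, it handles flat vectors only, and it is not how NARS2000 implements ⎕UCS.

```python
def ucs(r):
    """Mimic monadic ⎕UCS on a flat vector (hypothetical helper).

    A string is mapped to a list of code points; a list of integers
    is mapped back to a string.
    """
    if isinstance(r, str):
        return [ord(ch) for ch in r]
    return "".join(chr(n) for n in r)


print(ucs("ABC"))                 # [65, 66, 67]
print(ucs([65, 66, 67, 68, 69]))  # ABCDE
print(ucs("⍴⍳"))                  # [9076, 9075]
```

The last line confirms the values shown above for rho and iota, which sit in the APL block of Unicode (U+2374 and U+2373), far above the ASCII letters and digits.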

Dyadic Function

Z←L ⎕UCS R

L is a character vector 'UTF-8', 'UTF-16' or 'UTF-32'

R is a Character or Integer scalar or vector to convert.

Z is a scalar or vector of results (Integer or Character, reverse of the array storage type of R).

According to Wikipedia "UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes".

For example:

      'UTF-8' ⎕UCS '⎕IO'
226 142 149 73 79

where 226 142 149 is the UTF-8 three-byte encoding of '⎕' and 73 79 are the one-byte encodings of 'I' and 'O'.
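
The three bytes for '⎕' follow from the standard UTF-8 bit layout for code points between U+0800 and U+FFFF. The sketch below works through that arithmetic in Python for the code point of '⎕' (U+2395, decimal 9109). The function name `utf8_three_byte` is hypothetical, and the code illustrates the encoding rule only, not NARS2000's internals.

```python
def utf8_three_byte(cp):
    """Encode a code point in U+0800..U+FFFF as three UTF-8 bytes.

    Illustrative only: surrogates (U+D800..U+DFFF) are not rejected here.
    """
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("code point is outside the three-byte range")
    return [
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ]


quad = ord("⎕")                     # 9109, i.e. U+2395
print(utf8_three_byte(quad))        # [226, 142, 149]
print(list("⎕IO".encode("utf-8")))  # [226, 142, 149, 73, 79]
```

The letters 'I' and 'O' lie below U+0080, so each is encoded as a single byte equal to its code point.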

Correspondingly, a numeric right argument supplies the inverse function as in

      'UTF-8' ⎕UCS 226 142 149 73 79
⎕IO
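
The inverse direction can be cross-checked the same way in Python, again purely as an illustration and not as a description of the NARS2000 implementation. Note that five bytes decode to three characters, so the result is shorter than the argument.

```python
data = bytes([226, 142, 149, 73, 79])
text = data.decode("utf-8")

print(text)                      # ⎕IO
print(len(data), len(text))      # 5 3
print([ord(ch) for ch in text])  # [9109, 73, 79]
```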

Because NARS2000 does not yet support either UTF-16 or UTF-32, those left arguments to ⎕UCS pass the right argument through to the result, filtering out numbers too large for the corresponding format.

See also: ⎕AV


@@ Line 1: / Line 1: @@
-<h2>Universal Character Set - ⎕UCS (System Function) - available in '''monadic''' form only,<br>but with TWO different types of argument calls:</h2>
+<h2>Universal Character Set - ⎕UCS (System Function):</h2>
 ==Monadic Function==
-{{BoxStart|<apll>Z←⎕UCS R</apll>
-|Universal Character Set. &nbsp; &nbsp; <apll>R</apll> should be a '''character string OR set of integers''' to enumerate.}}
-{{BoxLine|<apll>R</apll> is &nbsp;1) a '''character scalar or character String''' OR &nbsp; 2) '''scalar integer or set of Integers''' &nbsp; &nbsp; to <u>''enumerate''</u>.}}
-{{BoxEnd|<apll>Z</apll> is a scalar or vector of results(integer or string, '''inverse of call'''.}}
-<p>Examples, working with '''⎕UCS:'''</p>
+{{BoxStart|<apll>Z←⎕UCS R</apll>|}}
-<pre>
+{{BoxLine|<apll>R</apll> is a Character or Integer array to convert.}}
-         ⎕UCS 'ABC'        ⍝Type 1(CHAR string) - Locate index positions for letters A, B and C in the Universal Character Set.
+{{BoxEnd|<apll>Z</apll> is an array of results (Integer or Character, reverse of the array storage type of <apll>R</apll>).}}
-65 66 67                  ⍝Letters A, B and C FOUND at index positions 65, 66 and 67 in ⎕UCS.
-         ⍴⎕UCS 'ABC'       ⍝Determine the shape of the result of the call to UCS.
+<p>For example:</p>
+<apll><pre>
+         ⎕UCS 'ABC'        ⍝ Type 1(CHAR string) - Locate index positions for letters A, B and C in the Universal Character Set.
+65 66 67                  ⍝ Letters A, B and C FOUND at index positions 65, 66 and 67 in ⎕UCS.
+         ⍴⎕UCS 'ABC'       ⍝ Determine the shape of the result of the call to UCS.
-         ⎕AV⍳'ABC'         ⍝Comparable call to ⎕AV using Iota, but the same letters('ABC').
+         ⎕AV⍳'ABC'         ⍝ Comparable call to ⎕AV using Iota, but the same letters('ABC').
-66 67 68                  ⍝>> 66 67 68 << using ⎕AV versus >> 65 66 67 << using ⎕UCS - similar, but NOT identical results.
+66 67 68                  ⍝ >> 66 67 68 << using ⎕AV versus >> 65 66 67 << using ⎕UCS - similar, but NOT identical results.
-         ⎕UCS 65 66 67 68 69   ⍝Type 2(INTEGERS vector) - Find characters associated with integer positions in Universal Character Set.
+         ⎕UCS 65 66 67 68 69   ⍝ Type 2(INTEGERS vector) - Find characters associated with integer positions in Universal Character Set.
-ABCDE                     ⍝Letters A, B, C, D and E 'ABCDE' returned, as a string.
+ABCDE                     ⍝ Letters A, B, C, D and E 'ABCDE' returned, as a string.
-         ⍴⎕UCS 65 66 67 68 69  ⍝Determine the shape of the result.
+         ⍴⎕UCS 65 66 67 68 69  ⍝ Determine the shape of the result.
-         ⎕UCS "012345⍴⍳6789"   ⍝Another character string - with digits, rho(⍴) and iota(⍳) - also enumerated.
+         ⎕UCS "012345⍴⍳6789"   ⍝ Another character string - with digits, rho(⍴) and iota(⍳) - also enumerated.
-48 49 50 51 52 53 9076 9075 54 55 56 57      ⍝Note how Rho and Iota are much higher in the UCS than letters and digit-characters.
+48 49 50 51 52 53 9076 9075 54 55 56 57      ⍝ Note how Rho and Iota are much higher in the UCS than letters and digit-characters.
-</pre>
+</pre></apll>
 <br>
+==Dyadic Function==
+{{BoxStart|<apll>Z←L ⎕UCS R</apll>|}}
+{{BoxLine|<apll>L</apll> is a character vector <apll>'UTF-8'</apll>, <apll>'UTF-16'</apll> or <apll>'UTF-32'</apll>}}
+{{BoxLine|<apll>R</apll> is a Character or Integer scalar or vector to convert.}}
+{{BoxEnd|<apll>Z</apll> is a scalar or vector of results (Integer or Character, reverse of the array storage type of <apll>R</apll>).}}
+<p>According to [https://en.wikipedia.org/wiki/UTF-8 Wikipedia] "UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes".</p>
+<p>For example:</p>
+<apll><pre>
+      'UTF-8' ⎕UCS '⎕IO'
+226 142 149 73 79
+</pre></apll>
+where <apll>226 142 149</apll> is the <apll>UTF-8</apll> three-byte encoding of <apll>'⎕'</apll> and <apll>73 79</apll> are the one-byte encodings of <apll>'I'</apll> and <apll>'O'</apll>.
+<p>Correspondingly, a numeric right argument supplies the inverse function as in</p>
+<apll><pre>
+      'UTF-8' ⎕UCS 226 142 149 73 79
+⎕IO
+</pre></apll>
+<p>Because NARS2000 does not yet support either <apll>UTF-16</apll> or <apll>UTF-32</apll>, those left arguments to <apll>⎕UCS</apll> pass the right argument through to the result, filtering out numbers too large for the corresponding format.</p>
 See also: [[System_Function_AV|⎕AV]]
 {{System Variables}}
