Datatypes and Storage

From NARS2000
Revision as of 17:55, 13 October 2019 by Paul Robinson (talk | contribs) (→‎Type Demotion)

Datatypes

Immediate Data

Data that meets specific criteria is stored in the Symbol Table instead of global memory. Specifically, a simple scalar (Character, Boolean, Integer, or Floating Point) is represented in the Symbol Table according to the following structures, which may be found in symtab.h:

// Symbol table flags
typedef struct tagSTFLAGS
{
    UINT Imm:1,             // 00000001:  The data in .stData is Immediate simple numeric or character scalar
         ImmType:5,         // 0000003E:  ...                    Immediate Boolean, Integer, Character, or Float (see IMM_TYPES)
         Inuse:1,           // 00000040:  Inuse entry
         Value:1,           // 00000080:  Entry has a value
         ObjName:3,         // 00000700:  The data in .stData is NULL if .stNameType is NAMETYPE_UNK; value, address, or HGLOBAL otherwise
                            //            (see OBJ_NAMES)
         stNameType:4,      // 00007800:  The data in .stData is value (if .Imm), address (if .FcnDir), or HGLOBAL (otherwise)
                            //            (see NAME_TYPES)
         SysVarValid:5,     // 000F8000:  Index to validation routine for System Vars (see SYS_VARS)
         UsrDfn:1,          // 00100000:  User-defined function/operator/hyperator
         DfnLabel:1,        // 00200000:  User-defined function/operator/hyperator label        (valid only if .Value is set)
         DfnSysLabel:1,     // 00400000:  User-defined function/operator/hyperator system label (valid only if .Value is set)
         DfnAxis:1,         // 00800000:  User-defined function/operator/hyperator accepts axis value
         FcnDir:1,          // 01000000:  Direct function/operator/hyperator               (stNameFcn is valid)
         StdSysName:1,      // 02000000:  Is a standard System Name
         bIsAlpha:1,        // 04000000:  Is Alpha
         bIsOmega:1,        // 08000000:  Is Omega
         :4;                // F0000000:  Available bits
} STFLAGS, *LPSTFLAGS;

// N.B.:  Whenever changing the above struct (STFLAGS),
//   be sure to make a corresponding change to
//   <astFlagNames> in <dispdbg.c>.

// .Inuse and .PrinHash are valid for all entries.
// .Inuse   = 0 implies that all but .PrinHash are zero.
// .Imm     implies one and only one of the IMMTYPE_***s
// .Imm     = 1 implies that one and only one of aplBoolean, aplInteger, aplChar, or aplFloat is valid.
// .Imm     = 0 implies that stGlbData is valid.
// .Value   is valid for NAMETYPE_VAR only, however .stNameType EQ NAMETYPE_VAR
//          should never be without a value.
// .UsrDfn  is set when the function is user-defined.
// .FcnDir  may be set for any function/operator/hyperator; it is a
//          direct pointer to the code.
// htGlbName in HSHENTRY is set when .Imm and .FcnDir are clear.

// Immediate data or a handle to global data
typedef union tagSYMTAB_DATA
{
    APLBOOL    stBoolean;       // 00:  A number (Boolean)
    APLINT     stInteger;       // 00:  A number (Integer)
    APLFLOAT   stFloat;         // 00:  A floating point number
    APLCHAR    stChar;          // 00:  A character
    HGLOBAL    stGlbData;       // 00:  Handle of the entry's data
    LPVOID     stVoid;          // 00:  An arbitrary ptr
    LPPRIMFNS  stNameFcn;       // 00:  Ptr to a named function
    APLLONGEST stLongest;       // 00:  Longest datatype (so we can copy the entire data)
                                // 08:  Length
} SYMTAB_DATA, *LPSYMTAB_DATA;

#define SYM_HEADER_SIGNATURE    'EMYS'

// Symbol table entry
typedef struct tagSYMENTRY
{
    STFLAGS     stFlags;        // 00:  Flags
    SYMTAB_DATA stData;         // 04:  For immediates, the data value;
                                //        for others, the HGLOBAL (8 bytes)
    LPHSHENTRY  stHshEntry;     // 0C:  Ptr to the matching HSHENTRY
    struct tagSYMENTRY
               *stPrvEntry,     // 10:  Ptr to previous (shadowed) STE (NULL = none)
               *stSymLink;      // 14:  Ptr to next entry in linked list of
                                //        similarly grouped entries (NULL = none)
    UINT        stSILevel;      // 18:  State Indicator Level for this STE
    HEADER_SIGNATURE Sig;       // 1C:  STE header signature
                                // 20:  Length
} SYMENTRY, *LPSYMENTRY;
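
The relationship between the .Imm and .ImmType flags and the union members can be sketched as follows. This is a minimal illustration only, not code from symtab.h: the DEMO_ types are reduced stand-ins for STFLAGS, SYMTAB_DATA, and SYMENTRY, the DEMO_IMM_ values are hypothetical placeholders for the real IMM_TYPES enum, and demo_imm_to_double is an invented helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IMM_TYPES values (not the real enum). */
typedef enum { DEMO_IMM_BOOL, DEMO_IMM_INT, DEMO_IMM_FLOAT, DEMO_IMM_CHAR } DEMO_IMM_TYPE;

/* Reduced model of STFLAGS: only the first three bit fields. */
typedef struct {
    unsigned Imm:1,       /* data is an immediate simple scalar */
             ImmType:5,   /* which immediate type is valid      */
             Inuse:1;     /* entry is in use                    */
} DEMO_STFLAGS;

/* Reduced model of SYMTAB_DATA: all members overlap at offset 0. */
typedef union {
    uint8_t  stBoolean;
    int64_t  stInteger;
    double   stFloat;
    uint16_t stChar;
    void    *stGlbData;
} DEMO_SYMTAB_DATA;

typedef struct {
    DEMO_STFLAGS     stFlags;
    DEMO_SYMTAB_DATA stData;
} DEMO_SYMENTRY;

/* Store the numeric value of an immediate entry in *out.
   Returns 1 on success, 0 if the entry is not an immediate number. */
int demo_imm_to_double(const DEMO_SYMENTRY *ste, double *out)
{
    assert(ste != NULL && out != NULL);
    if (!ste->stFlags.Inuse || !ste->stFlags.Imm)
        return 0;                       /* .Imm = 0: stGlbData is the valid member */
    switch (ste->stFlags.ImmType) {
    case DEMO_IMM_BOOL:  *out = (double) ste->stData.stBoolean; return 1;
    case DEMO_IMM_INT:   *out = (double) ste->stData.stInteger; return 1;
    case DEMO_IMM_FLOAT: *out = ste->stData.stFloat;            return 1;
    default:             return 0;      /* character or unknown type */
    }
}
```

The point of the sketch is that exactly one union member is meaningful at a time, so a reader must consult the flags before touching stData.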

Global Data

Each global datatype is stored in a block of global memory allocated by MyGlobalAlloc (GPTR | GHND, # Bytes), which returns a global memory handle (or NULL if an error occurs). The # Bytes value is calculated by the function CalcArraySize (ARRAY_TYPE, APLNELM, APLRANK). The ARRAY_TYPES enum is defined in datatype.h as follows:

// Array types -- used to identify array storage type in memory
typedef enum tagARRAY_TYPES
{
 ARRAY_BOOL = 0 ,                       // 00:  Boolean
 ARRAY_INT      ,                       // 01:  Integer
 ARRAY_FLOAT    ,                       // 02:  Floating point
 ARRAY_CHAR     ,                       // 03:  Character
 ARRAY_HETERO   ,                       // 04:  Simple heterogeneous (mixed numeric and character scalars)
 ARRAY_NESTED   ,                       // 05:  Nested
 ARRAY_LIST     ,                       // 06:  List
 ARRAY_APA      ,                       // 07:  Arithmetic Progression Array
 ARRAY_RAT      ,                       // 08:  Multiprecision Rational Number
 ARRAY_VFP      ,                       // 09:  Variable-precision Float
 ARRAY_HC2I     ,                       // 0A:  Complex    INT coefficients
 ARRAY_HC2F     ,                       // 0B:  ...        FLT ...
 ARRAY_HC2R     ,                       // 0C:  ...        RAT ...
 ARRAY_HC2V     ,                       // 0D:  ...        VFP ...
 ARRAY_HC4I     ,                       // 0E:  Quaternion INT coefficients
 ARRAY_HC4F     ,                       // 0F:  ...        FLT ...
 ARRAY_HC4R     ,                       // 10:  ...        RAT ...
 ARRAY_HC4V     ,                       // 11:  ...        VFP ...
 ARRAY_HC8I     ,                       // 12:  Octonion   INT coefficients
 ARRAY_HC8F     ,                       // 13:  ...        FLT ...
 ARRAY_HC8R     ,                       // 14:  ...        RAT ...
 ARRAY_HC8V     ,                       // 15:  ...        VFP ...

 ARRAY_LENGTH   ,                       // 16:  # elements in this enum
                                        //      *MUST* be the last non-error entry
                                        // 17-1F:  Available entries (5 bits)
 ARRAY_INIT     = ARRAY_LENGTH  ,
 ARRAY_ERROR    = (APLSTYPE) -1 ,
 ARRAY_NONCE    = (APLSTYPE) -2 ,
 ARRAY_REALONLY = (APLSTYPE) -3 ,

 ARRAY_HC1I  =   ARRAY_INT   ,          // To simplify common macros
 ARRAY_HC1F  =   ARRAY_FLOAT ,          // ...
 ARRAY_HC1R  =   ARRAY_RAT   ,          // ...
 ARRAY_HC1V  =   ARRAY_VFP   ,          // ...
} ARRAY_TYPES;

The APLNELM typedef is defined in types.h as follows (where ULONGLONG is defined as an unsigned 64-bit integer):

typedef ULONGLONG   APLNELM;            // The type of the # elements in an array

The APLRANK typedef is defined in types.h as follows:

typedef ULONGLONG   APLRANK;            // The type of the rank element in an array

Headers

Each global array is preceded in memory by a header, which is defined in datatype.h as follows:

typedef struct tagHEADER_SIGNATURE
{
    UINT             nature;            // 00:  Array header signature (common to all types of arrays)
                                        // 04:  Length
} HEADER_SIGNATURE, *LPHEADER_SIGNATURE;


// Variable array header
#define VARARRAY_HEADER_SIGNATURE   'SRAV'

typedef struct tagVARARRAY_HEADER
{
    HEADER_SIGNATURE Sig;               // 00:  Array header signature
    UINT             ArrType:5,         // 04:  0000001F:  The type of the array (see ARRAY_TYPES)
                     PermNdx:5,         //      000003E0:  Permanent array index (e.g., PERMNDX_ZILDE for ⍬)
                     SysVar:1,          //      00000400:  Izit for a Sysvar (***DEBUG*** only)?
                     PV0:1,             //      00000800:  Permutation Vector in origin-0
                     PV1:1,             //      00001000:  ...                          1
                     bSelSpec:1,        //      00002000:  Select Specification array
                     All2s:1,           //      00004000:  Values are all 2s
#ifdef DEBUG
                     bMFOvar:1,         //      00008000:  Magic Function/Operator/Hyperator var -- do not display
                     :16;               //      FFFF0000:  Available bits
#else
                     :17;               //      FFFF8000:  Available bits
#endif
    UINT             RefCnt;            // 08:  Reference count
    APLNELM          NELM;              // 0C:  # elements in the array (8 bytes)
    APLRANK          Rank;              // 14:  The rank of the array (8 bytes)
                                        //      followed by the dimensions
                                        // 1C:  Length
} VARARRAY_HEADER, *LPVARARRAY_HEADER;
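
As a rough illustration of how a byte count for MyGlobalAlloc might be derived from the header layout, the sketch below adds the header length (1C hex, per the offset comments above), the dimensions, and the data. It is not the real CalcArraySize: the name demo_array_bytes is invented, the 8 bytes per dimension is an assumption, only four simple types are covered, and any padding or alignment the real function applies is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric values as ARRAY_BOOL..ARRAY_CHAR in ARRAY_TYPES. */
enum { DEMO_BOOL = 0, DEMO_INT = 1, DEMO_FLOAT = 2, DEMO_CHAR = 3 };

#define DEMO_HEADER_BYTES 0x1C  /* "1C: Length" from VARARRAY_HEADER        */
#define DEMO_DIM_BYTES    8     /* ASSUMPTION: each dimension takes 8 bytes */

/* Header + dimensions + data, for the simple types only. */
uint64_t demo_array_bytes(int type, uint64_t nelm, uint64_t rank)
{
    uint64_t data;

    switch (type) {
    case DEMO_BOOL:  data = (nelm + 7) / 8; break;  /* one bit per element   */
    case DEMO_INT:
    case DEMO_FLOAT: data = nelm * 8;       break;  /* 8 bytes per element   */
    case DEMO_CHAR:  data = nelm * 2;       break;  /* 16-bit WORDs          */
    default:
        assert(0 && "type not covered by this sketch");
        return 0;
    }
    return DEMO_HEADER_BYTES + rank * DEMO_DIM_BYTES + data;
}
```

Under these assumptions a 10-element Boolean vector needs 28 + 8 + 2 = 38 bytes, and a rank-2 Integer array of 6 elements needs 28 + 16 + 48 = 92 bytes.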

Characters

There is only one character type (ARRAY_CHAR), and it is stored as 16-bit WORDs in UCS-2 format. This format is a subset of UTF-16LE in that it does not attempt to handle characters beyond the BMP (Basic Multilingual Plane), that is, characters that do not fit in 16 bits.
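
The BMP restriction can be expressed as a simple range test on a Unicode code point. The helper below is a hypothetical illustration (demo_fits_ucs2 is not a NARS2000 function); it also rejects the surrogate range D800 to DFFF, on the assumption that a lone surrogate does not stand for a character in UCS-2.

```c
#include <stdint.h>

/* Return 1 if the code point can be stored in a single 16-bit UCS-2 unit. */
int demo_fits_ucs2(uint32_t codePoint)
{
    if (codePoint > 0xFFFF)
        return 0;                                   /* beyond the BMP        */
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
        return 0;                                   /* surrogate, not a char */
    return 1;
}
```

For example, ⍬ (U+236C) fits, while a code point such as U+1F600 does not.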

Numbers

There are numerous numeric datatypes from Boolean to Octonions. All of the numeric datatypes in this section use the common header above. The data portion of the array immediately follows the header.

Booleans are stored packed in the usual way, one element per bit, in Little-Endian format.
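
A minimal sketch of one-bit-per-element access follows. It assumes that "Little-Endian" here means element 0 occupies the least significant bit of byte 0; the source does not spell out the bit order, so treat that as an assumption. The demo_ function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to hold nelm packed Boolean elements. */
size_t demo_packed_bytes(size_t nelm)
{
    return (nelm + 7) / 8;
}

/* Read element i (ASSUMPTION: bit i%8 of byte i/8, least significant bit first). */
int demo_get_bit(const uint8_t *data, size_t i)
{
    return (data[i >> 3] >> (i & 7)) & 1;
}

/* Write element i, using the same assumed bit order. */
void demo_set_bit(uint8_t *data, size_t i, int value)
{
    uint8_t mask = (uint8_t) (1u << (i & 7));

    if (value)
        data[i >> 3] |= mask;
    else
        data[i >> 3] &= (uint8_t) ~mask;
}
```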

The rest of the numeric datatypes in this section can be described by the "outer product" of a dimension (1, 2, 4, 8) and a Basic Type: 8-byte Integer, 8-byte Floating Point, __mpq_struct (24- or 32-byte) Multiple-precision Integer/Rational, or __mpfr_struct (32- or 40-byte) Multiple-precision Floating Point. Because a Multiple-precision number contains a pointer to its data, its byte count depends upon the size of a pointer (32- or 64-bit). The dimensions (1, 2, 4, 8) correspond to the Real, Complex, Quaternion, and Octonion numbers, respectively. A scalar number of a given dimension has as many coefficients (each of the Basic Type) as the dimension. The Multiple-precision types __mpq_struct and __mpfr_struct are defined in mpir.h and mpfr.h, respectively. An __mpq_struct consists of two __mpz_structs, one for the numerator and one for the denominator, where __mpz_struct represents a Multiple-precision Integer and is also defined in mpir.h.

For example, the data portion of

  • A Real Integer array has one 64-bit integer for each element.
  • A Complex Multiple-precision Integer/Rational array has two __mpq_struct Basic Types per element.
  • A Quaternion Multiple-precision Floating Point array has four __mpfr_struct Basic Types per element.
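
The "dimension times Basic Type" rule can be sketched as a per-element byte count. The numeric type values are copied from the ARRAY_TYPES enum and the Basic Type sizes from the paragraph above; the demo_ names and the decoding by enum position are illustrative assumptions, not the way the NARS2000 sources necessarily do it.

```c
#include <assert.h>
#include <stddef.h>

/* Numeric values copied from the ARRAY_TYPES enum. */
enum {
    DEMO_ARRAY_INT  = 0x01, DEMO_ARRAY_FLOAT = 0x02,
    DEMO_ARRAY_RAT  = 0x08, DEMO_ARRAY_VFP   = 0x09,
    DEMO_ARRAY_HC2I = 0x0A, DEMO_ARRAY_HC8V  = 0x15
};

/* Dimension (1, 2, 4, 8) of a numeric type, or 0 for anything else. */
int demo_dimension(int t)
{
    if (t == DEMO_ARRAY_INT || t == DEMO_ARRAY_FLOAT ||
        t == DEMO_ARRAY_RAT || t == DEMO_ARRAY_VFP)
        return 1;
    if (t >= DEMO_ARRAY_HC2I && t <= DEMO_ARRAY_HC8V)
        return 2 << ((t - DEMO_ARRAY_HC2I) / 4);    /* groups of four: 2, 4, 8 */
    return 0;
}

/* Basic Type index: 0 = INT, 1 = FLT, 2 = RAT, 3 = VFP; -1 otherwise. */
int demo_basic_kind(int t)
{
    switch (t) {
    case DEMO_ARRAY_INT:   return 0;
    case DEMO_ARRAY_FLOAT: return 1;
    case DEMO_ARRAY_RAT:   return 2;
    case DEMO_ARRAY_VFP:   return 3;
    default:               break;
    }
    if (t >= DEMO_ARRAY_HC2I && t <= DEMO_ARRAY_HC8V)
        return (t - DEMO_ARRAY_HC2I) % 4;           /* I, F, R, V within a group */
    return -1;
}

/* Bytes per element; ptrBits selects the 32- or 64-bit struct sizes. */
size_t demo_elem_bytes(int t, int ptrBits)
{
    static const size_t size32[4] = { 8, 8, 24, 32 };
    static const size_t size64[4] = { 8, 8, 32, 40 };
    int kind = demo_basic_kind(t);

    assert(ptrBits == 32 || ptrBits == 64);
    if (kind < 0)
        return 0;                                   /* Boolean, Character, etc. */
    return (size_t) demo_dimension(t) * (ptrBits == 64 ? size64[kind] : size32[kind]);
}
```

With these figures a Quaternion Floating Point element (ARRAY_HC4F, value 0F) occupies 4 × 8 = 32 bytes, and a Complex Rational element (ARRAY_HC2R, value 0C) occupies 2 × 32 = 64 bytes in a 64-bit build. These counts cover only the structs stored in the array, not the separately allocated Multiple-precision digits they point to.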

Nested Arrays

Nested Arrays (ARRAY_NESTED) use the common header as above. The data portion of a Nested Array consists of a series of pointers, each either 32 or 64 bits wide depending upon the ABI (Application Binary Interface) of the program. As each pointer value is aligned on at least a 32-bit boundary, the low-order bit (normally 0) is used to distinguish STEs (Symbol Table Entries) from Global pointers: in an STE pointer the low-order bit is 0, and in a Global pointer it is 1. An STE pointer is then an index into the current Symbol Table, and a Global pointer (with the low-order bit cleared) is a global memory handle which may be locked and unlocked using MyGlobalLock and MyGlobalUnlock.
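
The low-order-bit convention can be sketched with a few helpers. The demo_ names are hypothetical and do not come from the NARS2000 sources; only the bit convention (0 for an STE, 1 for a Global) is taken from the text.

```c
#include <stdint.h>

#define DEMO_TAG_MASK ((uintptr_t) 1)   /* the low-order bit */

/* Return 1 if the item is tagged as a Global pointer, 0 if it is an STE. */
int demo_is_global(const void *item)
{
    return ((uintptr_t) item & DEMO_TAG_MASK) != 0;
}

/* Tag a (4-byte aligned) handle as a Global pointer by setting the low bit. */
void *demo_make_global(void *handle)
{
    return (void *) ((uintptr_t) handle | DEMO_TAG_MASK);
}

/* Clear the tag to recover the value to pass on, e.g. to a lock routine. */
void *demo_clear_tag(const void *item)
{
    return (void *) ((uintptr_t) item & ~DEMO_TAG_MASK);
}
```

Because the tag lives in a bit that alignment guarantees to be zero, no extra storage per item is needed; the cost is that every access must mask the bit off before using the value.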

Heterogeneous Arrays

Heterogeneous arrays (ARRAY_HETERO) are a subset of Nested Arrays in that all of the pointers are to Symbol Table Entries; that is, the low-order bit of each pointer is 0.

Arithmetic Progression Arrays

APAs (ARRAY_APA) are a superset of APVs (Arithmetic Progression Vectors) in that they may be of any Shape and Rank. The header portion is as above. The data portion is defined in datatype.h as follows:

// Define APA structure
typedef struct tagAPLAPA                // Offset + Multiplier × ⍳ NELM (origin-0)
{
    APLINT  Off,                        // 00:  Offset
            Mul;                        // 08:  Multiplier
                                        // 10:  Length
} APLAPA, * LPAPLAPA;

The Multiplier may be 0; for example, the Reshape function applied to a simple scalar integer produces such an APA.
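
Following the comment on the struct (Offset + Multiplier × ⍳ NELM, origin-0), element i of an APA can be computed on demand rather than stored. The sketch below uses a stand-in struct with int64_t in place of APLINT; demo_apa_elem is an invented helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for APLAPA, with int64_t assumed for APLINT. */
typedef struct {
    int64_t Off;    /* Offset     */
    int64_t Mul;    /* Multiplier */
} DEMO_APA;

/* Value of element i (origin-0): Off + Mul * i. */
int64_t demo_apa_elem(const DEMO_APA *apa, uint64_t i)
{
    assert(apa != NULL);
    return apa->Off + apa->Mul * (int64_t) i;
}
```

With Off = 10 and Mul = 3, element 4 is 22. With Mul = 0, every element equals Off, which is the constant case mentioned above.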

Type Demotion

All arrays are subject to Type Demotion: a pointer to a token containing the array is passed to the TypeDemote (LPTOKEN, UBOOL) function, which then changes the token. Tokens are defined in tokens.h. The UBOOL parameter specifies whether or not the dimension (1, 2, 4, 8) of the array may be demoted (only if the appropriate imaginary parts are zero). This value is FALSE except in very special circumstances.
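
The dimension-demotion condition can be illustrated for the Complex Floating Point case. This is only a sketch of the test described above, not the TypeDemote function itself: DEMO_HC2F and demo_can_drop_to_real are hypothetical names, and the real routine works on tokens and handles many more datatypes.

```c
#include <stddef.h>

/* Stand-in for one Complex Floating Point element: real part, then imaginary part. */
typedef struct {
    double parts[2];
} DEMO_HC2F;

/* Return 1 if a Complex array could be demoted to Real:
   dimension demotion must be allowed and every imaginary part must be zero. */
int demo_can_drop_to_real(const DEMO_HC2F *data, size_t nelm, int allowDimDemote)
{
    size_t i;

    if (!allowDimDemote)
        return 0;                       /* the usual case: flag is FALSE */
    for (i = 0; i < nelm; i++)
        if (data[i].parts[1] != 0.0)
            return 0;                   /* a non-zero imaginary part blocks demotion */
    return 1;
}
```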