Multisets

From NARS2000
Latest revision as of 15:56, 12 January 2019

Introduction

For decades the several set symbols have languished on APL keyboards either unused or underused. Sometimes the vendor has assigned no dyadic function to that symbol and sometimes the assigned function isn't very useful. All vendors have implemented Set Difference (L~R), one has implemented Set Union (L∪R) and Set Intersection (L∩R), and, up to now, no APL vendor has implemented the missing fourth function.

Part of the reason the so-called set functions in APL are in an odd state is that they are defined on sets, but implemented on non-sets.

From Wikipedia, “in computer science, a set is an abstract data structure that can store certain values, without any particular order, and no repeated values. It is a computer implementation of the mathematical concept of a finite set”.

However, APL implementations of the set functions don't enforce the “no repeated value” requirement. Moreover, allowing repeated values is quite useful, giving rise to idioms such as L~' ' to remove all blanks from a vector.

Interestingly, sets with repeated values have a long history in mathematics and are known as Multisets. The usual set functions have identical counterparts as multiset functions with the same definitions except the multiset version takes into account the multiplicity of the unique values.

Multisets in an APL context are scalars or vectors of arbitrary items with various operations defined on them (monadically), but mostly between them (dyadically).

More formally, if L and R are multisets and the function m(x,M) returns the multiplicity of element x in the multiset M, then

  • The Union of multisets is the multiset whose unique elements are the unique elements of L,R where the multiplicity of element x in the result is the larger of m(x,L) and m(x,R),
  • The Intersection of multisets is the multiset similar to Union, but with larger replaced by smaller, and
  • The Difference (also called Asymmetric Difference and Relative Complement) of multisets is similar to Union and Intersection, but where the multiplicity calculation is max(0,m(x,L)-m(x,R)); that is, element x is in the result only if it occurs more often in L than in R, in which case it occurs with a multiplicity equal to the difference of the left and right multiplicities.

For multisets with no repeated elements, the multiset function and the corresponding set function produce the same results.
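The three definitions above can be sketched outside APL. The following is a minimal illustration in Python, not NARS2000 code: it assumes that `collections.Counter` is an acceptable stand-in for the multiplicity function m(x,M), and the helper name `elements` is purely illustrative.

```python
from collections import Counter

L = Counter([2, 2, 2, 3, 5, 5])   # m(2,L)=3, m(3,L)=1, m(5,L)=2
R = Counter([2, 2, 3, 5, 5, 7])   # m(2,R)=2, m(3,R)=1, m(5,R)=2, m(7,R)=1


def elements(m):
    """Expand a Counter back into a sorted list of its elements."""
    return sorted(m.elements())


union = L | R          # multiplicity is max(m(x,L), m(x,R))
intersection = L & R   # multiplicity is min(m(x,L), m(x,R))
difference = L - R     # multiplicity is max(0, m(x,L) - m(x,R))

print(elements(union))         # [2, 2, 2, 3, 5, 5, 7]
print(elements(intersection))  # [2, 2, 3, 5, 5]
print(elements(difference))    # [2]
```

Unlike the APL functions, a Counter does not remember the order of the original vector, so this sketch only models the values and their multiplicities.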

Notation

There are a dozen or so APL primitive functions we'd like to define on Multisets. One way to do this is to come up with a dozen new symbols to represent those Multiset functions; another, which is the approach taken here, is to define a single Multiset Operator that can be applied to the APL primitives. Its symbol is (⍦), which can be typed with Alt-'m', a keystroke previously used for the stile symbol (|), which was duplicated elsewhere on the keyboard. This operator differs from previous operators in APL in that it applies to a select set of primitive functions, but not to system functions or derived functions. It does, however, apply to user-defined functions via the System Label ⎕MS. In that sense it's more like an inflection (http://jsoftware.com/help/dictionary/dict1.htm), such as how J uses . and :. Nonetheless, it is a (monadic) operator in the full mathematical (and APL) sense of taking a function as an operand and returning a (derived) function.

Examples

A common example of a multiset is the decomposition of an integer into its prime factors, each of which may occur multiple times. For example, 600 may be factored into the multiset L←2 2 2 3 5 5, and 2100 into R←2 2 3 5 5 7.

Two useful properties of a multiset are the Underlying Set of Unique Elements (obtained via the usual ∪R) along with their Multiplicities (obtained via the derived function ∪⍦R), where the latter may be defined as ¯2-/⍸1,(2≠/R[⍋R⍳R]),1.

For the two multisets above the two properties are

      (∪L),[0.5] ∪⍦L                  (∪R),[0.5] ∪⍦R
2 3 5                           2 3 5 7
3 1 2                           2 1 2 1

The values (but not the order) of the original multiset may be reconstructed from ∊(∪⍦R)⍴¨∪R. That is, R≡⍦∊(∪⍦R)⍴¨∪R, but not necessarily R≡∊(∪⍦R)⍴¨∪R.
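The relationship between a multiset, its unique elements, and their multiplicities can be sketched in Python as follows. This is only an analogy to ∪R and ∪⍦R: the function names are hypothetical, and the sketch relies on `Counter` keeping keys in first-occurrence order (Python 3.7 and later).

```python
from collections import Counter


def unique_and_multiplicities(items):
    """Return (unique elements in first-occurrence order, their counts)."""
    counts = Counter(items)
    return list(counts), list(counts.values())


def reconstruct(unique, multiplicities):
    """Repeat each unique element by its multiplicity, like ∊(∪⍦R)⍴¨∪R."""
    return [x for x, n in zip(unique, multiplicities) for _ in range(n)]


S = [5, 2, 5, 2, 3, 7]                 # same values as R, different order
unique, mult = unique_and_multiplicities(S)
rebuilt = reconstruct(unique, mult)

print(unique, mult)                    # [5, 2, 3, 7] [2, 2, 1, 1]
print(rebuilt)                         # [5, 5, 2, 2, 3, 7]
print(rebuilt == S)                    # False: the order is not recovered
print(Counter(rebuilt) == Counter(S))  # True: the values and counts are
```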

Two common operations performed between multisets are Union and Intersection.

Multiset Union on L and R is defined as the Multiset whose underlying set is the set union of the underlying sets of L and R, and whose multiplicities are the larger of the multiplicities of the corresponding elements of L and R.

For L←2 2 2 3 5 5 and R←2 2 3 5 5 7 the Multiset Union is 2 2 2 3 5 5 7. Note that, for example, there are three 2s in the result because that is the larger of the number of 2s in L (3) and R (2).

Multiset Intersection on L and R is defined the same as for Multiset Union except that the smaller of the multiplicities is taken instead of the larger. For the two Multisets above, the Multiset Intersection is 2 2 3 5 5 where there are two 2s in the result because that's the smaller of the number of 2s in L (3) and R (2), and there are no 7s in the result because that's the smaller of the number of 7s in L (0) and R (1).

Interestingly, in the context of prime factorization, Multiset Union is the direct analog of Least Common Multiple and Multiset Intersection is the direct analog of Greatest Common Divisor. For the two Multisets above,

L∪⍦R ←→ 2 2 2 3 5 5 7 and
L∩⍦R ←→ 2 2 3 5 5

The Least Common Multiple of the two original numbers ×/L ←→ 600 and ×/R ←→ 2100 is

      (×/L)∧×/R
4200
      ×/L∪⍦R
4200

and the Greatest Common Divisor is

      (×/L)∨×/R
300
      ×/L∩⍦R
300
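The same correspondence can be checked outside APL. The sketch below is in Python and assumes `collections.Counter` as the multiset representation; `math.prod` needs Python 3.8 or later and `math.lcm` needs Python 3.9 or later.

```python
from collections import Counter
from math import gcd, lcm, prod

L = Counter([2, 2, 2, 3, 5, 5])   # prime factors of 600
R = Counter([2, 2, 3, 5, 5, 7])   # prime factors of 2100

n_left = prod(L.elements())
n_right = prod(R.elements())

# Multiply out the multiset union and intersection of the factors.
lcm_from_union = prod((L | R).elements())
gcd_from_intersection = prod((L & R).elements())

print(n_left, n_right)                              # 600 2100
print(lcm_from_union, lcm(n_left, n_right))         # 4200 4200
print(gcd_from_intersection, gcd(n_left, n_right))  # 300 300
```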

Moreover, when P is a multiset superset of Q, as with P←2 2 2 3 3 5 7 and Q←2 2 3 5 (that is, P⊇⍦Q), Multiset Asymmetric Difference is the direct analog of exact integer division. That is,

      P~⍦Q
2 3 7
      ×/P~⍦Q
42
      (×/P)÷×/Q
42
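A Python sketch of the same division analogy follows, again assuming `collections.Counter` as the multiset representation. The superset test is written out explicitly so the sketch does not depend on Counter's newer comparison operators.

```python
from collections import Counter
from math import prod

P = Counter([2, 2, 2, 3, 3, 5, 7])   # prime factors of 2520
Q = Counter([2, 2, 3, 5])            # prime factors of 60

# P is a multiset superset of Q when every multiplicity in Q is covered by P.
p_covers_q = all(P[x] >= n for x, n in Q.items())

quotient_factors = sorted((P - Q).elements())
quotient = prod(P.elements()) // prod(Q.elements())

print(p_covers_q)              # True
print(quotient_factors)        # [2, 3, 7]
print(prod(quotient_factors))  # 42
print(quotient)                # 42
```

When P does not cover Q the analogy breaks down, because the difference silently clips negative multiplicities to zero.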

Other examples illustrate the distinction between the multiset and non-multiset functions:

      'mississippi'~'miss'
pp
      'mississippi'~⍦'miss'
issippi
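The difference between the two behaviors can be sketched in Python. Both function names are hypothetical; they model L~R and L~⍦R on character strings and keep the left argument's order.

```python
from collections import Counter


def set_without(left, right):
    """Analog of L~R: drop every item of left that occurs anywhere in right."""
    return "".join(c for c in left if c not in right)


def multiset_without(left, right):
    """Analog of L~⍦R: each item of right cancels at most one item of left."""
    remaining = Counter(right)
    kept = []
    for c in left:
        if remaining[c] > 0:
            remaining[c] -= 1   # this occurrence is cancelled
        else:
            kept.append(c)
    return "".join(kept)


print(set_without("mississippi", "miss"))       # pp
print(multiset_without("mississippi", "miss"))  # issippi
```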

Subscripts

One way to understand Multisets and the operations performed on them is to view them with equal elements having unique subscripts. That is, for the Multisets 2 2 2 3 5 5 and 2 2 3 5 5 7, write them as

2₁ 2₂ 2₃ 3₁ 5₁ 5₂      and
2₁ 2₂    3₁ 5₁ 5₂ 7₁

Now Multiset Union reduces to simple union where only one copy of like elements is kept, and similarly for Multiset Intersection and Multiset Asymmetric Difference. That is,

L      ←    2₁ 2₂ 2₃ 3₁ 5₁ 5₂
R      ←    2₁ 2₂    3₁ 5₁ 5₂ 7₁
L∪⍦R   ←→   2₁ 2₂ 2₃ 3₁ 5₁ 5₂ 7₁
L∩⍦R   ←→   2₁ 2₂    3₁ 5₁ 5₂
L~⍦R   ←→         2₃
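The subscript view translates directly into code: once each element is paired with its occurrence number, ordinary set operations give the multiset results. The following Python sketch uses the hypothetical helpers `subscripted` and `values` to show this.

```python
from collections import Counter


def subscripted(items):
    """Pair each item with its occurrence number: 2 2 2 -> (2,1) (2,2) (2,3)."""
    seen = Counter()
    tagged = set()
    for x in items:
        seen[x] += 1
        tagged.add((x, seen[x]))
    return tagged


def values(tagged):
    """Drop the subscripts again and return the values in sorted order."""
    return sorted(x for x, _ in tagged)


L = subscripted([2, 2, 2, 3, 5, 5])
R = subscripted([2, 2, 3, 5, 5, 7])

print(values(L | R))   # [2, 2, 2, 3, 5, 5, 7]
print(values(L & R))   # [2, 2, 3, 5, 5]
print(L - R)           # {(2, 3)}
```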

Missing Function

To see if we have covered all of the possible set/multiset results, look at the usual Venn diagram for two sets – all seven results (excluding the empty set) appear as follows:

L∪R     Union                          L,R~L
L∩R     Intersection                   L~L~R
L~R     Asymmetric Difference Left     L~R
R~L     Asymmetric Difference Right    R~L
L       Left                           L
R       Right                          R
L∆R     Symmetric Difference           (L~R)∪R~L ←→ (L~R),R~L

The last row shows the missing function along with its name and effect. The mathematical symbol for this function is delta (∆); however, because old APL programs use this symbol as another alphabetic character, we use the Section symbol (§, Alt-'S') instead.
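As a rough illustration of Symmetric Difference, the Python sketch below computes both the set version and a multiset version built as (L~⍦R),R~⍦L. The function name is hypothetical and `collections.Counter` is assumed as the multiset representation.

```python
from collections import Counter


def multiset_symmetric_difference(left, right):
    """Items left over on either side after cancelling matching occurrences."""
    l, r = Counter(left), Counter(right)
    return sorted(((l - r) + (r - l)).elements())


L = [2, 2, 2, 3, 5, 5]
R = [2, 2, 3, 5, 5, 7]

set_version = sorted(set(L) ^ set(R))   # Python's built-in set operator
multiset_version = multiset_symmetric_difference(L, R)

print(set_version)        # [7]
print(multiset_version)   # [2, 7]
```

The extra 2 in the multiset result is the third 2 of L, which has no partner in R.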

Multiset Derived Functions

The APL functions defined on Multisets are as follows:

L∪⍦R     Multiset Union
L∩⍦R     Multiset Intersection
L~⍦R     Multiset Asymmetric Difference
L§⍦R     Multiset Symmetric Difference
L⍳⍦R     Multiset Index Of
L∊⍦R     Multiset Member Of
L≡⍦R     Multiset Match
L≢⍦R     Multiset Mismatch (same as ~L≡⍦R)
L⊂⍦R     Multiset Proper Subset Of
L⊆⍦R     Multiset Subset Of
L⊃⍦R     Multiset Proper Superset Of
L⊇⍦R     Multiset Superset Of
 ∪⍦R     Multiset Multiplicities (⍴∪⍦R ←→ ⍴∪R)

Multiset Member Of and Index Of

The key to understanding the meaning of the Multiset Operator as it is applied to the above APL primitive functions is Multiset Member Of (∊⍦), so we'll investigate that first.

Using the subscript approach above the definition is straightforward,

L      ←    2₁ 2₂ 2₃ 3₁ 5₁ 5₂
R      ←    2₁ 2₂    3₁ 5₁ 5₂ 7₁
L∊⍦R   ←→   1  1  0  1  1  1

again, matching like elements with like elements between the two arguments.

We'll use this definition many times over in defining how the Multiset Operator applies to various APL primitive functions.
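The matching of like elements can be sketched in Python as follows. The function name is hypothetical; the idea is that each item of the right argument can satisfy at most one item of the left argument.

```python
from collections import Counter


def multiset_member_of(left, right):
    """Return 1 for each item of left that still has an unused partner in right."""
    available = Counter(right)
    result = []
    for x in left:
        if available[x] > 0:
            available[x] -= 1   # use up one matching item of right
            result.append(1)
        else:
            result.append(0)
    return result


L = [2, 2, 2, 3, 5, 5]
R = [2, 2, 3, 5, 5, 7]

print(multiset_member_of(L, R))         # [1, 1, 0, 1, 1, 1]
print([1 if x in R else 0 for x in L])  # [1, 1, 1, 1, 1, 1]  (ordinary L∊R)
```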

In a similar manner, Multiset Index Of can be understood best by writing the two arguments with subscripts. For a more detailed coverage of this see Anatomy of An Idiom (http://www.sudleyplace.com/APL/AnatomyOfAnIdiom.ahtml).
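One plausible reading of the subscript description is a "progressive" index-of, in which the k-th occurrence of a value in the right argument is matched to the k-th occurrence of that value in the left argument. The Python sketch below works under that assumption. The function name, the zero-based indices, and the use of `len(left)` as the not-found result are choices made for this illustration, not statements about NARS2000.

```python
from collections import defaultdict, deque


def multiset_index_of(left, right):
    """For each item of right, the index in left of its matching occurrence."""
    positions = defaultdict(deque)
    for i, x in enumerate(left):
        positions[x].append(i)   # remember where each occurrence sits
    not_found = len(left)        # assumed marker for "no partner left"
    result = []
    for x in right:
        queue = positions.get(x)
        result.append(queue.popleft() if queue else not_found)
    return result


L = [2, 2, 2, 3, 5, 5]
R = [2, 2, 3, 5, 5, 7]

print(multiset_index_of(L, R))                  # [0, 1, 3, 4, 5, 6]
print(multiset_index_of([2, 2, 3], [2, 2, 2]))  # [0, 1, 3]
```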

Multiset Match

This derived function is a convenient way to determine whether or not two multisets are identical up to order (that is, they contain the same elements with the same multiplicities), as in these anagrams:

      'dynamo'≡⍦'monday'
1
      'pepsicola'≡⍦'episcopal'
1
      'the morse code'≡⍦'here come dots'
1
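The anagram test has a direct Python counterpart: two sequences match as multisets exactly when their element counts agree. The function name below is hypothetical.

```python
from collections import Counter


def multiset_match(left, right):
    """True when both arguments hold the same items with the same multiplicities."""
    return Counter(left) == Counter(right)


print(multiset_match("dynamo", "monday"))                  # True
print(multiset_match("pepsicola", "episcopal"))            # True
print(multiset_match("the morse code", "here come dots"))  # True
print(multiset_match("aab", "abb"))                        # False
```

The last pair has the same unique letters but different multiplicities, so it fails the multiset test.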

APL Definitions

Each of the above dyadic Multiset functions has a simple analog in a non-Multiset context:

Function f    Non-Multiset Definition (L f R)    Multiset Definition (L f⍦ R)
------------------------------------------------------------------------------
  ∪           L,R~L                              L,R~⍦L
  ∩           (L∊R)/L                            (L∊⍦R)/L
  ~           (~L∊R)/L                           (~L∊⍦R)/L
  §           (L~R),R~L                          (L~⍦R),R~⍦L
  ⍳           L⍳R                                L⍳⍦R
  ∊           L∊R                                L∊⍦R
  ≡           ((≢L)≡≢R)∧∧/L∊R                    ((≢L)≡≢R)∧∧/L∊⍦R
  ≢           ~L≡R                               ~L≡⍦R
  ⊂           (L⊆R)∧L≢R †                        (L⊆⍦R)∧L≢⍦R
  ⊆           ∧/L∊R                              ∧/L∊⍦R
  ⊃           (R⊆L)∧R≢L †                        (R⊆⍦L)∧R≢⍦L
  ⊇           ∧/R∊L                              ∧/R∊⍦L

† = This is the meaning this function would have as a set function if it didn't have another meaning in a non-Multiset context.

Note that in every case the Multiset definition may be obtained from the non-Multiset definition by introducing the Multiset Operator at the appropriate place(s). Beyond the above explanations for Multiset Member Of and Multiset Index Of, all of the other Multiset definitions can be reduced to the definition of Multiset Member Of.
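To illustrate how the other definitions reduce to Multiset Member Of, the Python sketch below builds several of them from a single `member_of` helper, following the right-hand column of the table. All function names are hypothetical, and the sketch works on Python lists rather than APL arrays.

```python
from collections import Counter


def member_of(left, right):
    """L∊⍦R: each item of right can match at most one item of left."""
    available = Counter(right)
    hits = []
    for x in left:
        hit = available[x] > 0
        if hit:
            available[x] -= 1
        hits.append(hit)
    return hits


def intersection(l, r):           # (L∊⍦R)/L
    return [x for x, hit in zip(l, member_of(l, r)) if hit]


def difference(l, r):             # (~L∊⍦R)/L
    return [x for x, hit in zip(l, member_of(l, r)) if not hit]


def union(l, r):                  # L,R~⍦L
    return list(l) + difference(r, l)


def symmetric_difference(l, r):   # (L~⍦R),R~⍦L
    return difference(l, r) + difference(r, l)


def subset_of(l, r):              # ∧/L∊⍦R
    return all(member_of(l, r))


def match(l, r):                  # ((≢L)≡≢R)∧∧/L∊⍦R
    return len(l) == len(r) and subset_of(l, r)


L = [2, 2, 2, 3, 5, 5]
R = [2, 2, 3, 5, 5, 7]

print(union(L, R))                 # [2, 2, 2, 3, 5, 5, 7]
print(intersection(L, R))          # [2, 2, 3, 5, 5]
print(difference(L, R))            # [2]
print(symmetric_difference(L, R))  # [2, 7]
print(subset_of([2, 2], [2, 3]))   # False
print(match("dynamo", "monday"))   # True
```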

Moreover, Multiset Union, Intersection, and Symmetric Difference are all symmetric in their arguments up to order. That is,

(L∪⍦R)≡⍦R∪⍦L
(L∩⍦R)≡⍦R∩⍦L
(L§⍦R)≡⍦R§⍦L
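The symmetry claim can be spot-checked on the running example with a short Python sketch. It assumes `collections.Counter` as an order-free multiset representation, so equality of Counters plays the role of ≡⍦; the helper `symmetric_difference` is hypothetical.

```python
from collections import Counter


def symmetric_difference(a, b):
    """Multiset symmetric difference of two Counters."""
    return (a - b) + (b - a)


L = Counter([2, 2, 2, 3, 5, 5])
R = Counter([2, 2, 3, 5, 5, 7])

union_commutes = (L | R) == (R | L)
intersection_commutes = (L & R) == (R & L)
symmetric_difference_commutes = symmetric_difference(L, R) == symmetric_difference(R, L)
asymmetric_difference_commutes = (L - R) == (R - L)

print(union_commutes)                  # True
print(intersection_commutes)           # True
print(symmetric_difference_commutes)   # True
print(asymmetric_difference_commutes)  # False
```

Asymmetric Difference is included only as a contrast: here L - R holds a single 2 while R - L holds a single 7.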

Acknowledgments

The idea of defining Multisets in an APL context is due to Patrick Parks of APL2000, Inc.