From Partition Logic to Information Theory

Some basic analogies between subset logic and partition logic

A new logic of partitions has been developed that is dual to ordinary logic when the latter is interpreted as the logic of subsets rather than the logic of propositions. For a finite universe, the logic of subsets gave rise to finite probability theory by assigning to each subset its relative cardinality as a Laplacian probability. The analogous development for the dual logic of partitions gives rise to a notion of logical entropy that is related in a precise manner to Claude Shannon’s entropy. In this manner, the new logic of partitions provides a logical-conceptual foundation for information-theoretic entropy or information content. This post continues the earlier post which introduced some of the basic ideas and operations of partition logic.

A partition \pi = \{B,B',\ldots\} on a universe set U (with two or more elements to avoid degeneracy) is a set of non-empty subsets ("blocks") B,B',\ldots \subseteq U that are disjoint and jointly exhaust U. A partition may equivalently be viewed as an equivalence relation whose equivalence classes are the blocks. A partition \sigma = \{C,C',\ldots \} is refined by a partition \pi = \{B,B',\ldots\} if for every block B \in \pi, there is a block C \in \sigma such that B \subseteq C. The partitions on U are partially ordered by refinement, with the minimum partition or bottom being the indiscrete partition 0_{U}=\{U\}, nicknamed the "blob," consisting of U as a single block, and the maximum partition or top being the discrete partition 1_{U}=\{\{u\},\{u'\},\ldots\} where each block is a singleton. Join and meet operations are easily defined for this partial order so that the partitions on U form a (non-distributive) lattice (NB: in much of the older literature, the "lattice of partitions" is written "upside down" as the opposite lattice). Then the lattice operations of join and meet can be enriched by other partition operations such as negation, implication, and the (Sheffer) stroke or nand to form a partition algebra.
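To make the lattice operations concrete, here is a minimal Python sketch (not from the original post; representing partitions as sets of frozenset blocks is just one convenient choice) of the refinement test and of the join, whose blocks are the nonempty intersections of blocks under the convention above where the discrete partition is the top.

```python
# A minimal sketch: partitions on a finite U represented as sets of
# disjoint, jointly exhaustive blocks (frozensets, so they are hashable).

def is_refined_by(sigma, pi):
    """sigma precedes pi in the refinement order: every block of pi lies inside some block of sigma."""
    return all(any(B <= C for C in sigma) for B in pi)

def join(pi, sigma):
    """Join (least upper bound): blocks are the nonempty intersections B & C."""
    return {B & C for B in pi for C in sigma if B & C}

U = frozenset(range(4))
bottom = {frozenset(U)}                 # indiscrete partition 0_U ("the blob")
top = {frozenset([u]) for u in U}       # discrete partition 1_U
pi = {frozenset({0, 1}), frozenset({2, 3})}

print(is_refined_by(bottom, pi))   # True: every partition refines the blob
print(is_refined_by(pi, top))      # True: the discrete partition refines everything
print(join(pi, top) == top)        # True: joining with the top gives the top
```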

In the duality between subsets and partitions, outlined in an earlier post, the dual of an "element of a subset" is a "distinction of a partition," where an ordered pair (u,u') is a distinction or dit of a partition \pi = \{B,B',\ldots\} if u and u' are in distinct blocks. In the algebra of all partitions on U, the bottom partition 0_{U} has no dits and the top 1_{U} has all dits [i.e., all pairs (u,u') where u \not= u'], just as in the analogous powerset Boolean algebra on U, the bottom \emptyset has no elements and the top U has all the elements. Let dit(\pi) be the set of distinctions of \pi. The partial order in the Boolean algebra of subsets is just inclusion of elements, and the refinement ordering of partitions is just inclusion of distinctions, i.e., \sigma \preceq \pi iff dit(\sigma) \subseteq dit(\pi).
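A small Python illustration (again an assumed representation, not part of the original development) of computing the dit set of a partition and of checking that refinement is exactly inclusion of dit sets:

```python
# Compute dit(pi): the ordered pairs (u, u') whose elements lie in distinct blocks.
from itertools import product

def dits(pi, U):
    block_of = {u: B for B in pi for u in B}
    return {(u, v) for u, v in product(U, U) if block_of[u] != block_of[v]}

U = frozenset(range(4))
blob = {frozenset(U)}
discrete = {frozenset([u]) for u in U}
pi = {frozenset({0, 1}), frozenset({2, 3})}

print(len(dits(blob, U)))                # 0  -- the bottom has no dits
print(len(dits(discrete, U)))            # 12 -- all pairs (u, u') with u != u'
print(dits(pi, U) <= dits(discrete, U))  # True: pi is refined by the discrete partition
```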

Table of analogies: Subset concept | Partition concept
"Elements": Elements | Distinctions (dits)
All "elements": Universe U | Discrete partition 1_{U} (all dits)
No "elements": Null set \emptyset | Indiscrete partition 0_{U} (no dits)
Object or "event": Subset S \subseteq U | Partition \pi on U
"Event" occurs: Element u \in S | Dit (u,u') distinguished by \pi
Partial order: Inclusion of elements | Inclusion of dits
Lattice of "events": Lattice of all subsets \mathcal{P}(U) | Lattice of all partitions \Pi(U)

Mimicking the development from subset logic to probability theory

With these analogies in hand, we then mimic the development of finite probability theory from subset logic (which goes back to Boole) using the corresponding concepts from partition logic.

For a finite U, the finite (Laplacian) probability p(S) of a subset (“event”) is the ratio: p(S) = |S|/|U|. Analogously, the finite logical entropy (or logical information content) h(\pi) of a partition \pi is the relative size of its dit set compared to the whole “closure space” U\times U:

h(\pi) = \frac{|dit(\pi)|}{|U\times U|}.

If U is an urn with each “ball” in the urn being equiprobable, then p(S) is the probability of an element randomly drawn from the urn being in S, and h(\pi) is the probability that a pair of elements randomly drawn from the urn (with replacement) is a distinction of \pi.
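As a rough sketch of the urn reading (the universe, event, and partition below are made up purely for illustration), one can estimate both probabilities by random drawing with replacement:

```python
# Monte Carlo estimate: p(S) is the chance a random draw lands in S,
# h(pi) the chance a random pair (drawn with replacement) is a dit of pi.
import random

U = list(range(8))
S = {0, 1, 2}                              # an example "event" S subset of U
pi = [{0, 1, 2, 3}, {4, 5, 6, 7}]          # an example partition of U

block_of = {u: i for i, B in enumerate(pi) for u in B}
N = 100_000
p_est = sum(random.choice(U) in S for _ in range(N)) / N
h_est = sum(block_of[random.choice(U)] != block_of[random.choice(U)]
            for _ in range(N)) / N

print(round(p_est, 2))   # ~ 3/8 = 0.375
print(round(h_est, 2))   # ~ 32/64 = 0.5
```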

Let \pi = \{B_{1},\ldots,B_{n}\} with p_{i} = |B_{i}|/|U| being the probability of drawing an element of B_{i}. The number of indistinctions (non-distinctions) of \pi is \sum_{i}|B_{i}|^{2} so the number of distinctions is |dit(\pi)|=|U|^{2}-\sum_{i}|B_{i}|^{2} and thus the logical entropy of \pi is:

h(\pi) = \frac{|U|^{2}-\sum_{i}|B_{i}|^{2}}{|U|^{2}}=1-\sum_{i}p_{i}^{2}=\sum_{i}p_{i}(1-p_{i})

since \sum_{i}p_{i}=1.
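A quick numeric check (with an arbitrary example partition) that the dit-set count and the probability formula agree:

```python
# Verify |dit(pi)| / |U|^2 = 1 - sum_i p_i^2 for an example partition.
from itertools import product

U = list(range(6))
pi = [{0, 1, 2}, {3, 4}, {5}]

block_of = {u: i for i, B in enumerate(pi) for u in B}
dit_count = sum(block_of[u] != block_of[v] for u, v in product(U, U))
probs = [len(B) / len(U) for B in pi]

print(dit_count / len(U) ** 2)        # 22/36 ≈ 0.6111
print(1 - sum(p * p for p in probs))  # same value
```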

The table of analogies can be continued.

Table of analogies (continued): Subset concept | Partition concept
Counting measure (U finite): Number of elements in subset S | Number of dits in partition \pi
Normalized count: p(S)=\frac{|S|}{|U|} | h(\pi)=\frac{|dit(\pi)|}{|U\times U|}
Probability interpretation: p(S) = probability that a random element is in S | h(\pi) = probability that a random pair is distinguished by \pi

In Shannon’s information theory, the entropy H(\pi) of the partition \pi (with the same probabilities assigned to the blocks) is:

H(\pi)=\sum_{i}p_{i}log(1/p_{i})

where the log is base 2.
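For illustration, a short Python computation of Shannon entropy from the block probabilities (the probability vectors below are chosen only as examples):

```python
# Shannon entropy H(pi) = sum_i p_i * log2(1/p_i), computed from block probabilities.
from math import log2

def shannon_entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(shannon_entropy([1/2, 1/2]))        # 1.0 bit
print(shannon_entropy([1/8] * 8))         # 3.0 bits
print(shannon_entropy([1/2, 1/4, 1/4]))   # 1.5 bits
```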

Each entropy can be seen as the probabilistic average of the "block entropies," where the logical block entropy is h(B_{i}) = 1-p_{i} and the Shannon block entropy is H(B_{i}) = log(1/p_{i}). To interpret the block entropies, consider the special case where p_{i} = 1/2^{n} and every block has that same probability, so there are 2^{n} equal blocks like B_{i} in the partition, e.g., the discrete partition on a set with 2^{n} elements. The logical entropy of that special equal-block partition is the logical block entropy:

\sum_{i}p_{i}(1-p_{i})=(2^{n})p_{i}(1-p_{i})=(2^{n})(1/2^{n})(1-p_{i})=1-p_{i}=h(B_{i}).

Instead of directly counting the distinctions, we could count the number of binary equal-blocked partitions it takes to distinguish all the 2^{n} blocks. As in the game of "twenty questions," if there is a search for an unknown designated block, then each such binary question cuts the number of candidate blocks in half, so the minimum number of binary partitions it takes to distinguish all the 2^{n} blocks is:

H(B_{i})=log(1/p_{i})=log(2^{n})=n.

To precisely relate the block entropies, we solve each for p_{i} [so p_{i}=1-h(B_{i}) and p_{i}=2^{-H(B_{i})}], which is then eliminated to obtain:

h(B)=1-\frac{1}{2^{H(B)}}.
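A small numeric check of this block-entropy relation for equiprobable blocks with p_{i}=1/2^{n} (the range of n below is arbitrary):

```python
# Check h(B) = 1 - 1/2^H(B) when p_i = 1/2^n.
from math import log2

for n in range(1, 5):
    p = 1 / 2 ** n
    h_block = 1 - p               # logical block entropy
    H_block = log2(1 / p)         # Shannon block entropy = n
    print(n, h_block == 1 - 1 / 2 ** H_block)   # True for each n
```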

An Example

Consider the example of a set U=\{0,1,2,3,4,5,6,7\} with 2^{3}=8 elements, so the Shannon entropy is 3, the least number of binary partitions it takes to distinguish all the elements of the set. The effects of the three partitions are illustrated in the following squares.

[Figure: three 8x8 grids showing the dits (unshaded) and indits (shaded) of \pi_{1}, \pi_{1}\lor\pi_{2}, and \pi_{1}\lor\pi_{2}\lor\pi_{3}.]

In terms of the twenty questions game, one could think of the first binary question as asking for the leftmost digit in the binary representation of each number. That gives the first partition, \pi_{1}=\{\{0,1,2,3\},\{4,5,6,7\}\}, which is represented by the leftmost square. Each large square contains 8 \times 8 = 64 small squares; the indistinctions or indits of this equipartition of the 8-element set are represented by the 32 shaded squares, while the distinctions or dits of the partition are given by the 32 unshaded squares.

The second binary question asks for the next digit in the binary representation of the number. This yields the second partition \pi_{2}=\{\{0,1,4,5\},\{2,3,6,7\}\}, so "joining" the information in the two partitions together gives the join \pi_{1}\lor\pi_{2}, represented in the middle square. Then 16 more squares become unshaded, i.e., 16 additional pairs are distinguished by \pi_{2}, for a total of 32 + 16 = 48 dits.

The third binary question asks for the rightmost digit, which yields the third partition, \pi_{3}=\{\{0,2,4,6\},\{1,3,5,7\}\}. Joined to the other two, it creates the discrete partition 1_{U}, which distinguishes all elements and is represented by the rightmost square, where only the diagonal remains shaded. This partition adds 8 more unshaded squares, i.e., distinguishes 8 more pairs, for a total of 48 + 8 = 56 dits distinguished by the three partitions.

Shannon entropy counts the least number of these partitions it takes to distinguish all the elements, H(\pi_{1}\lor \pi_{2} \lor \pi_{3})=H(1_{U})=3, while the logical entropy counts the number of distinctions which are thereby created, i.e., 56, which normalizes to: h(\pi_{1}\lor \pi_{2} \lor \pi_{3})=h(1_{U})=56/64=7/8. In this 2^{n} example, the two entropies stand in the relationship of the block entropies:

h(1_{U})=\frac{7}{8}=1-\frac{1}{2^{3}}=1-\frac{1}{2^{H(1_{U})}}.
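The numbers of the example can be reproduced with a short Python sketch (the bit-partition construction below is just one way to encode the three binary questions):

```python
# The three binary-digit partitions of U = {0,...,7}, their successive joins,
# and the resulting dit counts 32, 48, 56.
from itertools import product

U = list(range(8))

def bit_partition(k):
    """Partition U by the k-th binary digit (k = 2 is the leftmost of three bits)."""
    return [{u for u in U if (u >> k) & 1 == b} for b in (0, 1)]

def join(pi, sigma):
    return [B & C for B in pi for C in sigma if B & C]

def dit_count(pi):
    block_of = {u: i for i, B in enumerate(pi) for u in B}
    return sum(block_of[u] != block_of[v] for u, v in product(U, U))

pi1, pi2, pi3 = bit_partition(2), bit_partition(1), bit_partition(0)
print(dit_count(pi1))                        # 32
print(dit_count(join(pi1, pi2)))             # 48
print(dit_count(join(join(pi1, pi2), pi3)))  # 56, so h = 56/64 = 7/8
```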

The interpretation of the Shannon block entropy is then extended by analogy to the general case where 1/p_{i} is not a power of 2 so that the Shannon entropy H(\pi)=\sum_{i}p_{i}H(B_{i}) is then interpreted as the average number of binary partitions needed to make all the distinctions between the blocks of \pi—whereas the logical entropy is still the relativized count h(\pi)=\sum_{i}p_{i}h(B_{i}) of the distinctions created by the partition \pi.
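A final illustrative check (with made-up, non-power-of-2 block probabilities) that both entropies are the probability-weighted averages of their block entropies:

```python
# H(pi) = sum_i p_i * log2(1/p_i) and h(pi) = sum_i p_i * (1 - p_i)
# for example block probabilities that are not powers of 1/2.
from math import log2

probs = [1/2, 1/3, 1/6]   # example block probabilities of some partition pi

H = sum(p * log2(1 / p) for p in probs)   # average Shannon block entropy
h = sum(p * (1 - p) for p in probs)       # average logical block entropy

print(round(H, 4))   # ≈ 1.4591 binary partitions on average
print(round(h, 4))   # ≈ 0.6111 = 1 - sum_i p_i^2
```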

Concluding remarks

Hence the two notions of entropy boil down to two different ways to count the distinctions of a partition. And thus the concept of a distinction from partition logic provides a logical-conceptual basis for the notion of entropy or information content in information theory. Many of the concepts and relations of Shannon’s information theory, e.g., mutual information, cross entropy, divergence, and the information inequality, can then be developed at the logical level in logical information theory. This and much else is spelled out in: Counting Distinctions: On the Conceptual Foundations of Shannon’s Information Theory. Synthese. 168 (1, May 2009): 119-149, which can be downloaded from this website.
