Pedro Sobota

LinkedIn

x.com

Contact: pedro@sobo.cx

Property Provider

03/2024


Abstract: This article describes the development of a semantic tool for mapping and relating objects to properties. The user can rank objects based on their properties, discover new objects, and identify gaps in property sets. The tool is designed to be minimal, open-source, cross-platform, and flexible, allowing for various interpretations of property meanings. It could be useful in discussions involving many concepts, aiding in onboarding, identifying edge cases, qualifying concepts, and maintaining structure in documentation. The implementation is in C# 11 and released under the AGPL-3.0.

In the last articles, I had written in the theme of Mathematical topics examined from a computational light, and I've taken them for revision. I don't think I'll put them up again, in revised form, any soon, but one of the topics discussed were combinations of properties of certain abstract structures, considered physical structures in programming and completely abstract structures in Mathematics: for example, sets, tuples, lists, and so on. Upon further consideration, this topic provoked an advancement where I actually mapped a complete small set of combinations of properties of collections, and in the process, I observed I could now objectively rank them according to said set of properties, and discover new objects.

This means a relation is established relating objects to properties, and we implement scoring algorithms ordering the objects with respect to a single object and supply a 'distance' measure between any two objects. We now know the most similar and dissimilar objects with respect to an object. We can, given a set of objects related to properties, check for holes and define missing combinations of properties as new, previously unknown, objects.

In the program presented in this article, these facets are implemented minimally in a convenient and self-contained form that contains concise and robust code, can be compiled easily using an open source compiler, run cross-platform, and carry no external dependencies, while providing functional usability. This is the beginning of a set of tools which progressively expand on the operations available while following the same minimal and canonical focus. More tools with specific applications in mind will also be implemented as needed, to give different forms to access the same functionality, or test applied concepts.

In this approach, the meaning of properties themselves is left for interpretation; the tool is mostly concerned with semantics and knowledge mapping. Variations of property sets or the meaning of properties can be stipulated; each combination is a project. The tool might be useful in occasions where a sizeable number of concepts are in discourse, and, while the subject matter is open for interpretation, a semantic tool helps onboarding, finding edge cases, or qualifying concepts in a measured, controlled way; or keeping structure for documentation of complexity.

The following technical display should give a more precise illustration.

Given the following input file:

    Collections, Multiset
    Collections, Tuple
    Collections, Struct
    Collections, ElementNamedTuple
    Collections, Set
    Collections, Track
    Collections, Array
    Collections, String
    Collections, Column
    
    Properties, NaturalOrdering
    Properties, ElementNaming
    Properties, Uniqueness
    Properties, Typing
    Properties, Naming
    
    // Multiset has no properties
    
    Collections.Tuple , Properties.NaturalOrdering
    Collections.Struct, Properties.ElementNaming
    Collections.Set   , Properties.Uniqueness
    Collections.Track , Properties.Typing
    Collections.String, Properties.Naming
    
    Collections.ElementNamedTuple, Properties.ElementNaming
    Collections.ElementNamedTuple, Properties.NaturalOrdering
    Collections.Array            , Properties.NaturalOrdering
    Collections.Array            , Properties.Typing
    Collections.Column           , Properties.Naming
    Collections.Column           , Properties.NaturalOrdering
    Collections.Column           , Properties.Typing
    

The following results could follow:

    PropertyProvider v0.1
    
    PP> load D:\PropertyProviderConsoleSample.txt
    2 classes constructed
    14 class elements added
    2 relations defined
    24 relation elements defined
    
    PP> help
    exit, outfile, classes, relations, [className],
    [className] v [className]
    [className] v [className] (!|!!|?|??) [classElement]
    
    PP> classes
    Collections, Properties
    
    PP> relations
    (Collections, Properties), (Properties, Collections)
    
    PP> Collections
    Collections.Multiset, Collections.Tuple, Collections.Struct, Collections.ElementNamedTuple, Collections.Set, Collections.Track, Collections.Array, Collections.String, Collections.Column
    
    PP> Properties
    Properties.NaturalOrdering, Properties.ElementNaming, Properties.Uniqueness, Properties.Typing, Properties.Naming
    
    PP> Collections v Properties
    > 
    +---------------------------+-----------------+
    | (Collections, Properties) | 1/5             |
    +---------------------------+-----------------+
    | Multiset                  |                 |
    | Tuple                     |                 |
    | Struct                    | ElementNaming   |
    | ElementNamedTuple         | ElementNaming   |
    | Set                       |                 |
    | Track                     |                 |
    | Array                     |                 |
    | String                    |                 |
    | Column                    |                 |
    +---------------------------+-----------------+
                               < >
    
    > help
    b|break, c|clear, pp, pn, pf, pl, p[pageNumber]
    
    > pn
    +---------------------------+-----------------+
    | (Collections, Properties) | 2/5             |
    +---------------------------+-----------------+
    | Multiset                  |                 |
    | Tuple                     |                 |
    | Struct                    |                 |
    | ElementNamedTuple         |                 |
    | Set                       |                 |
    | Track                     |                 |
    | Array                     |                 |
    | String                    | Naming          |
    | Column                    | Naming          |
    +---------------------------+-----------------+
                               < >
    
    PP> Properties v Collections
    > 
    +---------------------------+-------------------+
    | (Properties, Collections) | 1/9               |
    +---------------------------+-------------------+
    | NaturalOrdering           | Array             |
    | ElementNaming             |                   |
    | Uniqueness                |                   |
    | Typing                    | Array             |
    | Naming                    |                   |
    +---------------------------+-------------------+
                               < >
    
    PP> Properties v Collections
    > pn
    +---------------------------+-------------------+
    | (Properties, Collections) | 2/9               |
    +---------------------------+-------------------+
    | NaturalOrdering           | Column            |
    | ElementNaming             |                   |
    | Uniqueness                |                   |
    | Typing                    | Column            |
    | Naming                    | Column            |
    +---------------------------+-------------------+
                               < >
    
    PP> Collections v Properties ! Set
    > s
    +---------------------------+-----------------+
    | (Collections, Properties) | 1/5             |
    +---------------------------+-----------------+
    | (5) Set                   | ElementNaming   |
    | (4) Multiset              | ElementNaming   |
    | (3) Tuple                 | ElementNaming   |
    | (3) Struct                |                 |
    | (3) Track                 | ElementNaming   |
    | (3) String                | ElementNaming   |
    | (2) ElementNamedTuple     |                 |
    | (2) Array                 | ElementNaming   |
    | (1) Column                | ElementNaming   |
    +---------------------------+-----------------+
                               < >
    
    > pn
    +---------------------------+-----------------+
    | (Collections, Properties) | 2/5             |
    +---------------------------+-----------------+
    | (5) Set                   | Naming          |
    | (4) Multiset              | Naming          |
    | (3) Tuple                 | Naming          |
    | (3) Struct                | Naming          |
    | (3) Track                 | Naming          |
    | (3) String                |                 |
    | (2) ElementNamedTuple     | Naming          |
    | (2) Array                 | Naming          |
    | (1) Column                |                 |
    +---------------------------+-----------------+
                               < >
    
    PP> Properties v Collections !! Uniqueness
    > s
    +---------------------------+-------------------+
    | (Properties, Collections) | 1/9               |
    +---------------------------+-------------------+
    | (9) Uniqueness            |                   |
    | (6) ElementNaming         |                   |
    | (6) Naming                |                   |
    | (5) Typing                | Array             |
    | (4) NaturalOrdering       | Array             |
    +---------------------------+-------------------+
                               < >
    
    PP> Properties v Collections ! Uniqueness
    > s
    +---------------------------+-------------------+
    | (Properties, Collections) | 1/9               |
    +---------------------------+-------------------+
    | (9) Uniqueness            | Array             |
    | (6) ElementNaming         | Array             |
    | (6) Naming                | Array             |
    | (5) Typing                |                   |
    | (4) NaturalOrdering       |                   |
    +---------------------------+-------------------+
                               < >
    
    PP> Collections v Properties ? Array
    > s
    +---------------------------+-----------------+
    | (Collections, Properties) | 1/5             |
    +---------------------------+-----------------+
    | (3) Struct                |                 |
    | (3) Set                   | ElementNaming   |
    | (3) String                | ElementNaming   |
    | (2) Multiset              | ElementNaming   |
    | (2) ElementNamedTuple     |                 |
    | (1) Tuple                 | ElementNaming   |
    | (1) Track                 | ElementNaming   |
    | (1) Column                | ElementNaming   |
    | (0) Array                 | ElementNaming   |
    +---------------------------+-----------------+
                               < >
                               
    > save
    %LOCALAPPDATA%\PropertyProvider\PropertyProviderOut.txt
    
    +---------------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
    | (Collections, Properties) | 1/5             | 2/5             | 3/5             | 4/5             | 5/5             |
    +---------------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
    | (3) Struct                |                 | Naming          |                 |                 | Uniqueness      | 
    | (3) Set                   | ElementNaming   | Naming          |                 |                 |                 | 
    | (3) String                | ElementNaming   |                 |                 |                 | Uniqueness      | 
    | (2) Multiset              | ElementNaming   | Naming          |                 |                 | Uniqueness      | 
    | (2) ElementNamedTuple     |                 | Naming          | NaturalOrdering |                 | Uniqueness      | 
    | (1) Tuple                 | ElementNaming   | Naming          | NaturalOrdering |                 | Uniqueness      | 
    | (1) Track                 | ElementNaming   | Naming          |                 | Typing          | Uniqueness      | 
    | (1) Column                | ElementNaming   |                 | NaturalOrdering | Typing          | Uniqueness      | 
    | (0) Array                 | ElementNaming   | Naming          | NaturalOrdering | Typing          | Uniqueness      | 
    +---------------------------+-----------------+-----------------+-----------------+-----------------+-----------------+
    

Observations

  • A project can contain any number of classes and relations. The "Properties" classes is not hardcoded and is just a "default" class name for properties of objects. Any classes of objects could be related
  • Tables are paged to allow a constant width for inspection. Full tables can be printed to files with save
  • Reverse relations are automatically created from any relation
  • The A v B ! C syntax, where A is the "key" class and B is the "value" class, performs similarity scoring (non-unique) in the relation with respect to element C of the key class. The descending similarity score is shown in parenthesis on the key cell
  • Dissimilarity scoring is also available with the ? operator
  • The double operators !! and ?? for scoring shows true or false values (which we call "characteristics"), as opposed to equal or distinct values (for similarity or dissimilarity respectively), for each key class
  • The project abstracts class element names as Symbols, which allows to:
    • Enforce uniqueness of Symbols on a class
    • Implement integer encoding of Symbols, as 1) combination of elements on a Set is expressible as a sum, 2) powers-of-two sums are unique, and 3) classes are Sets. Thus, element combinations (or Objects) are encoded as unique integers in a way such that, from an integer, the combination (or Object) is recoverable.

Implementation

The operations have been implemented in a command line program written in C# 11 and released under the AGPL-3.0.

An exponentiation-based list hash

03/2019


Abstract: This article presents a mathematical algorithm to represent a list of real numbers as a single number using an exponentiation-based accumulation. We discuss the limitations of traditional statistical reductions like mean and standard deviation, and propose a non-commutative operation, exponentiation, to address these issues. The article also addresses challenges such as magnitude manipulations, leading zeros, and negative inputs, and proposes solutions to maintain the integrity of the original data. The algorithm's execution falls in the category of linear algorithms in time, making it efficient for medium to large lists, while being constant in space.

In this post, I'll describe an algorithm to represent real number lists by a single number.

First, we'll note mean and standard deviation from statistics are reductions commonly used to indicate properties about data expressed as lists of numbers.

The mean statistic, as a property of such list of numbers, is opaque to a modification such that one value is increased in the list while another value is decreased by the same amount. When this modification is performed, the resulting value is not modified. For example,

$$\displaylines{ mean [7.5, 2.98, 6.72, 0.89, 3.74, 2.87] = \\ mean [6.5, 3.98, 6.72, 0.89, 3.74, 2.87]. }$$

Obviously, for the purpose at hand, this would be a serious downside. On this point, the standard deviation is an improvement. That statistic's value is different for the same change:

$$\displaylines{ stddev [7.5, 2.98, 6.72, 0.89, 3.74, 2.87] = 2.5149, \\ stddev [6.5, 3.98, 6.72, 0.89, 3.74, 2.87] = 2.21737. }$$

The standard deviation, however, is susceptible to a value swap between two numbers: exchanging the position of one value with the position of another one has no effect on the number list's dispersion or its mean value, either.

As an example, considering

$$\displaylines{ A=[7.5, 2.98, 6.72, 0.89, 3.74, 2.87],\\ B=[2.98, 7.5, 6.72, 0.89, 3.74, 2.87] }$$

(elements 1-2 swapped),

$$\displaylines{ stddev(A) = stddev(B), \\ mean(A) = mean(B). }$$

The common characteristic of those two algorithms over the list of numbers is that they aggregate using mathematically commutative operations: the order of the numbers presented to the operation does not influence its result. We'll examine the possibility of applying a operation which is not commutative to the same problem.

Ordering

One of the first mathematical operations to suggest itself is exponentiation.

$$\displaylines{ 2 ^ 3 \neq 3 ^ 2, \\ \left(2 ^ 3\right) ^ 4 \neq 2 ^ \left(3 ^ 4\right). }$$

(Note that

$$\displaylines{ 2^4=4^2, }$$

but these are the only pair of natural numbers which commute under exponentiation [1].)

Drawing this operation of consecutive exponentation with the symbol \(\bigwedge\), and each exponentiation with \(\wedge\), now,

$$\displaylines{ \bigwedge [7.5, 2.98, 6.72, 0.89, 3.74, 2.87] = \\ 7.5 \wedge 2.98 \wedge 6.72 \wedge 0.89 \wedge 3.34 \wedge 2.87 = \\ 436.491 }$$

and

$$\displaylines{ \bigwedge [2.98, 7.5, 6.72, 0.89, 3.74, 2.87] = \\ 4348.3, }$$

such that the ordering requirement is now satisfied.

Sizing

As the ordering problem is solved, another immediate problem arises, that the exponentiated numbers grow too fast. We examine the case of \([5,6,4.6]\), which has a magnitude (measured in multiples of 10) of \(10 ^{2654}\).

We'll reduce the magnitude of every number to \(1\), by extracting the decimal exponent and dividing the number by it.

Now,

$$\displaylines{ \bigwedge [5, 6, 4006] = \\ \left(5 \times \frac{1}{10^1}\right) \wedge \left(6 \times \frac{1}{10^1}\right) \wedge \left(4006 \times \frac{1}{10^4}\right) = \\ 0.5 \wedge 0.6 \wedge 0.4006 = \\ 0.568431. }$$

Drawing "normalized" numbers as \(\overline{x}\),

$$\displaylines{ \overline{5} \wedge \overline{6} \wedge \overline{4006} = 0.568431. }$$

The result, now, sits within a controlled range between \(0\) and \(1\).

Corner cases

Some corner cases present themselves. The normalization opens up the possibility of magnitude manipulations (since the magnitude is taken out of every number). Thus changes like:

$$\displaylines{\bigwedge [50,.6,46,0.63,.0005,4300,52] = 0.596399\text{,}}$$ $$\displaylines{\bigwedge [50,6,4.6,6.3,5,4.3,5.2] = 0.596399}$$

yield equal results.

However, those same changes also change the mean value in the list:

$$\displaylines{ mean [5,6,4.6,6.3,5,4.3,5.2] = 5.2, \\ mean [50,6,4.6,6.3,5,4.3,5.2] = 11.6286, }$$

We'll prepend the mean to the list; the mean will end up used as the first exponent.

$$\displaylines{ \bigwedge [5,6,4.6,6.3,5,4.3,5.2] = \\ \overline{5} \wedge (\overline{6} \wedge (\overline{4.6} \wedge (\overline{6.3} \wedge (\overline{5} \wedge (\overline{4.3} \wedge (\overline{5.2} \wedge mean)))))) = \\ \overline{5} \wedge (\overline{6} \wedge (\overline{4.6} \wedge (\overline{6.3} \wedge (\overline{5} \wedge (\overline{4.3} \wedge (\overline{5.2} \wedge \overline{5.2})))))) = \\ 0.595317 \\ \neq \\ \bigwedge [50,6,4.6,6.3,5,4.3,5.2] = \\ \overline{11.6286} \wedge (\overline{50} \wedge (\overline{6} \wedge (\overline{4.6} \wedge (\overline{6.3} \wedge (\overline{5} \wedge (\overline{4.3} \wedge (\overline{5.2}))))))) = \\ 0.595047. }$$

Now, because the mean is normalized like any other value in the list, a global multiplication of the list by factors of 10 will produce the same results. This case, though, is not fixed, because the manipulation required (multiplying every number by orders of 10) is considered too harmful to the integrity of the original data.

Exponentiation as the chosen operation presents relevant properties.

First, any base of zero in the list will "throw back" the accumulated result to zero; this is not considered a serious issue, but another more serious one is that, in the case the list possesses a leading zero, the final result will be invariably zero, because the zero will be the last exponent to be calculated.

Both problems can be eliminated by substituting zeroes for the normalized mean value, with no detectable side effects. This guarantees, also, no stabilization of value between zeroes (that the resulting value will keep constant as long as there are zeroes in the input).

A special case of this condition is that one of an all-zero list: it yields the \(0^0\) exponentiation, halting the algorithm. This is not possible to avoid without identifying such list element-by-element. In this implementation, this is done beforehand, and zero is returned.

Another possible halting condition is pertaining to negative inputs. Of course, exponentiation by a negative number results in a fraction which, used as exponent for the next number in the list, which could be negative, could yield calculation of an even root over a negative number, producing a complex number.

The trivial solution is taken, of taking absolutes of the inputs, as changes of signal will change the mean as well. The mean is used here again as a help.

Another possible manipulation scenario is that of a swap of two values between themselves which are multiples of ten between themselves. This combines the swapping and magnitude reduction manipulations. This is the second problem we choose to not fix, by hypothesising a low desireability.

Finally, the operation of consecutive exponentiation is susceptible, at least theoretically, to floating point collisions, because of rounding. The probability of such a collision happening is assumed to be low enough for common use cases.

Summary

This number should uniquely represent a list of numbers, except for the following manipulations:

  • Changes of all numbers by factors of 10;
  • Swaps of numbers which are multiples of 10 between themselves.

Visualization

In figures 1-2, the sucessive calculated value can be seen in random number list. The result obtained is the last point of the lines.

In figures 3-4, we see the evolution of final results generated over sequences of lists of mostly equidistant and equidistant numbers, respectively, coming from real and synthetic data (incremental entity ids). These are used to try to show possibilities of collisions in the \(y\)-axis, although there are not necessarily collisions in sections of the lines sharing the same \(y\) intervals.

Fig. 1: successive calculation over a
random 10-sized list
Fig. 2: successive calculation over a
random 100-sized list
Fig. 3: final results for 1000 lists
of mostly equidistant numbers
Fig. 4: final results for 1000
lists of equidistant numbers

Benchmarking

The algorithm execution involves one pass over the entire list, for the result calculation, and two more for the mean and standard deviation calculation. It falls within the category of a linear algorithm.

Platform: Ryzen 5 5600G, 16,00 GB, Windows 10.

Random list size 100.000 1.000.000 10.000.000 100.000.000 1.000.000.000
Time 0.009 s 0.092 s 0.921 s 9.235 s 93.572 s

Implementations

References

  1. Structure of integer pairs which commute under exponentiation

Computing layered intervals

04/2018


Abstract: This article introduces a linear algorithm for calculating a final set of intervals with a layered set of intervals as input, addressing the challenge of accounting for obstructions in higher level layers. A pseudocode implementation and an accompanying C# implementation with tests are included.

This page describes an algorithm created to calculate a final set of intervals with a set of layered intervals as input linearly (i.e., from left to right).









The above images are examples of composed intervals (the lines are the paths taken by the algorithm).

As you can see, the main obstacle in calculating such composed intervals is accounting for obstructions in more important (i.e., higher priority) layers.

This is the simplest method to perform this computation in a linear way I could come up with.

For a tree approach, see [1].

Pseudocode

    from first to last node, ordered by position first and ascending lane second:
    if not closing:
        if not open in any higher lane:
            if final opened:
                close final.
            open final.
        open in this lane.
    if closing:
        closed in this lane.
        if final opened:
            if not open in any higher lane:
                close final.
    if not final opened:
        if at last lane of position, except for the last position:
            if open in any other lane:
                open final.

Implementations

With tests.

References

  1. Interval tree (Wikipedia)

Lattice big-integer multiplication

01/2018


Abstract: This article presents an implementation of the lattice multiplication method, an ancient technique for multiplying large numbers, in C#. The basic idea is that numbers can be multiplied digit-by-digit and, as each digit only ranges from 0 to 9, the maximum multiplication performed per step is 9 x 9. The implementation, capable of computing 4999! (with 16322 digits) in around one minute, uses a cell-walking routine to traverse arbitrary shape lattices in three distinct phases. The article also includes pseudocode implementations and references.

As we know, not very large numbers (such as 99) have very large factorials, with this one having 156 decimal places.

The problem with multiplying those numbers is the common problem of overflow with large number computations.

For example, 29! fails to compute with ulongs in C#, producing 11390785281054474240 instead of 8841761993739701954543616000000.

The lattice or gelosia method here implemented is an ancient multiplication method with the idea that numbers can be multiplied digit-by-digit and, as each digit only ranges from 0 to 9, the maximum multiplication performed per step is 9 x 9.

The method is executed by laying out first the digits of the multiplicands across the borders of a table. The multiplied digits are read in diagonal lines and summed, and these summed digits become the result of the multiplication.


I penned a particular implementation of the calculation of the lattice method after inspiration through [1]. (The cell walking routine is also implemented in the above graphic in JavaScript.)

This particular implementation can compute 4999! (which has 16322 digits) in 1m24s. It works by walking cells of arbitrary shape (width x height) lattices (the grids shown above).

The basic procedure can be divided in the following kinds of steps:

1) Basic cell-to-cell walk

2) Line changes

3) Phase changes (including the end of the walk)

There are three phases.

In any phase, the basic cell walk routine is the same: if we are in an odd cell (the "uppermost" triangle in a square), we go to the left square to the even cell. Furthermore, if we are in an even cell (the "lowermost" triangle in the square), we go down one square and switch to the odd cell.

If we reach the lower cells in the first phase, in any shape lattices, or in the second phase in horizontal lattices, a line change occurs. If we reach the leftmost cells in the third phase of any shape lattice or in the second phase of vertical lattices, a line change also occurs.

The first and third phases are common to all shape lattices, and the second phase (only occurring in non-square lattices) is of different line-changing procedure with different orientation lattices.

In any phase, when changing lines, we look for when the phase changes. According to the orientation, the phase transition can jump an additional line, before changing phases, or not.

Pseudocode

    let d1 be the amount of digits in the multiplicator, and 
    d2 be the amount of digits in the multiplicand.

    do:
        multiply the digits at x and y.
        add to the sum.
        orientation undefined.

        if end of line of any phase:
            if carrying digit, add it to sum.
            if one digit sum:
                take the only digit.
            if two digits sum:
                take the second digit and carry the first.
            result position less one.
            reset the sum.
        
        if end of line of phase 1:
            if x is at d1 + 1 - d2 (the "square" position):
                the lattice is horizontal.
            if x is at 1:
                the lattice is vertical.
            if the lattice is both horizontal and vertical:
                the lattice is square.

            if orientation is defined and:
                it's square:
                    phase is 3.
                    jump cell like the new phase, but 
                    x is 1 and odd toggles to true.
                it's horizontal:
                    phase is 2.
                    jump cell like the new phase.
                it's vertical:
                    phase is 2.
                    jump cell normally.
            if not:
                jump cell normally.
        if end of line phase 2 vertical:
            if y is d1 + 1:
                phase is 3.
                jump cell like the new phase.
            if not:
                jump cell normally.
        if end of line phase 2 horizontal:
            if x is 1:
                phase is 3.
            jump cell normally.
        if end of line phase 3:
            if y is 1:
                end of walk.
            if not:
                jump cell normally.
        if no end of line:
            if not odd:
                y is increased 1.
            if odd:
                x is decreased 1.
            toggle odd.

Implementations

With tests.

References

  1. Homework: how to write own multiplication of big numbers?, Stack Overflow