When what if is if what

I created my first data mining model back in 2005. This was a basket analysis in which we determined which products are commonly found together (a numerous connection) and which are almost never found apart from each other (a strong connection). We used this to rearrange the shelves in five stores, putting the numerously connected products in corners of the store, driving as many people past other shelves as possible. Strongly connected products were kept on the same or adjoining shelves. This increased upsell by about 30%, so shortly after the evaluation, all 900 stores were rebuilt according to the new layout.

Ever since then, I’ve been in awe of what data mining, now often referred to as machine learning, can do. I am sure many of you have employed and maybe even operationally use such models. Having continued to use them to score customers, from churn likelihood to cross-sales potential to the probability of accepting an offer and many other things, I sat down in 2013 and wondered what the next step could be. Here I was at a company with several well working models, but even though they were labeled as “predictive”, none could actually tell me much about the future. So I started thinking: what if there was a way to hook up the models to the cash flow?

As it turns out, there was a way. So far, the models had been used to more or less categorize customers, such as into “potential churners” and “loyal customers”, based on some threshold of the probability to churn. However, behind the strict categorization are individual probabilities for churn. Even loyal customers have a probability to churn, albeit a low one. The first realization was that if we were to use the models more intelligently, we would have to give up bagging, boosting, and any other methods that may distort the probability distributions (back then there was no way to calibrate the resulting probabilities). We needed probabilities as close to the actual ones as possible. Among customers with a predicted 2% likelihood to churn in a year, after a year the outcome should be that 2% of them have churned.
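To make that last point concrete, such a calibration check can be as simple as bucketing customers by their predicted probability and comparing each bucket to its actual outcome a year later. The sketch below assumes a hypothetical ChurnScore table with one row per customer; it illustrates the shape of the check, not the model we actually used.

-- Hypothetical table: one row per customer, with the predicted probability to
-- churn within a year and the actual outcome a year later (1 = churned, 0 = not).
-- create table ChurnScore (CustomerId int, PredictedChurnProbability float, ChurnedWithinYear int);

select
  round(PredictedChurnProbability, 2)   as ProbabilityBucket,  -- e.g. 0.02 for the 2% group
  count(*)                              as Customers,
  avg(cast(ChurnedWithinYear as float)) as ActualChurnRate     -- should land close to the bucket value
from
  ChurnScore
group by
  round(PredictedChurnProbability, 2)
order by
  ProbabilityBucket;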

Interestingly, this meant that the classification accuracy of the model went down, but it was more true to reality when looking at the population as a whole. With those individual and realistic probabilities in place, the next step was to use them to build a crystal ball, so that we could look into the future. I devised a game theoretical model, into which I could pour individuals and their probabilities for events happening in a given time frame, and it would return, for each individual, whether these events happened. Iterate, and I could predict the same for the next time frame, and the next, and the next… We’ll call this a simulation.

This is where randomness comes into play. There is no way to say for sure which of the customers with a low 2% probability to churn actually will churn. That would just be too good. The game theoretical model will, however, spit out the correct number of such churners, taking into account all the other customers and their individual probabilities. Because of that, it works well on different aggregated levels, but as you increase the granularity the results will become more stochastic. In order to get monetary results, the simulation was extended to take revenue and costs into account, along with a number of other things that could be calculated using traditional business logic. Apart from customer outflow through churn, there is also customer inflow. These were modeled as digital twins of existing customers. With all this in place, it was for the first time possible to forecast the revenue, among all the other things, far into the future.

Running several simulations, with different random numbers, will actually tell you whether your business is volatile or stable. Hopefully, the results from using different random numbers will not differ much, indicating that your business is stable. In reality, there is no perfectly stable business though. In one simulation your very best customer may churn early, whereas in another the same customer stays until the end. Even if the difference on the bottom line is slight, such a difference impairs comparability between simulations. The solution, provided that your business is quite stable, is still to use random numbers, but ones that remain fixed between simulations.
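A minimal sketch of that idea, assuming a hypothetical CustomerPeriod table holding each customer's churn probability per simulated period: the random draws are generated once and stored, so every simulation that follows compares its (possibly tweaked) probabilities against the same fixed numbers. Table and column names here are illustrative assumptions, not the actual engine.

-- Hypothetical input: one row per customer and simulated period, with the
-- churn probability the model assigns for that period.
-- create table CustomerPeriod (CustomerId int, Period int, ChurnProbability float);

-- Draw the random numbers once and keep them fixed between simulations.
select
  CustomerId,
  Period,
  rand(checksum(newid())) as FixedRandom
into
  FixedRandomNumbers
from
  CustomerPeriod;

-- Every simulation (baseline, twisted knobs, ...) reuses the same draws,
-- so two runs only differ where the probabilities themselves differ.
select
  cp.Period,
  sum(case when fr.FixedRandom < cp.ChurnProbability then 1 else 0 end) as SimulatedChurners
from
  CustomerPeriod cp
join
  FixedRandomNumbers fr
on
  fr.CustomerId = cp.CustomerId
and
  fr.Period = cp.Period
group by
  cp.Period
order by
  cp.Period;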

So, if you have a well working crystal ball, why would there be a need to do more than one simulation? Well, right now, the crystal ball has about one hundred thousand parameters; knobs that you can turn. Almost all of these are statistically determined, and a few are manually entered, but many are very interesting to fiddle with. Simulations are perfect to use when you want to do what-if analysis. Run a baseline simulation, based on the most likely future scenario, then twist some knobs, run again, and compare. This can also be used to get an idea of how sensitive your business is to a twist and which knobs matter the most.

I’ve run baselines, worst-case, best-case, different pricing, higher and lower churn, more or less inflow, changed demographics, stock market crashes, lost products, new products, possible regulations, and so forth, during the last six years with this simulation engine. All with more than fifty different measures forecasted, many monetary, to the delight of management. Simulations replaced budgeting, simulations stress test the business on a yearly basis, simulations are used to price products, simulations are used to calculate ROI, simulations are used every time something unexpected happens in the market, and above all, simulations have kept this company prepared.

We have turned “what-if” into “if-what” — action plans of “what” to do should the “if” come to pass. I believe this is the natural next step for all of you doing machine learning now, but who have not yet enriched it with game theoretical simulations. In all honesty, I am a bit perplexed that I haven’t heard of anyone else doing this yet. Amazon recently showed off a new forecasting engine, so maybe simulations will become more mainstream. On a side note, predicting 50 forecast units 30 periods into the future for 10 million entities, which is what we frequently do, would with Amazon’s pricing cost 50 × 30 × 10,000,000 / 1,000 × $0.60 = $9 million per simulation. This alone is more than the cost of the entire simulation engine over its six-year lifetime so far.

If you want to know more about simulations, don’t hesitate to contact me. You can also read more on the homepage at http://www.uptochange.com. Up to Change is also sponsoring work on Anchor modeling.

She wore a blue dress

This is an article about imprecision and uncertainty, two concepts that are generally poorly understood and often mixed up. It’s also about information, which I will define as saying something about something else¹. Information is the medium we use to convey and invoke a sense of that else; sharing our perception of it. The funny thing is, when we say something about something else, many things about the else will always get lost in translation. Information is, therefore, always imprecise and uncertain to some degree. What is perplexing, and less funny, is how we often tend to forget this and treat information as facts.

I think we have a desire to believe that information is precise and certain. The stronger the desire, the greater the willingness to interpret it as facts. Take Günther Schabowski as an example. Although uncertain, he quite precisely stated that “As far as I know [the new regulations are] effective immediately, without delay.” Those new regulations were intended to be temporary travel regulations with relaxed requirements, limited to a select number of East Germans. His statement, later the same day, led to the fall of the Berlin Wall and eventually contributed to the end of the Cold War, if we are to believe Wikipedia. Even small words from the right mouths can have large consequences.

Now, in order to get a better understanding of imprecision and uncertainty, let us look at the statement 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕓𝕝𝕦𝕖 𝕕𝕣𝕖𝕤𝕤 in conjunction with the following photo.

First, we assume that whoever 𝕊𝕙𝕖 is referring to is agreed upon by everyone reading the statement. Let’s say it’s the woman in the center with the halterneck dress. Then 𝕨𝕠𝕣𝕖 is in the preterite tense, indicating that the occasion on which she wore the dress has come to pass. In its current form, this is highly imprecise, since all we can deduce is that it has happened, sometime in the past.

Her dress looks 𝕓𝕝𝕦𝕖, but so do many of the other dresses. If they are also 𝕓𝕝𝕦𝕖 we must conclude that 𝕓𝕝𝕦𝕖 is imprecise enough to cover different variations. One may also ask if her dress will remain the same colour forever? I am probably not the only one to have found a disastrous red sock in the (once) white wash. No, the imprecise colour 𝕓𝕝𝕦𝕖 is bound to that imprecise moment the statement is referring to. To make things worse, no piece of clothing is perfectly evenly coloured, but this dress is at least in general 𝕓𝕝𝕦𝕖.

Finally, it’s a 𝕕𝕣𝕖𝕤𝕤, but there are an infinite number of ways to make a 𝕕𝕣𝕖𝕤𝕤. Regardless of how well the manufacturing runs, no two dresses come out exactly the same. The 𝕕𝕣𝕖𝕤𝕤 she wore is a unique instance, but then it also wears and tears. Maybe she has taken it to a tailor since, and it is now a completely different type of garment. In other words, what it means to be a 𝕕𝕣𝕖𝕤𝕤 is imprecise and what the 𝕕𝕣𝕖𝕤𝕤 actually looked like is imprecisely bound in time by the statement.

In fact, 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕓𝕝𝕦𝕖 𝕕𝕣𝕖𝕤𝕤 would have worked just as well in conjunction with any of the women in the photo². Me picking one for the sake of argument had you focusing on her, but in reality, the statement is so imprecise it could apply just as well to anyone. Imprecise information is such that it applies to a range of things. 𝕊𝕙𝕖 ranges over all females, 𝕨𝕠𝕣𝕖 ranges from now into the past, 𝕓𝕝𝕦𝕖 ranges over a spectrum of colours, and 𝕕𝕣𝕖𝕤𝕤 ranges over a plethora of garments. Taken as a whole, 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕓𝕝𝕦𝕖 𝕕𝕣𝕖𝕤𝕤 increases the precision, since not every woman in the world has worn a blue dress. Together with context, such as the photo, the precision can even be drastically increased.

With a better understanding of imprecision, let’s look at the statement anew, now qualified: 𝗔𝗿𝗰𝗵𝗶𝗲 𝘁𝗵𝗶𝗻𝗸𝘀 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕓𝕝𝕦𝕖 𝕕𝕣𝕖𝕤𝕤. Regardless of its imprecision, 𝗔𝗿𝗰𝗵𝗶𝗲 is not certain that the statement is true. The word 𝘁𝗵𝗶𝗻𝗸𝘀 quantifies his uncertainty, which is less sure than 𝗰𝗲𝗿𝘁𝗮𝗶𝗻, as in: 𝗗𝗼𝗻𝗻𝗮 𝗶𝘀 𝗰𝗲𝗿𝘁𝗮𝗶𝗻 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕓𝕝𝕦𝕖 𝕕𝕣𝕖𝕤𝕤. Maybe 𝗗𝗼𝗻𝗻𝗮 wore the dress herself, which is why her opinion is different. Actually, 𝗔𝗿𝗰𝗵𝗶𝗲 𝘁𝗵𝗶𝗻𝗸𝘀 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕓𝕝𝕦𝕖 𝕕𝕣𝕖𝕤𝕤, 𝗯𝘂𝘁 𝗶𝘁 𝗺𝗮𝘆 𝗵𝗮𝘃𝗲 𝗯𝗲𝗲𝗻 𝘁𝗵𝗲 𝗰𝗮𝘀𝗲 𝘁𝗵𝗮𝘁 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕡𝕚𝕟𝕜 𝕕𝕣𝕖𝕤𝕤. From this, we can see that uncertainty is both subjective and relative to a particular statement, since 𝗔𝗿𝗰𝗵𝗶𝗲 now has opinions about two possible, but mutually exclusive, statements. These are, however, only mutually exclusive if we assume that he is talking about the same occasion, which we cannot know for sure.

Somewhat more formally, uncertainty consists of subjective probabilistic opinions about imprecise statements. Paradoxically, increasing the precision may make someone less certain, such as in: 𝗔𝗿𝗰𝗵𝗶𝗲 𝗶𝘀 𝗻𝗼𝘁 𝘀𝗼 𝘀𝘂𝗿𝗲 𝘁𝗵𝗮𝘁 𝔻𝕠𝕟𝕟𝕒 𝕨𝕠𝕣𝕖 𝕒 𝕟𝕒𝕧𝕪 𝕓𝕝𝕦𝕖 𝕙𝕒𝕝𝕥𝕖𝕣𝕟𝕖𝕔𝕜 𝕕𝕣𝕖𝕤𝕤 𝕥𝕠 𝕙𝕖𝕣 𝕡𝕣𝕠𝕞. This hints that there may be a need for some imprecision in order to maintain an acceptable level of certainty towards the statements we make. It is almost as if this is an information theoretical analog to the uncertainty principle in quantum mechanics.

But is this important? Well, let me tell you that there are a number of companies out there that claim to use statistical methods, machine learning, or some other fancy artificial intelligence³, in order to provide you with must-have business-leading thingamajigs. Trust me, a large portion of them are selling you producers of 𝕊𝕙𝕖 𝕨𝕠𝕣𝕖 𝕒 𝕓𝕝𝕦𝕖 𝕕𝕣𝕖𝕤𝕤-type statements rather than fact machines. Imprecise results, towards which uncertainty can be held. Such companies fall into four categories:

  • Those that do not know they aren’t selling facts.
    [stupid]
  • Those that know they aren’t selling facts, but say they do anyway.
    [deceptive]
  • Those that say they aren’t selling facts, but cannot say why.
    [honest]
  • Those that say they aren’t selling facts, and tell you exactly why.
    [smart]

Unfortunately I’ve met very few smart companies. Thankfully, there are some honest companies, but there is also an abundance of stupid and deceptive companies. Next time, put them to the test. Never buy anything that doesn’t come with a specified margin of error, a confusion matrix, or some other measure indicating the imprecision. If the thingamajig is predicting something, make sure it tells you how certain it is of those predictions, then evaluate these against actual outcomes and form your own opinion as well.

Above all, do not take information for granted. Always apply critical thinking and evaluate its imprecision and the certainty with which and by whom it is stated.

¹ 𝘐𝘯𝘧𝘰𝘳𝘮𝘢𝘵𝘪𝘰𝘯 𝘵𝘩𝘢𝘵 𝘵𝘢𝘭𝘬𝘴 𝘢𝘣𝘰𝘶𝘵 𝘪𝘵𝘴𝘦𝘭𝘧 𝘪𝘴 𝘶𝘴𝘶𝘢𝘭𝘭𝘺 𝘤𝘢𝘭𝘭𝘦𝘥 𝘮𝘦𝘵𝘢-𝘪𝘯𝘧𝘰𝘳𝘮𝘢𝘵𝘪𝘰𝘯.

² 𝘈𝘵 𝘭𝘦𝘢𝘴𝘵 𝘧𝘰𝘳 𝘴𝘰𝘮𝘦𝘰𝘯𝘦 𝘸𝘪𝘵𝘩 𝘮𝘺 𝘭𝘦𝘷𝘦𝘭 𝘰𝘧 𝘬𝘯𝘰𝘸𝘭𝘦𝘥𝘨𝘦 𝘢𝘣𝘰𝘶𝘵 𝘨𝘢𝘳𝘮𝘦𝘯𝘵𝘴.

³ 𝘙𝘰𝘣𝘣𝘦𝘥 𝘰𝘧 𝘪𝘵𝘴 𝘰𝘳𝘪𝘨𝘪𝘯𝘢𝘭 𝘮𝘦𝘢𝘯𝘪𝘯𝘨, 𝘴𝘪𝘯𝘤𝘦 𝘸𝘦 𝘢𝘳𝘦 𝘧𝘢𝘳 𝘧𝘳𝘰𝘮 𝘩𝘢𝘷𝘪𝘯𝘨 𝘤𝘰𝘯𝘴𝘤𝘪𝘰𝘶𝘴 𝘮𝘢𝘤𝘩𝘪𝘯𝘦𝘴.

Data Condensation

Some years ago I tried my hand at daytrading, and more recently I had the opportunity to work with Recency Frequency Monetary models, now followed by SNMP sensor data. As it turns out, they all have something in common. They all become most valuable and interesting when you are able to discover behavior that is out of the ordinary. One can approach such detection in two ways: define abnormal and react to it, or define normal and react to exceptions from it. Given that all of the mentioned subject areas are heavily skewed towards the normal, it is easier to go with the latter approach. The technique I am about to describe is influenced by Bollinger Bands, but is based on medians rather than averages, since they are less susceptible to the effects of short duration spikes.

The type of daytrading I was practicing was driven by two factors, news or indicators. The idea being that big news tends to push the market in one way or the other, but news spreads asymmetrically, so there is a window of opportunity to ride the wave during the spreading if you catch it early. Big news, however, like whether to prolong quantitative easing or not, does not come on a daily basis. In order to fill the idle time, indicators can be used in a similar fashion, but on a smaller scale. The idea being that if an indicator is popular, enough trades will happen when that indicator yields a signal to cause a tradable movement. Today, this is much harder, because high frequency trading may negate an expected movement almost entirely, and an overflow of new and exotic indicators and instruments obscures the view of what is popular and impairs the consistency of effects. Give me any stock market chart though and I can still point out a few movements that were “not normal” in the sense that something had to drive them. A trading strategy that tries to catch abnormalities early, oblivious of the reason, may not be such a bad idea.

An RFM model consists of three attributes that are assigned to individual entities that make somewhat regular spendings. Recency indicates when the last spending was made, preferably expressed as the exact point in time when it was made. Frequency indicates the normal interval between spendings, preferably expressed as a duration in days, hours, minutes, or whatever time frame is suitable, but as precise as possible. Monetary indicates the normal size of the spending, preferably expressed as an amount in some currency, again as precise as possible. The reason the model is constructed like this is to give it predictive and indicative properties. R+F will give you the expected time of the next spending. Those who have passed that time are delayed with their spending; a good indication that they may need a reminder. Combining F and M will give you an estimate of future revenue, roughly an amount M every F. An inclining or declining M may be a sign of desirable or undesirable behaviour. When the distance to R is much larger than F the entity is most likely “lost”, and so on…
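As a sketch of those predictive properties in SQL, assuming a hypothetical RFM table with one row per entity (the table and column names are mine, purely for illustration), the expected next spending and a simple status flag could be derived as follows. The threshold of three missed intervals for “lost” is an assumption, not a standard.

-- Hypothetical table: one row per entity.
-- create table RFM (EntityId int, Recency datetime2, FrequencyDays int, Monetary money);

select
  EntityId,
  dateadd(day, FrequencyDays, Recency) as ExpectedNextSpending,  -- R + F
  Monetary / nullif(FrequencyDays, 0)  as RevenuePerDay,         -- M spread out over F
  case
    -- three missed intervals as the "lost" threshold is an assumption
    when datediff(day, Recency, getdate()) > 3 * FrequencyDays then 'probably lost'
    when getdate() > dateadd(day, FrequencyDays, Recency)      then 'delayed, may need a reminder'
    else 'as expected'
  end as Status
from
  RFM;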

Large networks usually have a lot of equipment that transmits SNMP data. It may be temperature readings, battery levels, utilisation measures, congestion queues, alarms, heartbeats, and the like. This yields a very high volume of information, and most network surveillance software only holds a very limited history of such events. It is instead rule based and reacts in real time to certain events in predictable ways, such as flashing a red banner on a screen when an alarm goes off. There are two ways to deal with data that does not fit into the limited history: scrap it or store it. If you scrap it you cannot go back and analyse anything that happened outside of your window of history, which could be as short as a few days. If you store it you will need massive storage, and even then you will likely only extend the history by a single order of magnitude. In reality though, most of your equipment is behaving normally most of the time. What if we could decrease the granularity of the data during periods of normality and retain the details only for out of the ordinary events? That could significantly reduce the amount of data that needs to be stored.

If this is to be done, normal must be what we compare a current value to. A common indicator used for this purpose within daytrading is the moving average. Usually, this average is windowed over quite a large number of measurements, such as the popular MA50 (last 50 measurements) and MA200 (last 200 measurements), which when they cross is a common trading signal. Moving averages have some downsides though, and large windows do too. Let us look at a comparison of four different ways to describe normal, using MA3, MA5, MM3, and MM5, where MM are moving medians, taken on measures that alternate between two values, 5 and 50, over time.

Moving Averages and Medians

Looking at point 7 in the series, both MAs are disturbed by the peak, whereas both MMs remain at the value 5. Comparing 50 to either of the MMs or MAs would likely lead you to the conclusion that 50 is out of the ordinary, but the MMs are spot on when it comes to what is normal. What is worse is when we reach point 8. Clearly 5 is normal compared to the MMs, but the disturbances of the MAs are still lingering, so it is now difficult to say whether 5 is out of the ordinary or not. Comparing MA3 to MA5, it is obvious that a larger window will reduce the disturbance, but at the cost of extending the lingering.

Moving on to points 14 and 15, two consecutive highs, MM3 will already at point 15 see the value 50 as the new normal, whereas MM5 will stay at 5. For MMs, the window size determines how many points out of the ordinary are needed for them to become the new normal. Quoting Ian Fleming’s Goldfinger: “Once is happenstance. Twice is coincidence. Three times is enemy action”; he has obviously adopted MM5, as seen in point 24. If we considered using MAs and extending the window size, thinking that the lingering is not too high a price to pay, another issue is seen in points 24 and 26. For MA3 it takes three points to adjust to the new normal and for MA5 it takes five points. The MMs move quicker. For these reasons, MMs will be used as the basis for describing normal behaviour.

To try things out, let’s see how hard it would be to use this to condense 45 years of daily coffee prices. Coffee is one of the most volatile commodities you can trade, and there have been some significant ups and downs over the years. The data is kindly provided by MacroTrends and a graph can be seen below.

Daily coffee prices over 45 years (data from MacroTrends).

Condensing that will be much harder than the SNMP data, which is tremendously less volatile. Backing up a bit, the data in the graph is stored in a Microsoft SQL Server database. The table holding the data is structured as follows:

create table #timeseries (
  Classification char(2), 
  Timepoint date,
  Measure money,
  primary key (
    Classification,
    Timepoint desc
  )
);

Classification is here a two letter acronym making it possible to store more than just coffee (KC) prices. In the case of SNMP data, each device would have its corresponding Classification, so you can keep track of each individual time series. For a large network, there could be millions of time series to condense.

In the not so distant past, a windowed function that can be used to calculate medians was added to SQL Server: PERCENTILE_CONT. Unlike many other windowed functions, it does, however and sadly, not allow you to specify a window size using ROWS/RANGE. We would want to specify such a size, so that the median is only calculated over the last N timepoints, as in MM3 and MM5 above. As it turns out, with a bit of trickery, it is possible to design your own window. This trick is actually useful for every aggregate that does not support the specification of a window size.

declare @windowSize int = 3; -- 3 yields MM3, 5 yields MM5

select distinct
  series.Classification,
  series.Timepoint,
  series.Measure,
  percentile_cont(0.5) within group (
    order by windowed_measures.Measure
  ) over (
    partition by series.Classification, series.Timepoint
  ) as MovingMedian
into 
  #timeseries_with_mm
from 
  #timeseries series
cross apply (
  select 
    Measure
  from 
    #timeseries window
  where 
    window.Classification = series.Classification
  and
    window.Timepoint <= series.Timepoint
  order by 
    Classification, Timepoint desc
  offset 0 rows
  fetch next @windowSize rows only
) windowed_measures;

Thanks to the cross apply fetching a specified number of previous rows for every Timepoint, the median can be calculated as desired. If @windowSize is set to 3 we get MM3 and with 5 we get MM5. The PERCENTILE_CONT is partitioned so that we calculate the median for every Timepoint. Some rows from the #timeseries_with_mm table are shown below, using MM3.

Classification Timepoint   Measure  MovingMedian
KC             1973-08-20  0.6735   0.6735
KC             1973-08-21  0.671    0.67225
KC             1973-08-22  0.658    0.671
KC             1973-08-23  0.6675   0.6675
KC             1973-08-24  0.666    0.666
KC             1973-08-27  0.659    0.666
KC             1973-08-28  0.64     0.659

Given this, comparisons can be made between a Measure and its MM3. It is possible to settle here, with some threshold for how big a difference should trigger the “out of the ordinary” detection. But, looking at the SNMP data, it is sometimes affected by low level noise, and similarly coffee prices have periods of higher volatility. If those, too, are normal, the detection must be fine-tuned so that it does not trigger unnecessarily often. To adjust for volatility it is possible to use the standard deviation, corresponding to the STDEVP function in SQL Server. When the volatility becomes higher the standard deviation becomes larger, so we can use this in our detection to be more lenient in periods of high volatility.

declare @trendPoints int = 3; -- number of previous moving medians used to describe the trend

select 
  series.Classification,
  series.Timepoint,
  series.Measure,
  series.MovingMedian,
  avg(windowed_measures.MovingMedian) 
    as MovingAverageMovingMedian,
  stdevp(windowed_measures.MovingMedian) 
    as MovingDeviationMovingMedian
into
  #timeseries_with_mm_ma_md
from 
  #timeseries_with_mm series
outer apply (
  select
    MovingMedian
  from
    #timeseries_with_mm window
  where
    window.Classification = series.Classification
  and
    window.Timepoint <= series.Timepoint
  order by
    Classification, Timepoint desc
  offset 1 rows 
  fetch next @trendPoints rows only
) windowed_measures
group by
  series.Classification,
  series.Timepoint,
  series.Measure,
  series.MovingMedian;

I am going to calculate the deviation not over the Measures, but over the MovingMedian, since I want to estimate how noisy the normal is. In this case I will base it on the three previous MM3 values (offset 1 and @trendPoints = 3 above). The reason for not using the current MM3 value is that it is possibly “tainted” by having included the current Measure when it was calculated. What we want is to compare the current Measure with what was previously normal, in order to tell if it is an outlier. At the same time, it would be nice to know if Measures are trending in some direction, so while we are at it, a moving average of the three previous MM3 values is calculated as well. As seen above, the window trick can be used in conjunction with GROUP BY too.

Note that the three previous MM3 values are calculated from the five Measures preceding the current one. This means that in daily operations, such as for SNMP data, at least six measures (the current one plus five) must be kept to perform all calculations, but for each device the seventh and older measures can be discarded. Provided that the older measures can be condensed, this will save a lot of space.

With the new aggregates in place, what is left to determine is how large the fluctuations may be before we consider them out of the ordinary. This will definitely take some tweaking, depending on the sources producing your measures, but for the coffee prices we will settle for the following. Anything within 3.0 standard deviations is considered a non-event. In the rare case that the standard deviation is zero, which can happen if the previous three MM3 values are all equal, we prevent even the smallest change from triggering an event by also allowing anything within 3% of the moving average. Using these, a tolerance band is calculated, and a Measure outside it is deemed out of the ordinary.

-- accept fluctuations within 3% of the average value
declare @averageComponent float = 0.03; 
-- accept fluctuations up to three standard deviations
declare @deviationComponent float = 3.0; 

select 
  Classification,
  Timepoint,
  Measure,
  Trend,
  case 
    when outlier.Trend is not null
    then (Measure - MovingMedian) / (Measure + MovingMedian)
  end as Significance,
  margin.Tolerance,
  MovingMedian
into
  Measure_Analysis
from 
  #timeseries_with_mm_ma_md
cross apply (
  values (
    @averageComponent * MovingAverageMovingMedian + 
    @deviationComponent * MovingDeviationMovingMedian
  )
) margin (Tolerance)
cross apply (
  values (
    case 
      when Measure < MovingMedian - margin.Tolerance then '-'
      when Measure > MovingMedian + margin.Tolerance then '+'
    end 
  )
) outlier (Trend)
order by
  Classification, 
  Timepoint desc;

The trend is positive if the Measure lies above the tolerance band and negative if it lies below. Events that are deemed out of the ordinary may be so by a small amount or by a large amount. To determine the magnitude of an event, we will use the CHAOS metric, here calculated as (Measure - MovingMedian) / (Measure + MovingMedian), as seen in the Significance column above. It provides us with a number that becomes larger (positive or negative) as the difference between the Measure and the MovingMedian grows.

Finally, keep the rows that are now marked as outliers (Trend is positive or negative) along with the previous row and following row. The idea is to increase the resolution/granularity around these points, and skip the periods of normality, replacing these with inbound and outbound values.

select
  Classification,
  Timepoint,
  Measure,
  Trend,
  Significance
into
  Measure_Condensed
from (
  select 
    trending_and_following_rows.Classification, 
    trending_and_following_rows.Timepoint, 
    trending_and_following_rows.Measure,
    trending_and_following_rows.Trend,
    trending_and_following_rows.Significance
  from 
    Measure_Analysis analysis
  cross apply (
    select 
      Classification, 
      Timepoint, 
      Measure,
      Trend,
      Significance
    from 
      Measure_Analysis window
    where
      window.Classification = analysis.Classification
    and
      window.Timepoint >= analysis.Timepoint 
    order by
      Classification,
      Timepoint asc
    offset 0 rows
    fetch next 2 rows only
  ) trending_and_following_rows
  where 
    analysis.Trend is not null
  union
  select 
    trending_and_preceding_rows.Classification, 
    trending_and_preceding_rows.Timepoint, 
    trending_and_preceding_rows.Measure,
    trending_and_preceding_rows.Trend,
    trending_and_preceding_rows.Significance
  from 
    Measure_Analysis analysis
  cross apply (
    select 
      Classification, 
      Timepoint, 
      Measure,
      Trend,
      Significance
    from 
      Measure_Analysis window
    where
      window.Classification = analysis.Classification
    and
      window.Timepoint <= analysis.Timepoint 
    order by
      Classification,
      Timepoint desc
    offset 0 rows
    fetch next 2 rows only
  ) trending_and_preceding_rows
  where 
    analysis.Trend is not null
  union
  select
    analysis.Classification,
    analysis.Timepoint,
    analysis.Measure,
    analysis.Trend,
    analysis.Significance
  from (
    select
      Classification,
      min(Timepoint) as FirstTimepoint,
      max(Timepoint) as LastTimepoint
    from
      Measure_Analysis
    group by
      Classification 
  ) first_and_last
  join
    Measure_Analysis analysis
  on
    analysis.Classification = first_and_last.Classification
  and
    analysis.Timepoint in (
      first_and_last.FirstTimepoint, 
      first_and_last.LastTimepoint
    )
) condensed;

The code needs to manage the first and last rows in the timeseries, which may not be trending in either direction, but need to be present in order to produce a nice graph. This reduces the coffee prices table from 11 491 to 884 rows. That this was harder than for SNMP data is shown by the “compression ratio”, which in this case is approximately 1:10, but for SNMP reached 1:1000. The condensed graph can be seen below.

The condensed coffee price series.

Colors are deeper red for negative Significance and deeper green for positive Significance. What is interesting is that Coffee seems to have periods that are uneventful and other periods that are much more eventful. These periods last years. Of course, trading is more fun when the commodity is eventful, and unfortunately it seems as if we are in an uneventful period right now.

In this article, code has been optimized for readability and not for performance. Coffee may not have been the best example from a condensability perspective, but it has some interesting characteristics and its price history is freely available. There are surely other ways to do this and the method presented here can likely be improved, so I would be very happy to receive comments along those lines.

The complete code can be found by clicking here.

Appearance is Everything

In my previous article “What needs to be agreed upon“, from my series about #transitional modeling, I listed the few things that must be interpreted equally among those sharing information between them. To recall, these were identities, values, roles, and time points. If we do not agree upon these, ambiguities arise, and it is no longer certain that we are talking about the same thing. We used this to create the fundamental construct in transitional modeling: the posit, which is a “triple” of the form [{(id¹, role¹), …, (idᴺ, roleᴺ)}, value, time point]. The set in the first position is called a dereferencing set, and each ordered pair in such a set is called an appearance. An appearance consists of an identity and a role, and appearances will be the topic of this article.
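To make the construct a little more tangible, here is one naive way the posits from the earlier articles, [{(J42, nickname)}, Jen, 1988] and [{(J42, girlfriend), (B43, boyfriend)}, official, 2019], could be laid out relationally. The table names and types are hypothetical illustrations only, a sketch and not a prescription for how a transitional store must be built.

-- A naive, hypothetical relational sketch: a posit holds a value and a time point,
-- and its dereferencing set is the set of (identity, role) pairs, the appearances,
-- that reference it.
create table #Posit (
  PositId   int           not null primary key,
  Value     nvarchar(max) not null,
  Timepoint nvarchar(50)  not null  -- kept as text, to allow imprecise time points such as '1988'
);

create table #Appearance (
  PositId int          not null,  -- the posit whose dereferencing set this pair belongs to
  Thing   nvarchar(50) not null,  -- the identity, kept as text so that posit identities may appear as well
  Role    nvarchar(50) not null,  -- the role the identity appears in
  primary key (PositId, Thing, Role)
);

-- [{(J42, nickname)}, Jen, 1988]
insert into #Posit values (1, N'Jen', '1988');
insert into #Appearance values (1, 'J42', 'nickname');

-- [{(J42, girlfriend), (B43, boyfriend)}, official, 2019]
insert into #Posit values (2, N'official', '2019');
insert into #Appearance values (2, 'J42', 'girlfriend'), (2, 'B43', 'boyfriend');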

What is interesting and different from most other modeling techniques is that what the identities represent may be subject to discussion. Two individuals exchanging information in transitional form may disagree on the classifications of the things they discuss. It does not matter if the identity 42 is thought of as a ‘Living Thing’ by one, a ‘Human’ by another, a ‘Person’ by a third, a ‘Customer’ by a fourth, a ‘Fashionista’ by a fifth, an ‘Animate Object’ by a sixth, a ‘Transaction Agent’ by a seventh, and so on. Classifications are just subjective labels in transitional modeling. The glaring issue here is that almost every other modeling technique uses class diagrams in the modeling process, but a class diagram presumes that classification is objective; that each and every individual that will ever gaze upon the diagram is in complete agreement that the things it models follow that particular scheme.

Since classification is subjective in transitional modeling, its modeling process must start elsewhere. We need to toss class diagrams in the bin. That is painful for someone who has spent the better part of his life drawing and implementing such diagrams, or in other words, me. With classes in the bin, what remains that can be visualized? To begin with, it must be among the concepts that need to be agreed upon, that which is objective. Those are the four previously mentioned: identities, values, roles, and time points. Let us look at some posits about a few things, and as it turns out, through these things we shall learn.

  • [{(42, beard color)}, black, 2001-01-01]
  • [{(42, hair color)}, black, 2001-01-01]
  • [{(42, height)}, 187cm, 2019-08-20 08:00]
  • [{(42, social security number)}, OU812-U4IA-1337, 1972-08-20]
  • [{(42, name)}, Lazarus, 1972-09-21]
  • [{(42, owner), (555, pet)}, currently owning, 2017-08-10]
  • [{(555, name)}, Jacqueline, 2017-06-07]
  • [{(555, hair color)}, brown, 2017-06-07]
  • [{(555, RFID)}, 4F422275334835423F532C35, 2017-06-07]

I am sure your imagination has already filled in a few blanks, given the posits above. We will try to disregard that. In order to produce a visualization, we first need to find something common between these posits. Looking closer at the appearances, some of the roles appear more than once. Let us write down the roles, and in the case of relational posits, the combination of roles separated by commas.

Roles depicted in a diagram.

Since what roles mean, the semantics, is necessarily objective in transitional modeling, they make for a good start in a diagram. The diagram above tells us that the things we will be modeling may appear together with these roles. The meaning of each role should be detailed in some other documentation, unless entirely obvious. To me, RFID is something I have a shallow understanding of, so I had to look up the number format online, for example. For our purposes, it’s sufficient to know that it may act as an identifier and can be biologically implanted and later scanned.

So far, the diagram does not give us anything with respect to how these roles appear for our things. Looking back at the list of posits, we can also see that identities appear more than once as well. They will therefore be our second candidate to diagram. We could put the numbers 42 and 555 (actual identities) in the diagram and connect the numbers to the roles in which they appear. This approach, however, only works when you have a very limited number of identities. The diagram, although very expressive, will quickly turn into a confusing snarl, completely defeating its purpose. Since this approach breaks down for a large number of posits, let’s assume that we have a few thousand posits similar to the ones above, in which many different identities appear.

Rather than diagramming the individual identities, what if we could use some aggregate? Let us try with a simple count of unique identities. The count could be written down next to each role. Let’s say that name appears with 5800 unique identities, hair color with 5800, height with 5000, social security number with 5000, beard color with 1450, and RFID with 750. We are getting somewhere now, but there are still redundancies. The count is the same for some of the roles, so why should we write them down more than once? This is where it struck me that these counts behave somewhat like altitudes and that the roles we’ve written down behave somewhat like geographical regions. But, if that is the case, then there is already a type of diagram suitable to display such information: a contour map.

Isopleths for counts of identities in roles.

This diagram uses isopleths to create areas with the same counts. From this it is easy to see that more things appear with a name and hair color than things that appear with height and a social security number. All things that appear with the latter two roles also appear with the former two, though. We can also immediately tell that no things have both an RFID and a social security number. The observant reader will have noticed that the relational posit with the combination of the roles {owner, pet} was left out of the counting in the previous step. This was a deliberate act, since I want to make its description richer. The reasoning being that cardinality could be estimated if both actual counts and unique counts are depicted. Please note that even if this particular posit represents a binary relationship, transitional modeling is in no way limited and may have arbitrarily many roles in a single relationship. For such, every cardinality constraint can be expressed, subjectively.

The {owner, pet} role combo does not lend itself very well to the altitude analogy. The additional isopleths would cut through its midst and confuse, rather than enlighten, the viewer. They rather say something about the isopleths themselves, and how these relate to each other. In a more traditional fashion, these will be drawn using lines, connecting the individual roles in the combo with a number of isopleths. From the diagram, we can now see that 600 of the 750 unique things appear in the pet role. The total count, in parentheses, is also 600, so they must appear exactly once. The owner role is different. There are 100 bearded owners, that own a total of 170 pets, and 400 lacking a beard, owning 430 pets. In other words, some owners own more than one pet.

With this knowledge in place, I am, subjectively, starting to think about what these things actually are. It is also somewhat obvious where the boundaries between the classifications are. This is the act of classification: finding the boundaries between similar and dissimilar things. I may then proceed to define two classes, Person and Animal, as depicted below.

One subjective classification given what we know.

Both classes have name and hair color attributes, but there are also attributes unique to the classes, such as RFID for Animal. It is important to remember that this is my classification. Renaming the classes to Insurer and Insured will not change the things themselves in any way, and it is an equally valid and simultaneously possible classification. Changing the classes (and the coloring) to Chipped and Unchipped, depending on whether an RFID tag has been implanted or not, is also equally valid. However, a classification in which the coloring would no longer be contained by isopleths is not valid. For example, a Male and Female classification is invalid. Why? Unless all males have beards, the color for Male would have to be present in both the 5000 and 1450 count isopleths, thereby breaking the rule I just instated. The reason is that classification is not unrelated to the information at hand. If another attribute, gender, is added, the necessary isopleths will form, and such a classification will become possible. In other words, classifications may not encode information that is not already present in the model.

While the contour map is objective, its coloring according to classifications is not. So, if and when a modeling tool is built for transitional modeling, it needs to have a way to select a particular subjective perspective to see its classification. It doesn’t stop there though. It would also need a slider, so you could view the isopleths at different times, to see how they have evolved over time. Every posit may also not be asserted by everyone, so again the subjective perspective may need to come into play earlier. It may also be the case that some posits are vaguely asserted, so perhaps yet another slider is needed, to set the minimum reliability to show. Given that these are still early days for transitional modeling, this seems to be a powerful way to achieve a modeling process for it. Nothing is written in stone, and there may be counterexamples where this type of diagram breaks down. I’m very happy to hear your thoughts on the matter!

The idea to use isopleths came from this sketch made by Christian Kaul.

Rethinking the Database

This is the final article in the series “What needs to be agreed upon”, “What can be disagreed upon”, “What will change and what will remain”, and “What we are”. The series has established the fundamental concepts in #transitional modeling, a theoretical framework for representing the subjectivity, uncertainty, and temporality of information. This is analogous to the previously published paper “Modeling Conflicting, Unreliable, and Varying Information”, but here with the assertion converted to a meta-posit. I will now be so bold as to state that all information is subjective, uncertain and temporal in nature.

Having worked with Anchor modeling for 15 years, I had seen it evolve to the point where the old formalization from the paper “Anchor modeling — Agile information modeling in evolving data environments” was no longer valid. I had also come to the point where I started to doubt the relational model as the best way to represent Anchor. It felt as if I was working against the relational model rather than with it as more features were added. A working theory of the beautiful constructs posits and assertions had already been formulated, albeit under other names (attributes and timeline annexes), back in 2012 in “Anchor Modeling with Bitemporal Data”. Thanks to these, I had started to think about what a database engine built around those concepts could do.

During the same period, NoSQL has seen its rise and fall, but it wouldn’t have risen at all if there weren’t circumstances in which SQL databases did not suffice. I believe it had to do with conformance. In order to get data into an SQL database it has to conform to a table, conform to a candidate key, conform to data types, conform to constraints, conform to rules of integration, conform to being truthful, conform to being free of errors, and conform to last. With this in place, data falls into three categories: non-conforming data that cannot be made to conform, non-conforming data that can be made to conform, and conforming data. From my own experience, almost all data I was working with fell into the first two categories. If it cannot conform, simply discard it, BLOB it, or in rare cases, find a fitting data type, such as JSON or XML. If it can be made to conform, write complex logic that molds the data until it fits. If it directly conforms, do a reality check or accept that you have a JBOT-style database.

Here, NoSQL flourished in comparison, with practically zero conformance demands. Just dump whatever into the database. For someone who is spending most of their time writing complex logic that molds the data until it fits, this sounds extraordinarily attractive. The issue here, as it turned out, is that what is no longer your problem suddenly became someone else’s problem. The funny thing is, that someone else didn’t even have a job description at the time, which is why it has taken far too long to realize that “inconsistent conformance on every read” is not such a nifty paradigm. However, we also want to leave the “perfectly consistent conformance on a single write” paradigm behind us.

We are currently at a point where we’ve visited two extremes of a scale on how much information must conform in order to be stored: totally and not at all. With that in mind, it’s not that difficult to figure out a possible way forward. It has to be somewhere in between the two. I am not the only one who has thought of this. There is currently a plethora of database technologies out there, positioning themselves on this scale. To name a few, there are graph databases, triple stores, semantic fabrics, and the like. In my opinion, all of these still impose too much conformance in order to store information. This is where I see a place for a transitional database, aiming to minimize conformance requirements, but still providing the mechanics for schemas, constraints, and classifications on write. Different from the others, these are subjective, evolving and possibly late-arriving schemas, constraints and classifications. Similar to “eventual consistency” in a blockchain, a transitional database has “eventual conformance”.

Let’s assume that we have access to a transitional database, built upon posits at its core. What type of queries could we expect to run? A couple of them are sketched right after the list below.

  • Search anywhere for the unique identifier 42, NVP-like search.
  • Search for everything that has the girlfriend role, Graph-like search. 
  • Search for every time 42 was a girlfriend, Graph-like search. 
  • Search for everything nicknamed ‘Jen’, Relational-like search. 
  • Search for all Persons, Relational-like search.
  • Search for all subclasses of Person, Hierarchical-like search.
  • Search as it was on a given date, Temporal-like search. 
  • Search given what we knew on a given date, Bi-Temporal-like search. 
  • Search for disagreements between 42 and 43, Multi-tenant-like search. 
  • Search that which is at least 75% certain, Probabilistic-like search. 
  • Search for corrections made between two dates, Audit-like search. 
  • Search for all model changes made by Jen, Log-like search.
  • Search for how many times consensus has been reached, new feature. 
  • Search for how many times opposite opinions have been expressed, new feature. 
  • Search for individuals that have contradicted themselves, new feature.
  • Search for when a constraint was in place, new feature.

That sure seems like a handy database, given the things it can answer. It’s a shame that it does not yet exist. Or does it? As it happens I am working on precisely such a database, written in the Rust programming language. My goal is to release a working prototype as Open Source by the end of the summer. After that I will need help, so start polishing your Rust now!

What we are

This is a continuation of the articles “What needs to be agreed upon”, “What can be disagreed upon”, and “What will change and what will remain”. So far we have learnt to build posits and assert these, in order to create an exhaustive transcript of a discussion. Furthermore, the transcript came alive as a stone tablet, onto which we can continuously and non-destructively capture changes of values or opinions, following the three individuals involved. We also started to glimpse the power of #transitional modeling as a formal framework.

The last article ended with the posit P4 as ({(S44, nickname)}, Jen, 1988). We had already seen a similar posit P2 as ({(J42, nickname)}, Jen, 1988). The way the story had been told, we have presumed that J42 is a female human being. Presumptions only lead to headaches. If not now, then somewhere down the road. The transcript is clearly not exhaustive enough, so we need to rectify this, and at the same time solve the mystery of identity S44.

As it turns out, an utterance about an unrelated topic was made in the discussion. Someone said “Haha, but I wonder what Jen (J42) feels about being hit by Jen (S44)? That storm is about to hit shores tomorrow.” Aha, there’s the person J42 and the storm S44, both nicknamed Jen. In order to tell what things are, we again need to reserve some roles. Let’s reserve the strings ‘thing’ and ‘class’. We can now create two new posits ({(J42, thing), (C1, class)}, active, 1980-02-13) and ({(S44, thing), (C2, class)}, active, 2019-08-10). These connect things to classes, but the classes themselves also need to be explained.

The classes C1 and C2 can be dressed up with a lot of their own information, but let us stay with the basics and only introduce two more posits ({(C1, named)}, Female Child, 2019-08-20) and ({(C2, named)}, Storm, 2019-08-20). But wait, Jen is not a child any longer. Let’s also add ({(C3, named)}, Female Adult, 2019-08-20). If we assume that you become an adult at the age of 18, then ({(J42, thing), (C3, class)}, active, 1998-02-13). The same dereferencing set, but with a different value and a later timepoint, in other words a change. Things may change class as time goes by.

The third party reading the transcript is not much for specifics. Generics are much better. Let’s help her out and add ({(C4, named)}, Person, 2019-08-20) along with ({(J42, thing), (C4, class)}, active, 1980-02-13). The third party can assert these at the same time as Jennifer herself asserts the others. There is a difference of opinion, leading to concurrent models, both equally valid. Thinking about it some more, it turns out that these particular classes can actually be related: ({(C1, subclass), (C4, superclass)}, active, 2019-08-20) and ({(C3, subclass), (C4, superclass)}, active, 2019-08-20). Both female children and female adults are persons.

Now that we’ve seen some of what #transitional modeling can do, it is still only a theoretical framework. What if there was a database built using posits at its core? This is the topic of the next article, entitled “Rethinking the database”.

What will change and what will remain

This is a continuation of the two articles “What needs to be agreed upon” and “What can be disagreed upon”, in which two people have a discussion, after which a third party is invited to interpret a transcription of it. We have concluded that even if the meaning of a posit is universally agreed upon, anyone is nevertheless at liberty to disagree with or express doubt towards what it is saying. Opinions about posits are recorded in the transcript itself using assertions, a kind of meta-posit that assigns someone’s confidence level with respect to a posit.

Change is everywhere, and the last article concluded that the circumstances that both posits and assertions describe may change over time. Values change and opinions change. Let us make the transcript a living document, required to capture such changes. Additionally, the transcript must be historically complete, capturing the changes in a non-destructive manner. Anything that goes into the transcript is written in stone.

Grab your chisel, because Jennifer broke up with her boyfriend. Recall that the posit P1 is ({(J42, girlfriend), (B43, boyfriend)}, official, 2019), where J42 is Jennifer and B43 her boyfriend. The posit P3 tells us what happened in 2020 and it looks like this ({(J42, girlfriend), (B43, boyfriend)}, broken up, 2020). Remember the posit P2? It is ({(J42, nickname)}, Jen, 1988). Clearly, they are all different posits, P1 ≠ P2 ≠ P3, but P1 and P3 must share something in order for them to be describing a change that they do not share with P2.

It is actually possible to precisely define change. When two posits share the same set in their first position, but have different values and one time point follows the other, they describe a change. With that in place, P3 is obviously a change from P1. Since the set in P2 differs from that in P1 and P3, it is not a change of either P1 or P3. In #transitional modeling, the set is called a dereferencing set. Dereferencing sets remain, indefinitely, while their surroundings may change entirely. Even after J42 is gone, the dereferencing sets in which that identity is found will remain, because we can, of course, have a recollection of things that are no more.
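That definition translates directly into a query. Below is a self-contained, purely illustrative sketch (the dereferencing set is simplified to a string, which a real implementation would not do): posits sharing a dereferencing set are ordered by time point, and wherever an earlier value exists, the later posit describes a change.

-- Self-contained illustration: P1 and P3 share a dereferencing set, P2 does not.
with posit (PositId, DereferencingSet, Value, Timepoint) as (
  select * from (values
    ('P1', '{(J42, girlfriend), (B43, boyfriend)}', 'official',  '2019'),
    ('P2', '{(J42, nickname)}',                     'Jen',       '1988'),
    ('P3', '{(J42, girlfriend), (B43, boyfriend)}', 'broken up', '2020')
  ) as p (PositId, DereferencingSet, Value, Timepoint)
)
select
  PositId,
  Value,
  Timepoint,
  lag(Value) over (
    partition by DereferencingSet
    order by Timepoint
  ) as ChangedFrom  -- non-null only for P3, which changed 'official' into 'broken up'
from
  posit
order by
  DereferencingSet, Timepoint;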

Then how does change affect assertions? Since assertions are posits themselves, it works in exactly the same way. Jennifer made the following assertion on the 5th of April in 2019, ({(P1, posit), (J42, determines confidence)}, 0.8, 2019-04-05), stating that it is very likely that she and her boyfriend have officially been an item since 2019. After learning that her “boyfriend” thought otherwise, she changed her mind. Let’s say that they had a serious talk about this on the 21st of September 2019. Jennifer’s revised confidence in P1 can then be expressed through another assertion ({(P1, posit), (J42, determines confidence)}, 0, 2019-09-21). This assertion changes her previous confidence level from 0.8 to 0, so after the 21st she has no clue if they actually were an item or not.

Incidentally, this is how a ‘logical delete’ works in a bitemporal database, although in those databases there are only two confidence values, 1 (recorded) and 0 (deleted). The database itself is also the only one allowed to have an opinion. In other words, the functionality of a bitemporal database can be described as a small subset of #transitional modeling. When confidences are extended to the continuous interval [0, 1], the functionality approaches that of probabilistic databases. If further extended to include negative confidence, [-1, 1], we approach uncertainty theory, yet to be explored in databases. The fact that anyone may have an opinion is similar to multi-tenant databases. Transitional modeling as a formal base is very powerful, despite its simple construction.

Back in the shoes of the third party reading the transcript, we find the posit P4 as ({(S44, nickname)}, Jen, 1988). So, wait, what? There were in fact two different Jens after all, J42 and S44? What exactly is the thing with identity S44? This will be the topic of the next article, entitled “What we are”.

What can be disagreed upon

This is a continuation of the article entitled “What needs to be agreed upon”, in which two individuals have a discussion, whereafter a transcript appears that a third party is invited to interpret. The transcript consists of a number of posits, for example ({(J42, girlfriend), (B43, boyfriend)}, official, 2019) and ({(J42, nickname)}, Jen, 1988). The syntax of a posit is described as a triple, where the first position is occupied by a set of ordered pairs, each pair being an identity and a role. The set is followed by a value and a time point, both of which may be imprecise in nature. What must be agreed upon is the syntax of the posit and the semantics of what it expresses. That sums up the conclusions of the earlier article.

Given the title of this article, let us follow up by investigating disagreements, and here comes an important distinction. Even if you understand what a posit is saying, it doesn’t imply that you believe in what the posit is saying. Many different opinions are certainly held towards a statement such as “We are alone in the universe”. So, if we want to talk about posits in the language of #transitional modeling, the posit itself must be given an identity. To talk about ({(J42, girlfriend), (B43, boyfriend)}, official, 2019) we give it the identity P1 and ({(J42, nickname)}, Jen, 1988) will be P2. The identities make it possible to create new posits that talk about other posits; meta-posits if you like.

If posits that talk about other posits live alongside posits that talk about other things, we cannot allow for any confusion with respect to the roles. We will therefore reserve roles for our purposes, say the strings ‘posit’ and ‘determines confidence’. An assertion is a posit exemplified by ({(P1, posit), (J42, determines confidence)}, 0.8, 2019-04-05). The interpretation is that Jennifer (J42), since 2019-04-05, expresses a confidence of 0.8 with respect to whether her girlfriend/boyfriend status with B43 has been official since 2019 (P1). Similarly ({(P1, posit), (B43, determines confidence)}, -1, 2019-04-05). We will see that those confidence numbers reveal a big conflict!
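Since an assertion is just another posit, it needs no storage machinery of its own; the asserted posit's identity simply shows up in an appearance under the reserved role 'posit'. Continuing the naive #Posit and #Appearance sketch from earlier in this collection, where the girlfriend/boyfriend posit was stored with PositId 2 (still only an illustration, not an actual implementation), the two assertions could be stored like this.

-- Jennifer's assertion ({(P1, posit), (J42, determines confidence)}, 0.8, 2019-04-05):
-- the identity of the asserted posit (2 in the sketch) appears under the reserved role 'posit'.
insert into #Posit values (3, N'0.8', '2019-04-05');
insert into #Appearance values (3, '2', 'posit'), (3, 'J42', 'determines confidence');

-- The boyfriend's assertion ({(P1, posit), (B43, determines confidence)}, -1, 2019-04-05).
insert into #Posit values (4, N'-1', '2019-04-05');
insert into #Appearance values (4, '2', 'posit'), (4, 'B43', 'determines confidence');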

Confidences, at least those found in our assertions, fall within the [-1, 1] interval. The mapping between how something is expressed in natural language and the numerical confidence is fuzzy. But, let us assume that 0.8 corresponds to “very likely”. Then Jennifer is saying that it is very likely that B43 officially became her boyfriend in 2019. The twist in the plot is that the boyfriend is of a very different opinion. On the negative scale of confidences, certainty towards the opposite is expressed, and -1 is “completely certain of the opposite”. More precisely, this is equivalent to being completely certain of the complement of a value. In other words, the boyfriend is completely certain, with confidence value 1, of the posit ({(J42, girlfriend), (B43, boyfriend)}, anything but official, 2019).

Tucked in between is a confidence of 0, which we call “complete uncertainty”. Let us assume that the boyfriend is clueless, and instead asserted ({(P1, posit), (B43, determines confidence)}, 0, 2019-04-05). This is interpreted as the boyfriend having no clue whatsoever if P1 is a truthful posit or not. Perhaps memory is failing or the boyfriend chose to forget. Assertions with confidence give us powerful machinery to express differences of opinion. To recap, confidence 1 means certain of one particular value, -1 certain that it is a value different from one particular value, and 0 that it could be any value whatsoever. Values in between express confidence to a given degree.

The first article mentioned the unlikeliness of Jennifer eternally being in a good mood. There will come a time when her mood is different, likely when she finds out what her boyfriend is asserting. At that point, maybe her boyfriend’s recollection is better, and he changes his mind. Circumstances definitely change over time, but we haven’t yet seen change in action. This will be the topic of the next article, entitled “What will change and what will remain”.

The observant reader will notice that assertions here are slightly different from those in the paper “Modeling conflicting, uncertain, and varying information”. There the assertion is an actual construct, a predicate, distinct from the posit. Here the assertion is a meta-posit built around a semantic reservation of roles. The reasoning behind the different approach is that if assertions are expressed as posits, they too can be talked about, using yet another layer of posits; a kind of über-meta-posit, if you like.

What needs to be agreed upon

If we are to have a discussion about something, there are a few things we first need to agree upon in order to communicate efficiently and intelligibly. A good way to test what we need is to transcribe a discussion between two individuals and afterwards hand it over to a third, non-participating individual, who should be able to unambiguously interpret the transcription.

Let’s say one of the individuals in the discussion frequently talks about ‘Jennifer’, while the other uses ‘Jen’. In order to avoid possible confusion, it should be agreed upon that ‘Jen’ and ‘Jennifer’ refer to the same person. This can be done by introducing a unique identifier, say ‘J42’, such that the transcript may read “I talked to Jen (J42) today.” and “Oh, how is Jennifer (J42) doing?”. We need to agree upon the identities of the things we talk about.

Assume the response is “Jen (J42) is doing good.” followed by “I figured as much, new boyfriend and all, so Jennifer (J42) must be good.”. In this case ‘good’ is a value that should mean the same thing to both individuals. However, ‘good’ is not scientifically precise, but there is a consensus and familiarity with the value that lets us assume that both parties have sufficiently similar definitions to understand each other. Imprecisions that would lead to confusion could of course be sorted out in the discussion itself. If “Jen (J42) is doing so-so.”, then “What do you mean ‘so-so’, is something the matter?” is a natural response. We need to agree upon values, and that any imprecisions lie within the margin of error with respect to the mutual understanding of their definitions.

Now, if the transcription needs to be done efficiently, leaving out the nuances of natural language, that challenge can be used to test whether or not the two constructs above can capture the essence of any discussion. Using identities and values we can construct pairs, like (J42, Jen), (J42, Jennifer), (J42, good). Forget that you’ve heard anything and put yourself in the shoes of the third party. Apparently, something is missing. There is no way to tell what ‘Jen’, ‘Jennifer’, and ‘good’ are with respect to the thing having the identity J42. This can be sorted out by adding the notion of a role that an identity takes on with respect to a value. Using roles, the pairs become triples (J42, nickname, Jen), (J42, first name, Jennifer), and (J42, mood, good). We need to agree upon the roles that identities take on with respect to values.

Surely this is enough? Well, not quite, unless Jen is permanently in a good mood. The triple (J42, mood, good) is not temporally determined, so how could we tell when it applies? The third party may be reading the transcript years later, or Jen may even have broken up with her boyfriend before the discussion took place, unbeknownst to the two individuals talking about her. Using a temporal determinator, interpretable as ‘since when’, the triples can be extended to quadruples (J42, nickname, Jen, 1988), (J42, first name, Jennifer, 1980-02-13), and (J42, mood, good, 9.45 on Thursday morning the 20th of August 2019). The way the point in time is expressed seems to differ in precision between these quadruples. Actually, there is no way to be perfectly precise when it comes to expressing time, due to the nature of time and our way of measuring it. It’s not known exactly when Jennifer got her nickname Jen, but it was sometime in 1988. We need to agree upon the points in time when identities took or will take on certain roles with respect to certain values, to some degree of precision.
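To see the progression in one place, here is a plain Python sketch of the pairs, triples, and quadruples above; keeping the time points as strings of varying precision is just one simple way to carry the imprecision, not a prescribed one.

# Pairs: ambiguous, since we cannot tell what the values are to J42.
pairs = [("J42", "Jen"), ("J42", "Jennifer"), ("J42", "good")]

# Triples: a role removes the ambiguity.
triples = [
    ("J42", "nickname", "Jen"),
    ("J42", "first name", "Jennifer"),
    ("J42", "mood", "good"),
]

# Quadruples: identity, role, value, and a 'since when' time point.
# Note how the precision of the time differs between them.
quadruples = [
    ("J42", "nickname", "Jen", "1988"),
    ("J42", "first name", "Jennifer", "1980-02-13"),
    ("J42", "mood", "good", "2019-08-20 09:45"),
]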

This is almost all we need, except that the quadruple can only express properties of an individual. What about relationships? Let B43 be the identity of Jen’s boyfriend. We may be led to use a quadruple (J42, boyfriend, B43, 2019), but this has some issues. First, B43 is in the third position, where a value is supposed to be, not an identity. If we can overlook this, the second issue is more severe. Can we really tell if B43 is the boyfriend of J42, or if J42 is the boyfriend of B43? The way we introduced the role, as something an identity takes on with respect to a value, the latter alternative is the natural interpretation. Finally, the nail in the coffin: relationships may involve more than two identities. Where in the quadruple would you put the third party?

The solution is to return to a triple, but where the first position contains a set of ordered pairs, as in ({(J42, girlfriend), (B43, boyfriend)}, official, 2019). This resolves the issue of who is the boyfriend and who is the girlfriend. The second position is again a value and the third is the temporal determinator. Looking back, we can actually consolidate the quadruple used to express properties into a triple as well, ({(J42, nickname)}, Jen, 1988). The only difference is that the cardinality of the set is one for properties and more than one for relationships. Triples like these are called posits in #transitional modeling.
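In the same plain Python style, the consolidated triple form keeps a set of (identity, role) pairs in the first position; the little helper below merely illustrates that the cardinality of the set is what separates a property from a relationship.

# A relationship posit and a property posit in the consolidated form.
relationship_posit = (frozenset({("J42", "girlfriend"), ("B43", "boyfriend")}), "official", "2019")
property_posit = (frozenset({("J42", "nickname")}), "Jen", "1988")

def is_property(posit):
    pairs, value, time = posit
    return len(pairs) == 1

print(is_property(property_posit))      # True
print(is_property(relationship_posit))  # False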

We could leave it here and already be able to represent a whole lot of information this way. But let us return to the three individuals we have been talking about. Now that the transcript is in place, consisting of a number of posits, what if they cannot agree upon it being the single version of the truth? What if some of what was written down is not 100% certain? What if someone is of an opposite opinion? This sounds like important information, and it can actually be transcribed quite easily as well, but it requires another construct, the assertion. This will be the topic of the next article, “What can be disagreed upon”.

Screw the Hammer

Around 500 BC the Greek philosopher Heraclitus said “Panta Rhei”, which translates to “everything flows”. The essence of his philosophy is that if you take a thing and examine it at two different points in time, you can always find something that has changed in between the two. This thinking is at the heart of transitional modeling, in that it captures not only the changes a thing has undergone, but also changes to the interpretation of the examination (making it bitemporal). Furthermore, while a thing is an absolute, an examination is relative to the one performing it. In transitional modeling, concurrent and possibly conflicting opinions are allowed. Finally, examinations may lead to imprecise results, or there may be uncertainty about the examination process. Both imprecision and uncertainty are well-defined and representable concepts in transitional modeling.

I believe that even if you are not yet facing requirements that fit the above, you will benefit from understanding the concepts of transitional modeling. Given some screws and a hammer, it is way too easy to let the tool decide the solution, which is precisely what has been going on with database modeling.

Many features of transitional modeling can be found in our concurrent-reliance-temporal Anchor modeling. Why not take it for a spin in the online modeling tool?

http://anchormodeling.com/modeler/latest