Rethinking the Database

This is the final article in the series, following “What needs to be agreed upon”, “What can be disagreed upon”, “What will change and what will remain”, and “What we are”. The series has established the fundamental concepts in #transitional modeling, a theoretical framework for representing the subjectivity, uncertainty, and temporality of information. It is analogous to the previously published paper “Modeling Conflicting, Unreliable, and Varying Information”, but here with the assertion converted to a meta-posit. I will now be so bold as to state that all information is subjective, uncertain, and temporal in nature.

After I had worked with Anchor modeling for 15 years, it had evolved to the point where the old formalization from the paper “Anchor modeling — Agile information modeling in evolving data environments” was no longer valid. I had also started to doubt that the relational model was the best way to represent Anchor. As more features were added, it felt as if I was working against the relational model rather than with it. A working theory of the beautiful constructs of posits and assertions had already been formulated, albeit under other names (attributes and timeline annexes), back in 2012 in “Anchor Modeling with Bitemporal Data”. Thanks to these, I had started to think about what a database engine built around those concepts could do.

During the same period, NoSQL has seen its rise and fall, but it wouldn’t have risen at all if there weren’t some circumstances in which SQL databases did not suffice. I believe it had to do with conformance. In order to get data into an SQL database it has to conform to a table, conform to a candidate key, conform to data types, conform to constraints, conform to rules of integration, conform to being truthful, conform to being free of errors, and conform to lasting. With this in place, data falls into three categories: non-conforming data that cannot be made to conform, non-conforming data that can be made to conform, and conforming data. From my own experience, almost all data I was working with fell into the first two categories. If it cannot conform, simply discard it, store it as a BLOB, or in rare cases find a fitting data type, such as JSON or XML. If it can be made to conform, write complex logic that molds the data until it fits. If it directly conforms, do a reality check or accept that you have a JBOT-style database.

Here, NoSQL flourished in comparison, with practically zero conformance demands. Just dump whatever into the database. For someone who is spending most of their time writing complex logic that molds the data until it fits, this sounds extraordinarily attractive. The issue here, as it turned out, is that what is no longer your problem suddenly became someone else’s problem. The funny thing is, that someone else didn’t even have a job description at the time, which is why it has taken far too long to realize that “inconsistent conformance on every read” is not such a nifty paradigm. However, we also want to leave the “perfectly consistent conformance on a single write” paradigm behind us.

We are currently at a point where we’ve visited the two extremes of a scale on how to conform information in order to store it: totally and not at all. With that in mind, it’s not that difficult to figure out a possible way forward. It has to be somewhere in between the two. I am not the only one who has thought of this. There is currently a plethora of database technologies out there, positioning themselves on this scale. To name a few, there are graph databases, triple stores, semantic fabrics, and the like. In my opinion, all of these still impose too much conformance in order to store information. This is where I see a place for a transitional database, aiming to minimize conformance requirements, but still provide the mechanics for schemas, constraints, and classifications on write. Unlike in the others, these are subjective, evolving, and possibly late-arriving schemas, constraints, and classifications. Similar to “eventual consistency” in a blockchain, a transitional database has “eventual conformance”.
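
One possible reading of “eventual conformance” can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the design of the actual database: the names `store`, `constraints`, `write`, `add_constraint`, and `non_conforming` are hypothetical, and a real transitional database would also track who holds each constraint and when. The point shown is that a write is never rejected, while a constraint that arrives later can still be evaluated against what is already stored.

```python
# Hypothetical sketch of "eventual conformance"; all names are assumptions.
store = []        # accepts anything on write: (identity, role, value)
constraints = []  # constraints may arrive late: (role, predicate, description)


def write(identity, role, value):
    """Store the information as given; nothing is rejected."""
    store.append((identity, role, value))


def add_constraint(role, predicate, description):
    """Register a constraint, possibly long after the data was written."""
    constraints.append((role, predicate, description))


def non_conforming():
    """List stored items that violate some currently known constraint."""
    return [
        (item, description)
        for item in store
        for role, predicate, description in constraints
        if item[1] == role and not predicate(item[2])
    ]


write(42, "nickname", "Jen")
write(43, "nickname", 12345)   # odd, but stored anyway
before = non_conforming()      # no constraints known yet

add_constraint("nickname", lambda v: isinstance(v, str), "nickname must be text")
after = non_conforming()       # the late constraint now flags the odd item
```

The non-conforming item stays in the store in this sketch; it is reported rather than discarded or molded until it fits.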

Let’s assume that we have access to a transitional database, built upon posits at its core. What type of queries could we expect to run?

Search anywhere for the unique identifier 42, NVP-like search.
Search for everything that has the girlfriend role, Graph-like search.
Search for every time 42 was a girlfriend, Graph-like search.
Search for everything nicknamed ‘Jen’, Relational-like search.
Search for all Persons, Relational-like search.
Search for all subclasses of Person, Hierarchical-like search.
Search as it was on a given date, Temporal-like search.
Search given what we knew on a given date, Bi-Temporal-like search.
Search for disagreements between 42 and 43, Multi-tenant-like search.
Search that which is at least 75% certain, Probabilistic-like search.
Search for corrections made between two dates, Audit-like search.
Search for all model changes made by Jen, Log-like search.
Search for how many times consensus has been reached, new feature.
Search for how many times opposite opinions have been expressed, new feature.
Search for individuals that have contradicted themselves, new feature.
Search for when a constraint was in place, new feature.
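
To make a few of these searches concrete, here is a minimal in-memory sketch in Python. It is not the database described in this article. It assumes that a posit is a set of (identity, role) pairs together with a value and a time, and that an assertion records a positor's certainty about a posit at some later time. The class names, field names, the certainty scale, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical shapes; field names and the certainty scale are assumptions.
@dataclass(frozen=True)
class Posit:
    appearances: frozenset  # set of (identity, role) pairs
    value: object
    time: date              # when the value applies


@dataclass(frozen=True)
class Assertion:
    positor: int            # identity of whoever holds the opinion
    posit: Posit
    certainty: float        # assumed: 1.0 means fully certain
    time: date              # when the opinion was expressed


def posits(assertions):
    """Distinct posits referred to by the given assertions."""
    return {a.posit for a in assertions}


def with_identity(assertions, identity):
    """NVP-like: every posit in which the identity appears."""
    return {p for p in posits(assertions)
            if any(i == identity for i, _ in p.appearances)}


def with_role(assertions, role):
    """Graph-like: every posit in which the role appears."""
    return {p for p in posits(assertions)
            if any(r == role for _, r in p.appearances)}


def as_of(assertions, when):
    """Temporal-like: latest posit per set of appearances, not after `when`."""
    latest = {}
    for p in posits(assertions):
        current = latest.get(p.appearances)
        if p.time <= when and (current is None or p.time > current.time):
            latest[p.appearances] = p
    return set(latest.values())


def known_at(assertions, when):
    """Bi-temporal-like: only opinions expressed on or before `when`."""
    return [a for a in assertions if a.time <= when]


def at_least(assertions, certainty):
    """Probabilistic-like: opinions held with at least the given certainty."""
    return [a for a in assertions if a.certainty >= certainty]


def disagreements(assertions, first, second):
    """Multi-tenant-like: posits on which two positors differ in certainty.
    If a positor asserts the same posit twice, the later list entry wins."""
    opinions = {}
    for a in assertions:
        opinions.setdefault(a.posit, {})[a.positor] = a.certainty
    return {p for p, c in opinions.items()
            if first in c and second in c and c[first] != c[second]}


# Made-up sample data.
couple = frozenset({(42, "girlfriend"), (43, "boyfriend")})
p1 = Posit(frozenset({(42, "nickname")}), "Jen", date(2018, 1, 1))
p2 = Posit(couple, "dating", date(2019, 6, 1))
p3 = Posit(couple, "broken up", date(2020, 3, 1))

assertions = [
    Assertion(42, p1, 1.0, date(2018, 1, 2)),
    Assertion(42, p2, 1.0, date(2019, 6, 2)),
    Assertion(43, p2, 1.0, date(2019, 6, 3)),
    Assertion(42, p3, 1.0, date(2020, 3, 2)),
    Assertion(43, p3, 0.5, date(2020, 3, 5)),
]
```

With this data, `with_role(assertions, "girlfriend")` returns both relationship posits, and `disagreements(assertions, 42, 43)` returns only the break-up, on which the two positors differ. The functions compose, so `as_of(known_at(assertions, d1), d2)` answers what held at `d2` given only what had been asserted by `d1`.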

That sure seems like a handy database, given the things it can answer. It’s a shame that it does not yet exist. Or does it? As it happens I am working on precisely such a database, written in the Rust programming language. My goal is to release a working prototype as Open Source by the end of the summer. After that I will need help, so start polishing your Rust now!

Published by

Lars Rönnbäck
