First Ever Certified Anchor Modelers

The first ever class of Certified Anchor Modelers graduated last week! It was a great three days of classes with engaged and skillful students, whose intellects were put to the test when it came to managing the three dimensions of time travel possible with Anchor. Traveling over changing time and positing time for different positors using concurrent-reliance-temporal modeling proved to be no problem for the students, though, and everyone passed the certification with flying colors. Congratulations to you all!

We also added a Directory of Certified Anchor Modelers, which can be found by clicking here. Let us know if you want or need any of the information in the directory updated.

Here are a couple of photos from the class and the surprise dinner cruise we went on:
[Two photos from the class and the dinner cruise]

Anchor Modeling Certification Course

We are proud to announce the first public Anchor Modeling Certification Course, to be held in Stockholm on February 5th to 7th. Please see the details below for how to sign up for the course and the subsequent certification. The course is held in central Stockholm, and we are happy to recommend accommodation for foreign participants.

Instructor: Lars Rönnbäck

Duration: 2 days of lectures and workshops followed by ½ day of assisted examination.

Cost: €2500 + VAT

Location: Stockholm, Sweden, hosted by UpToChange (www.uptochange.com).

Date: February 5th to 7th, 2014. There are still a few seats available, so we are extending registration until February 3rd.

Contact: sales@uptochange.com

Edit: Some have reported trouble reaching this mail address. If so, you can use lars.ronnback@anchormodeling.com or lars.ronnback@gmail.com instead.

Target Audience: Data Modelers, Enterprise Architects, Data Warehouse Architects, Business Intelligence Architects. A prerequisite for the class is to have a general familiarity with and understanding of database modeling concepts.

Certification Validity: Your certification will get a version number, specifying which version of the certification you took. A certification of a certain version is valid for life. If and when enough things have changed in Anchor Modeling to warrant a new version of the certification, the version number will be increased and people can decide if they want to recertify for the new version or not.

Course Description: The course will cover both uni-temporal (the original) and concurrent-reliance-temporal Anchor Modeling from theoretical and practical points of view, with students building their own models using our modeling tool as exercises. After the course and certification you will be equipped with the knowledge necessary to architect your own solutions using Anchor Modeling.

The course will introduce bitemporal concepts used to handle versioning and corrections of information while retaining a complete history of such changes. The reliability of information, and how it can be used to solve otherwise difficult modeling challenges, is discussed and introduced in one of the exercises. Storing concurrent and opposing views of the same information from different origins is also covered. The course will also demonstrate how “time traveling” is possible through the use of parametrized views, making it possible to ask complex temporal questions through simple SQL, and how these queries gain excellent performance thanks to modern query optimization techniques. Other advanced topics in Anchor Modeling, such as evolving models over time, design patterns, refactoring, and management of unknown values, are also covered in the course.
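
As a taste of what this looks like in practice, here is a minimal sketch; the anchor and perspective names are invented for this illustration, following the naming conventions of our modeling tool, and are not taken from the course material:

    -- query the state of all persons as they were known at a given
    -- changing time, using a point-in-time perspective: a parametrized
    -- view implemented as a table-valued function
    SELECT PE_ID, PE_NAM_Person_Name
    FROM dbo.pPE_Person('2014-02-05')
    WHERE PE_NAM_Person_Name LIKE 'L%';

Because the perspective is an ordinary function over the underlying tables, the optimizer can remove any tables not needed to answer the question, which is where the query optimization techniques mentioned above come into play.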

Background: Anchor Modeling is an agile modeling technique particularly suited for data environments that change over time. It has been developed in a collaboration between Swedish industry and the Dept. of Computer Science at Stockholm University. With close to ten years since the first implementations, along with award-winning open source tools and scientific papers, the technique is now gaining worldwide momentum. Anchor Modeling can be implemented in traditional relational databases, and it is based on current research in entity relationship modeling, database normalization, and temporal databases.

About the instructor: Lars Rönnbäck is one of the authors of the Anchor Modeling technique. He works as a senior consultant at UpToChange and is a research affiliate with Stockholm University. He has worked with some of the largest companies in the Swedish insurance and retail businesses, is an avid presenter at conferences, and has more than ten years’ experience in the field of Business Intelligence.

The Iceberg Exercise

We have developed a new exercise that will be used in our Anchor Modeling classes. It has been designed to show many aspects of concurrent-reliance-temporal modeling. The idea is to model a database for “An Iceberg Tracking and Drift Prediction System”. The document can also be found under the Publications menu. The entirely fictional system has the following objective and goals.

Objective
To track and predict the movements of icebergs in order to avoid collisions with vessels traveling on or below the oceanic surface, and allow for better route planning in the presence of icebergs.

Goals
Track the position of every iceberg larger than 1 square mile hourly, and accurately predict their positions 1, 3, 5, 10, 15, and 30 days ahead. Areas free from icebergs should also be reasonably well predicted up to half a year in advance. This data should be made freely available on the Internet.

v0.97 updated in test

We have updated the test version to revision 480, in which you can begin to test the concurrent-reliance-temporal implementation for attributes. Ties will follow in a later test version. (Update, October 9th: we are almost there now!) The following fixes and features have been added since the last test version (r453).

  • r480
    Corrected information in about.html. Now pushing to test.
  • r479
    Adding tie insert and update triggers. Fixing some issues with the generated CRT SQL. Really fixed the zeroing velocity bug this time. Pushing this to test.
  • r478
    Zeroing velocity when stopping nodes, yielding a slight performance tweak and better layout.
  • r477
    Stabilizing generated CRT code. Now executes without error. Triggers on ties are still not done.
  • r476
    Continuing work on CRT sisula templates. Tie rewinders done. Perspectives started.
  • r475
    Continuing work on CRT.
  • r474
    Added tie restatement constraints.
  • r473
    Fixed SQL generation bugs. CRT code now executes without errors.
  • r472
    Continuing work on CRT.
  • r471
    Starting work on triggers in CRT. Generation is broken in this revision.
  • r470
    Better index alignment.
  • r469
    Updated about.html. Pushing to test (again).
  • r468
    Added default positor (=0) and _Positor table for holding all positors. Rewrote latest perspectives to become views again, using the positors table to determine which positors are available. Changed partitioning to be based on positors, and made it work. Cleaned up sisula files and moved defaultCapsule to NamingConvention.js. Fixed bug where only Unix files worked in sisula, due to CRLF handling. Fixed bug in partitioning.
  • r467
    Updated about.html with revision number. Pushing to test.
  • r466
    Table elimination is now in effect for CRT, but it is still untested whether the results are the same as with the previous logic. If no temporalization is present in the XML, the default will now be whatever is specified in the Defaults.
  • r465
    Added (generated) date and time metadata attributes to the XML.
  • r464
    Fixed the bug with Defaults overwriting settings for ties and anchors as well.
  • r463
    Fixed bug where Default setting of restatements and idempotency would overwrite anything explicitly specified in a loaded XML file.
    Pushing this release to test, since this bug was affecting some users.
  • r462
    Trying various approaches for table elimination.
  • r461
    Added a GenerateDropScript procedure in the Schema Tracking sisula. Trying to get table elimination for CRT to work. Changed granularity of the changing time in the example model.
  • r460
    Fixed bug in key generation SP (only one key could be generated). Added extra null checking and defaults in the insert trigger. Continuing work on CRT.
  • r459
    Continuing work on Concurrent-Reliance-Temporal (CRT) Modeling. Reverted the temporal PK change. Added attribute rewinders. Changed the restatement constraint to make use of rewinders.
  • r458
    Added restatement constraint for attributes. Fixed temporal primary key (entity integrity).
  • r457
    Added assembled views for attributes in the CRT generation.
  • r456
    Starting work on concurrent-reliance-temporal modeling. Knot, anchor, and attribute tables are created correctly.
  • r455
    Forgot to change VERSION to 0.97.
  • r454
    Fixed typo in transformToFragment.

Help us improve SQL Server

While working with SQL Server, we have identified a few shortcomings that we have reported to Microsoft. To raise awareness of these feature requests, we would appreciate your help in voting for their importance. To do so, follow the links below, log in with your Microsoft ID, and click on the green box with an up arrow.

Introduce Substitute Primary Keys in the Optimization Phase

When part of a primary key is used in a calculation, the result of that calculation may be used instead of that column and still fulfill all the requirements of being a primary key. For example, if the primary key in a table is (id, stamp) and a view is created as select id, stamp, row_number() over (partition by id order by stamp desc) as ver… then (id, ver) can be used as a substitute primary key for (id, stamp). The query optimizer could use such information in order to do table (join) elimination in the case that the view is joined on (id, ver) instead of (id, stamp).
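
A minimal sketch of the scenario described above, where the table and view names are ours for illustration only:

    -- base table keyed on (id, stamp)
    CREATE TABLE posit (
        id    int      NOT NULL,
        stamp datetime NOT NULL,
        value int      NOT NULL,
        PRIMARY KEY (id, stamp)
    );
    GO
    -- within each id, ver is a bijection of stamp, so (id, ver) identifies
    -- rows just as uniquely as (id, stamp); the optimizer currently has no
    -- way of knowing this
    CREATE VIEW posit_versions AS
    SELECT
        id,
        stamp,
        value,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY stamp DESC) AS ver
    FROM posit;

If the optimizer recognized (id, ver) as a substitute primary key, a join against posit_versions on those columns that references none of its other columns could be eliminated altogether.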

Consolidate Execution Plan Paths

When the execution plan contains multiple paths with identical sections, these could be consolidated and executed only once, instead of once per path. See the example below, where three paths could be consolidated, with a substantial gain in performance.
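
The actual example is attached to the feature request; the sketch below is our own, reusing the hypothetical posit table from the previous section, and shows the general shape of a plan with three identical paths:

    -- the same aggregation subtree appears in all three branches of the
    -- UNION ALL, yet the plan computes it once per branch instead of once
    SELECT 'min' AS kind, MIN(total) AS result
    FROM (SELECT id, SUM(value) AS total FROM posit GROUP BY id) t
    UNION ALL
    SELECT 'max', MAX(total)
    FROM (SELECT id, SUM(value) AS total FROM posit GROUP BY id) t
    UNION ALL
    SELECT 'avg', AVG(total)
    FROM (SELECT id, SUM(value) AS total FROM posit GROUP BY id) t;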

Elimination of Multistatement Table-valued Functions from Execution Plan

In certain circumstances, when a primary key is defined on the table returned from a Multistatement Table-valued Function, the table does not have to be materialized at all, since it does not take part in the execution. This is similar to the table elimination already done between tables that have primary keys (and foreign keys) defined. Please see the attached script for a scenario in which the table is “unnecessarily” materialized.
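
Again, the authoritative repro is the attached script; the following sketch of ours, with hypothetical names and the posit table from above, illustrates the kind of join in which the function result could be eliminated:

    -- a multistatement TVF with a primary key declared on its result
    CREATE FUNCTION dbo.OneRowPerId()
    RETURNS @result TABLE (id int NOT NULL PRIMARY KEY)
    AS
    BEGIN
        INSERT INTO @result (id)
        SELECT DISTINCT id FROM posit;
        RETURN;
    END;
    GO
    -- f is left-joined on its primary key and none of its columns are
    -- referenced, so the join cannot affect the result; even so, the
    -- function result is materialized in the plan
    SELECT p.id, p.stamp
    FROM posit p
    LEFT JOIN dbo.OneRowPerId() f ON f.id = p.id;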

Creating or altering triggers uses huge amounts of RAM

There seems to be an issue with trigger logic using up massive amounts of RAM when triggers are created or altered. The following large DDL script, taken from a data warehouse used in production, uses more than 20 GB of RAM when executed and takes minutes to run, even though it only contains DDL statements. If you run it on a server with too little RAM, it will result in an out-of-memory error. The expected behavior would be for it to run in a very short time without consuming huge amounts of RAM. The example contains other DDL statements as well, but the issue clearly lies with the triggers: if they are omitted, the problems with the script disappear.

Example here (note that the file is 3.5MB): https://www.anchormodeling.com/transitional/JiraDW.sql

Also, if you can figure out workarounds, let us know. Thanks!

Examples have been added

We have added a new section to the site called ‘Examples’, in which you can find an example database and example loading scripts. With these we want to show different ways to load data into an anchor model. The scripts can be used as templates for your own loading logic, and since the example model covers most types of modeling patterns, the collection should be rather complete. Find the pattern in the example that matches your situation, then copy and modify those scripts.
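
As a minimal, hedged illustration of one such way (the view and column names here are invented in the style of the tool’s naming conventions; the scripts on the examples page are the authoritative templates), a single row can be loaded through the triggers generated on a latest view:

    -- the generated instead-of trigger decomposes this row into the
    -- underlying anchor and attribute tables
    INSERT INTO lPE_Person (PE_NAM_Person_Name)
    VALUES ('Lars');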

We will add more examples in the near future to make this collection as complete as possible.

Click here to go to the examples.

v0.97 ready for testing

Version 0.97 of the online modeler has just been released for testing. It can be found by clicking here. This is the first release that uses the new SQL code generation engine based on sisula. We have currently only migrated the code for what was previously named ‘monotemporal’ implementations. These are now named ‘unitemporal’ instead, since the prefix “mono” could have negative connotations.

The primary reason for switching code generation engines was that the XSLT was becoming increasingly difficult to maintain. Furthermore, as more people have shown an interest in helping out with adding support for other databases, we also wanted to simplify this process. In all honesty, there are very few developers out there who are well versed in both XSLT and SQL, so it was a bad design choice from the start. With sisula, the code looks a lot like the end result, and only a minimum of coding experience is needed on top of knowing SQL.

Please note that code generation has been completely rewritten, and because of that, many bugs may have been introduced. This version really is for testing only! While the ‘unitemporal’ generation is being tested, we are continuing work on the ‘concurrent-reliance-temporal’ implementation. We also hope to start work on supporting other databases very soon.

Introducing sisula

What in the world is sisula? Sisula stands for ‘simple substitution language’, and it is the key to supporting other databases in our modeling tool. Up until now, the tool has been using XSLT to transform XML to SQL. But as it turns out, there are very few people who are fluent in both languages. In that respect, using XSLT was a bad design choice. It is also verbose, to say the least, and the code looks quite far from the end result.

So, what we are doing right now is moving towards a template language, in which you write SQL code with some variable names in it. It looks very much like the end result, meaning you only need to have very basic programming skills on top of your SQL expertise in order to write templates. We believe that we will get quicker adoption for other databases using this.

Sisulas are actually a family of languages, and we created a flavor suitable for producing SQL, in which the template code is substituted to JavaScript and evaluated to get the end result. You could say that a sisula is a language in which you introduce a simplification or notation that gets substituted to another language using regular expression replacements. Below is what sisula code looks like, with the template parts delimited by tilde signs.
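
The original listing did not survive the move to this page, so the sketch below is a reconstruction rather than verbatim template code; the schema traversal call and the property names are our assumptions. The tilde-delimited parts are substituted into JavaScript that loops over the knots of the model, while everything in between is emitted as SQL:

    ~ var knot; while (knot = schema.nextKnot()) { ~
    -- a knot table is created for every knot in the model
    CREATE TABLE [~knot.capsule~].[~knot.name~] (
        ~knot.mnemonic~_ID ~knot.identity~ not null,
        ~knot.name~ ~knot.dataRange~ not null unique,
        primary key (~knot.mnemonic~_ID asc)
    );
    ~ } ~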

We also wanted a simple way to access the different parts of a model in sisula. Instead of using the XML and accessing it through XPath notation or DOM traversal, we decided to expose a JSON-compatible object. That way you can use regular object notation to access the parts, which is more familiar and natural. Below is the JSON description of the schema object used in the template above.
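
That listing is also missing here, but for a model containing a single knot the schema object could look roughly like the following; the property names mirror the attributes in the tool’s XML format, and the exact shape should be treated as an assumption:

    {
        "knot": {
            "GEN": {
                "mnemonic": "GEN",
                "descriptor": "Gender",
                "name": "GEN_Gender",
                "identity": "tinyint",
                "dataRange": "varchar(42)"
            }
        }
    }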

Finally, keeping all code together in a single file, as it was previously, turned out to be somewhat limiting as well. By instead having small templates that encapsulate specific pieces of SQL, it becomes much easier for anyone to insert custom templates in between existing ones, or to leave out certain templates altogether. The example above is contained in its own file: CreateKnots.js. Templates are referenced from a directive, which keeps the process together and lists all the template files that should be processed and in which order.

Our Sisulator (some 100 lines of JavaScript code) does all the work: it converts the XML representation of the model to a JSON object, reads the directive, processes each template file in order, substitutes all template code into JavaScript, evaluates the code, and returns the resulting SQL. Once done, the example given above becomes the SQL below, and similarly for all the other knots in your model.
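
With the hypothetical GEN_Gender knot from the sketches above, the end result would be along these lines (metadata columns and naming details omitted):

    -- roughly the SQL a CreateKnots template emits for one knot
    CREATE TABLE [dbo].[GEN_Gender] (
        GEN_ID tinyint not null,
        GEN_Gender varchar(42) not null unique,
        constraint pkGEN_Gender primary key (GEN_ID asc)
    );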

While the Sisulator is in place, we still haven’t ported the old SQL generation to sisula. We expect to have this done before August, and at the same time add concurrent-temporal generation to the tool.

Ensemble Modeling

Hans Hultgren came up with a great name, Ensemble Modeling, for a family of modeling techniques that all embrace the decomposition of an entity into its constituents. Exactly how this separation is made depends on the flavor you are working with, but common to all of them is that you make assumptions about which parts are mutable and which are immutable. The least restrictive is Anchor Modeling, in which you only assume that the identity of an entity is immutable, represented by a surrogate key in a database implementation; natural keys are mutable in Anchor Modeling. The techniques also differ in terms of grouping and how constraints are maintained. In Anchor Modeling there is no grouping, and constraints are put in place that ensure temporal entity integrity and temporal referential integrity in the database, even for concurrent-temporal models.
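
To make the decomposition concrete, here is a minimal sketch, with entity and column names invented in the style of Anchor Modeling naming conventions, of an entity split into an anchor carrying only the immutable identity and an attribute carrying a mutable natural key:

    -- anchor: nothing but the immutable surrogate identity
    CREATE TABLE PE_Person (
        PE_ID int NOT NULL PRIMARY KEY
    );
    -- attribute: a mutable natural key, historized over changing time
    CREATE TABLE PE_SSN_Person_SocialSecurityNumber (
        PE_ID int NOT NULL REFERENCES PE_Person (PE_ID),
        PE_SSN_Person_SocialSecurityNumber varchar(12) NOT NULL,
        PE_SSN_ChangedAt datetime NOT NULL,
        PRIMARY KEY (PE_ID, PE_SSN_ChangedAt)
    );

If the number is corrected or changes, a new row is added to the attribute table, while the anchor row, and with it the identity, remains untouched.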

At the recent conference in the Netherlands we discussed such differences with Hans in one of the sessions, and the two debaters Hans and Lars were kind enough to put together a short paper on what was said and things they did not have time to say. It is available through Hans’ blog, by clicking here. Hans also made a recent blog post in which he further discusses Ensemble Modeling styles, available by clicking here. Both are useful reading for anyone interested in the commonalities and differences between these techniques.

As more techniques fall into this category of modeling, we believe it is good to have a name for the family, similar to how star and snowflake schemas fit under dimensional modeling.

Next Generation DWH Modeling

We’re back from yesterday’s event in the Netherlands, where we were happy to see close to 300 participants and to present concurrent-temporal Anchor Modeling for the first time. We were also happy to see Hans Hultgren gather various techniques that separate mutable from immutable content under the same name, Ensemble Modeling. We also heard Ivo Clemens from m-wise challenge our claims of how Anchor Modeling is beneficial compared to other techniques, concluding that almost all of them had been fulfilled in their business, based on their experience of building ten data warehouses with it. One of the more memorable quotes was how a client of theirs had their data warehouse up and running in a couple of weeks!

Martijn Evers saw us through a Clash of the Supermodels, and Ronald Kunenborg took us through the history of modeling as well as the differences and similarities between Data Vault and Anchor Modeling, in two sessions. We finished off the day battling it out with Hans Hultgren over the differences between Data Vault and Anchor Modeling. All in all it was a great conference, with lots of fun, and in the spirit of cooperation rather than competition. We’d like to thank Simone Molenaar, BI-Podium, and everyone at Visser & Van Baars for their monumental effort in pulling this together.

[Photos from the Next Generation DWH Modeling event]

Photos courtesy of Rob Vonk, who was thoughtful enough to bring a camera.