Support

Anchor Modeling can be used with almost any relational database, even temporal databases, but there are four features that significantly improve query performance. These are table elimination, clustered indexes, foreign keys and statistics. Note that some vendors refer to the first two as ‘join removal’ and ‘index organized tables’. In order to get the best performance in Bitemporal Anchor Modeling a fifth feature is necessary: partitioning. It is not required, but when used partitions recording time over that which is currently recorded and that which has been erased.

The Anchor Modeler, our online modeling tool, currently only supports the automatic generation of code targeted for Microsoft SQL Server. Support for other vendors is in the works.

Clustered indexes are less important when using solid state disks, but if not present the total size of the database including indexes will be larger. Table elimination is less important if every query run against the database is crafted by hand and not automatically generated, for example from an analytical or reporting tool. However, if you want to use the views and table valued functions providing temporal perspectives of the data, table elimination is crucial.

Supported Browsers

Please use a browsers that supports HTML5 canvas and 2D drawing context if you want to use the online modeling tool. Here is a list of tested browsers.

Browser	Version	Performance	Functionality
Chrome	6 – 15	Fast	Full
Chrome	5	Fast	Almost full
Canary	17	Fast	Full, but expect some crashes
Firefox	8	Fast	Full
Firefox	4	Slow	Full
Firefox	3.6	Slow	Full, linefeed bug in generated SQL
Safari	5+	Good	Almost full, cannot load local files
Opera	11.5	Slow	Almost full
Opera	10.6	Good	Partial
Internet Explorer	9	Fast	Almost no functionality

Table Elimination Support

Anchor Modeling relies on a query optimization feature called table elimination to achieve high performance. Many modern database engines support this technique, which can be seen from the chart below. Anchor Modeling can be used with all relational databases but the performance will be better in those which support table elimination.

Database Engine	Support	Example
Microsoft SQL Server 2005	full	SQL Server Example
Microsoft SQL Server 2008	full	SQL Server Example
Oracle 10gR2 Express Edition*	partial	Oracle Example
Oracle 11gR1 Enterprise/Express Edition	full	Oracle Example
IBM DB2 v9.5	full	IBM DB2 Example
PostgreSQL v8.4 beta	full	PostgreSQL Example
Teradata v12**	partial	Teradata Example
MySQL v5.0.70	none	MySQL Example
MySQL v6.0.10 alpha	none	MySQL Example
MariaDB v5.1	full	MariaDB Example
Sybase	not tested

Partial support includes support for non-historized attributes, but not for historized. These results are based on the example scripts, for which it still may be possible to rewrite the SQL to get full support for table elimination. Should anyone be able to do such a rewrite, please let us know!

* As it turns out, our example for Oracle 10g does not work as intended. While you do get table elimination, statistics cannot be used to get an optimal plan. It seems that support will be limited in 10g and only full in 11g.

** We have gotten reports that Teradata v13 has full table elimination support when using subqueries in the select part of a statement.

Table Elimination Explained

Table elimination is a feature found in many modern query optimizers. Basically, what table elimination does, is to remove tables from the execution plan when it is unneccessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query:

select
 A.colA
from
 tableA  A
left outer join
 tableB  B
on
 B.id = A.id;

When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition, B.id = A.id, is not met the selected column, A.colA, will contain a NULL value.

However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, “val1”] and B the rows [1, “other1a”],[1, “other1b”] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution.

Instead, let’s say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unneccessary. We can remove the whole join operation from the execution plan.

Let us look at a more advanced query, where some query optimizers fail, and hence only provide partial support for table elimination.

select
 A.colA
from
 tableA  A
left outer join
 tableC  C
on
 C.id = A.id
and
 C.fromDate = (
   select
     max(sub.fromDate)
   from
     tableC sub
   where
     sub.id = A.id
 );

In this example we have added another join condition, which ensures that we only pick the matching row from tableC having the latest fromDate. In this case tableC will contain duplicates of the column C.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableC is (C.id, C.fromDate).

Furthermore, since the subselect ensures that we only pick the latest C.fromDate for a given C.id we know that at most one row will match the join condition. We will again have the situation where joining with tableC cannot affect the number of rows in the result set. Since we do not select any columns from tableC, the whole join operation can be eliminated from the execution plan.

Queries like these arise in two situations. Either when you have a denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data.

Clustered Indexes

Anchor Modeling use clustered indexes, also called index organized tables, to further gain performance. Many database engines support this type of indexing, which physically order the data on the media according to the indexed columns. In other words, such indexes take up no extra space, contrary to regular indexes. In fact, regular indexes are very rarely needed and they are not a crucial feature for Anchor Modeling, even if present in most database engines.

Over time, or when large amounts of data is loaded at once, clustered indexes may become fragmented. In other words, the organization of the data has become less than optimal and a defragmentation or recreation of the index is needed in order to solve the problem. Always check the degree of fragmentation after loading large amounts of data and on a regular basis.

Note that indexes slow down inserts, so in databases that support it, switching them off before loading large amounts of data and then switching them back on may speed up the loading process significantly, provided that you trust the loading process not to introduce rows that break the index constraint.

Foreign Keys

Having foreign keys declared are paramount for maintaining referential integrity in the database, but they also improve select query performance. There is a second type of table elimination that can be achieved only when foreign keys are declared. In Anchor Modeling, this type will eliminate anchors from queries in which there are at least one where condition on any of its attributes. They also help the optimizer in many other ways.

As with indexes, performance when loading large amounts of data can be improved by temporarily deferring the foreign key checks until after the loading is done.

Statistics

In order to achieve near optimal execution plans, the query optimizer needs to use statistics of your stored data. This can be illustrated using an example. Below is the latest view for the Actor anchor and attributes in Transact-SQL code:

CREATE VIEW lAC_Actor AS 
SELECT
  AC.AC_ID,
  AC_NAM.AC_NAM_Actor_Name,
  AC_NAM.AC_NAM_ChangedAt,
  GEN.GEN_ID, 
  GEN.GEN_Gender,
  PLV.PLV_ID,
  PLV.PLV_ProfessionalLevel,
  AC_PLV.AC_PLV_ChangedAt
FROM
  AC_Actor AC
LEFT JOIN
  AC_NAM_Actor_Name AC_NAM
ON
  AC_NAM.AC_ID = AC.AC_ID
AND
  AC_NAM.AC_NAM_ChangedAt = (
    SELECT
      max(sub.AC_NAM_ChangedAt)
    FROM
      AC_NAM_Actor_Name sub
    WHERE
      sub.AC_ID = AC.AC_ID
  )
LEFT JOIN
  AC_GEN_Actor_Gender AC_GEN
ON
  AC_GEN.AC_ID = AC.AC_ID
LEFT JOIN
  GEN_Gender GEN
ON
  GEN.GEN_ID = AC_GEN.GEN_ID
LEFT JOIN
  AC_PLV_Actor_ProfessionalLevel AC_PLV
ON
  AC_PLV.AC_ID = AC.AC_ID
AND
  AC_PLV.AC_PLV_ChangedAt = (
    SELECT
      max(sub.AC_PLV_ChangedAt)
    FROM
      AC_PLV_Actor_ProfessionalLevel sub
    WHERE
      sub.AC_ID = AC.AC_ID
  )
LEFT JOIN
  dbo.PLV_ProfessionalLevel PLV
ON
  PLV.PLV_ID = AC_PLV.PLV_ID;

Now let’s say that we want to answer how many women have reached the highest professional acting level. This can be done using the following query on the latest view:

SELECT
  count(*)
FROM
  lAC_Actor
WHERE
  GEN_Gender = 'Female'
AND
  PLV_ProfessionalLevel = 'Oscar';

There is no need for any information from the Name attribute, so it will not be touched during execution, thanks to table elimination. Since we are only interested in two particular non-null values: ‘Female’ and ‘Oscar’ from the other attributes, the anchor can also be eliminated, as foreign keys have been declared. Furthermore, when conditions are set on the attributes like this, the outer joins become inner joins. After all these optimizations, there are two possibilities for what the remaining query may look like, or rather, execute like:

SELECT
  count(*)
FROM
  AC_GEN_Actor_Gender AC_GEN
JOIN
  AC_PLV_Actor_ProfessionalLevel AC_PLV
ON
  AC_GEN.AC_ID = AC_PLV.AC_ID
WHERE
  AC_GEN.GEN_ID = 1
AND
  AC_PLV.PLV_ID = 10;

In the query above the gender attribute will first be scanned and all women picked out, then this intermediate result set will be joined with their professional levels, and finally all those that do not have the highest level removed. The other choice is to start with the professional levels:

SELECT
  count(*)
FROM
  AC_PLV_Actor_ProfessionalLevel AC_PLV
JOIN
  AC_GEN_Actor_Gender AC_GEN
ON
  AC_PLV.AC_ID = AC_GEN.AC_ID
WHERE
  AC_PLV.PLV_ID = 10
AND
  AC_GEN.GEN_ID = 1;

In order to determine the optimal join order, which of the two queries above to pick, the optimizer needs statistics. If men and women are evenly distributed, then starting with gender will remove about half the rows for the first intermediate result set. If there are ten professional levels with most actors at medium levels, then starting with the professional level will remove most rows for the first intermediate result set. This is much more efficient, and the optimizer will strive to find the join order that produces the smallest intermediate result set first continuing progressively through the joins with as few rows as possible.

When legacy data is migrated, large numbers of new rows will be created, skewing any already gathered statistics. Therefore it is a good idea to update all statistics after loading a large amount of data and on a regular basis, if not done automatically by the database. Normally, experienced users will always know more about the data than the database, so it may also be a good idea to check the executions plans of queries from time to time to ensure that they are optimal.

Spread the love

20 thoughts on “Support”

Pingback: When we need to use 1-to-1 relationship in database design? - Database Solutions - Developers Q & A
Federico Razzoli says:

June 6, 2015 at 15:49

Also, beleive it or not, italians write in SQL. Translating SQL into Italian is a funny idea, but somewhat unpractical.

Reply
1. Lars Rönnbäck says:
  
  June 9, 2015 at 10:43
  
  Thanks for pointing this out. The translations are automatically generated by Google Translate, but apparently it doesn’t do so well, so I have removed the functionality for now.
  
  Reply
FrankF says:

September 23, 2015 at 21:35

I have noticed a huge number of logical reads and scans when deploying a simple schema (1 anchor, 3 static attributes, 1 historized attribute) in CRT mode. 76 entities and only 1 attribute posit/annex per entity and attribute. I will get about 3000 scans/6000 reads per annex table. And a lot of CPU is burned on top-1 sort. That does not look like high performance to me.

Reply
1. Lars Rönnbäck says:
  
  September 24, 2015 at 10:34
  
  Would it be possible to provide a sample model and data so I can look into why you are getting that many scans and reads?
  
  Reply
Mark Versteegh says:

September 23, 2015 at 23:11

Possible error in insert-trigger:

In the instead-of-insert-trigger for an anchor (in our case [itCA_Caller]) a case-expression similar to the following occurs for every historized attribute:
——
CASE
WHEN [dbo].[rfCA_BRA_Caller_CallerBranch](
v.CA_BRA_CA_ID,
v.CA_BRA_Caller_CallerBranch,
v.CA_BRA_ChangedAt
) = 1
THEN ‘R’ — restatement
WHEN [BRA].CA_BRA_CA_ID is not null
THEN ‘D’ — duplicate
ELSE ‘N’ — new statement
END
——
The function [rfCA_BRA_Caller_CallerBranch] returns 1 if the last value before CA_BRA_ChangedAt and/or the first value after CA_BRA_ChangedAt is equal to the given CA_BRA_Caller_CallerBranch (for given CA_ID).

The problem is that in some cases duplicates are marked as restatements (which results in a key-violation error when trying to insert).

For example: if in our case the combination
[CA_BRA_CA_ID,CA_BRA_Caller_CallerBranch,v.CA_BRA_ChangedAt] already exists (so it is in fact a duplicate) and there is also an adjacent record, with the same [CA_BRA_CA_ID,CA_BRA_Caller_CallerBranch] but different _Changed_At, then the function [rfCA_BRA_Caller_CallerBranch] returns 1 and because the SQL-Server case-expression returns the result of the FIRST when_expression that evaluates to true, the inserted record is marked as a restatement.

Is this indeed the case, or am I missing something?

Kind regards,
Mark Versteegh

—
I think an easy fix would be to move the duplicate-check before the restatement-check.

Reply
1. Lars Rönnbäck says:
  
  September 24, 2015 at 10:41
  
  This bug has already been fixed in the test version (http://www.anchormodeling.com/modeler/test) of the tool, and precisely the way you suggested, by rearranging the order of the when-statements. The code generation in the test version is stable, but we have a couple of minor GUI-bugs that needs to be sorted out before release.
  
  Reply
  1. Mark Versteegh says:
    
    September 24, 2015 at 16:21
    
    Thanks for the quick response. I tried the test version of the tool, but now I get a duplicate key-error when trying to insert a changed value for a static attribute.
    
    In the release-version these attributes where left unchanged (if an attribute value already exists for the same anchor_ID then no insert takes place), but in the test-version there is a left join on the anchor_ID and the attribute_value, which works fine for duplicates, but handles changes values as ‘new’.
    
    I would expect the value to be updated in this case, but I’m not 100% sure if this is the intended behavior.
    
    Reply
    1. Lars Rönnbäck says:
      
      October 2, 2015 at 12:46
      
      This should now be fixed in the test version. The intended behaviour is now to do nothing when using the triggers on the views, but you can still update a value by directly using the attribute table. A warning message will be shown, though, when you try to update static attributes.
      
      Reply
Truong Phan says:

October 19, 2015 at 11:19

Hello,

Please help me to know where can I get an example DWH using anchor model.

Thanks!

Reply
1. Lars Rönnbäck says:
  
  November 2, 2015 at 10:45
  
  Here is a “skeleton” model used to build a real data warehouse (which had additional attributes added):
  http://www.anchormodeling.com/modeler/latest/?id=ahNzfmFuY2hvcm1vZGVsZXItaHJkcg0LEgVNb2RlbBjKshEM
  
  Reply
Sjacco Mol says:

August 8, 2018 at 22:43

Hi,

At this moment we need to clean some customer data from our anchor model setup. If we just to a update on the last status of a customer, the history is still there. Is there a quick way to remove all data from a key is.

Thanks. Sjacco from M-Wise

Reply
1. Lars Rönnbäck says:
  
  August 9, 2018 at 09:20
  
  If you want to physically remove the data I would suggest doing deletes on the latest views. For example:
  declare @r int; set @r = 1; while (@r > 0) begin delete lAC_Actor where AC_ID in ( select AC_ID from #AC_ID_toBeDeleted ); set @r = @@ROWCOUNT; end
  This will delete both all attribute data and the anchor data. The loop is necessary in order to delete all history in historized attributes. If you also have related data that needs to be deleted you will have to roll your own “cascading delete”.
  create table #PE_ID_toBeDeleted ( PE_ID int not null ); delete AC_wasCast_PE_in output deleted.PE_ID_in into #PE_ID_toBeDeleted where AC_ID_wasCast in ( select AC_ID from #AC_ID_toBeDeleted );
  After which you can use #PE_ID_toBeDeleted to do a similar looped delete on the lPE_Performance latest view. Of course, you may want to keep performances that involved other actors than the ones deleted. In other words, it is hard to fully automate deletes, since there may be underlying business logic governing them.
  
  Reply
Søren Stammers says:

September 20, 2018 at 16:45

I’m attempting using AM to create a 6NF + 3NF views in AWS Redshift. I’m using the PostgreSQL generation scheme, but after getting a little bit complicated, it is failing to generate SQL. Maybe I’m doing something incorrect or maybe it is a limitation of the modeler? I’d like to have someone take a look at my model to see which of these things are the problem.

Reply
Waldimiro Rossi says:

April 3, 2020 at 09:23

Hi,

Your work is awesome! Thanks a lot for sharing it! I’m eager to try “Privacy actions” and “Encryption groups” (can help ABAC implementation?)!

I am analyzing an architecture on top of the Anchor Modeling for a big Swiss telecommunication company.

Very useful the mnemonic concept;) Is it possible to write the descriptions that we insert in the GUI in the views “_Anchor, _Attribute, _Knot, _Tie” views? I only see them in the XML file:

My address description

Thanks a lot

Waldimiro Rossi, Swiss Impact Ltd.

Reply
1. Lars Rönnbäck says:
  
  April 3, 2020 at 17:04
  
  I have added the descriptions in test version 0.99.4 (now live).
  
  Reply
  1. Waldimiro Rossi says:
    
    April 3, 2020 at 19:01
    
    Thank you very much! when is planned the version 1.0?
    
    Reply
    1. Lars Rönnbäck says:
      
      April 3, 2020 at 19:46
      
      You can consider the SQL Server support to already be production quality. I was hoping to get support for a couple more databases up to better quality before I announce 1.0.
      
      Reply
      1. Waldimiro Rossi says:
        
        October 16, 2020 at 09:48
        
        Hi Lars, can you contact me please via email.
Pingback: When we need to use 1-to-1 relationship in database design? – w3toppers.com