Subscribe by Email

Your email:

Connect with us

Business Intelligence Blog

Current Articles | RSS Feed RSS Feed

Pleading for a complete data warehouse solution combining Data Vault & dimensional approaches

  
  
  

This article is the fifth in a series of articles discussing the Data Vault modeling approach for data warehousing.

We first introduced the Data Vault approach and then described in more details the three types of entities that compose it: hubs to represent business concepts, links to combine business concepts and satellites to describe and contextualise hubs and links.

How does this approach for data warehouse modeling compares with the two classics Inmon's modeling by subject and Kimball's dimensional modeling? Don't worry, I don't intend to go back to the old debate between supporters of Inmon and those of Kimball. Let’s just say that both camps can get satisfied…

By itself, the hubs-satellites-links architecture is flexible, stable and scalable. But let's get back to the fundamental purpose of a Data Vault model: getting a raw data warehouse of untransformed data which allows easy recovering, at any point in time, of the data state as found in the sources. The data is only grouped in the hubs-satellites-links structure. Three important consequences of this approach:

  • There is complete and easy traceability with the sources at any point in time.
  • Since there are no transformation and because transformation is the most complex task in an ETL, the ETL development time becomes very short and the new data arrives quickly in the warehouse.
  • The structure reveals data quality problems without trying to fix or work around them.

Dimensional modeling does not have the same objective. It is oriented toward using & querying transformed data. Emphasis is on simplicity and performance. Since the dimensional model uses transformed data, traceability to the sources is more difficult to achieve. Conversely, although it is possible to obtain a certain level of performance and despite the fact that the architecture of the Data Vault model is relatively simple, the Data Vault approach does not reach the level of performance and simplicity found in dimensional models.

How to get the best of both worlds? The answer: by keeping the two types of warehouse. But that does not mean repeating the error so often made : to build a normalized warehouse by subject and to derive dimensional models! There is a fundamental difference   : the effort to bring the data from the sources to the normalized warehouse is high and the effort from the normalized model to the dimensional model is also far from being negligible. Another problem : data's traceability is not guaranteed.

Figure 1 summarizes the combined usage of approaches. As mentioned earlier, new data arrives quickly into the Data Vault warehouse. There is no need to wait for the implementation of transformations. In addition, the ETL required to move from the Data Vault model to the dimensional model is simpler than the traditional ETL between sources and a dimensional structure.

datavault5graphicEN

Figure 1 Combination of the Data Vault and dimensional approaches

 

What about the normalized model? In fact, the Data Vault model is in itself an improved version of the classic standardized structure. It is simpler and more flexible while providing the same benefits.

In short, the ideal data warehouse jointly uses the Data Vault's raw data warehousing combined with the simplicity and performance of dimensional models.

Comments

Currently, there are no comments. Be the first to post one!
Post Comment
Name
 *
Email
 *
Website (optional)
Comment
 *

Allowed tags: <a> link, <b> bold, <i> italics