How To Manage Current and Historical Information Within Your Data Warehouse
In order for a company to use a data warehouse successfully, it must be designed so that users are able to analyze historical and current information. There are a number of things that will result from this technique, and they will have an effect on the data model design and ETL functions.
As an example, a finance company may need to analyze their profits for the last three years. Looking at the data from the last three years will allow the user to view the transformations the company has gone through. Unfortunately, only having a current view will not allow the user to get the information they need.
One of the biggest challenges that data warehouse managers face today is the issue of how to manage dimensional tables over a given period of time. Not only must they be able to do this, but they must also be able to manage this information with current data. There are a number of basic modeling techniques that are used, and one example of this is slowly changing dimensions, which can come in type 2 or 3. The SCD 2 technique can be utilized to display a dimensional table when a change within similar columns needs to be analyzed over a given period of time. The SCD 2 method will use keys within the table that will not change.
Whenever a change occurs, a new key will be added to the table. If meta data columns were added to the structure of the table, the new keys can be tagged as current while the records which were loaded in the past can be tagged as being historical. The SCD 2 is a technique which will solve part of the problem, because it will add historical information into the dimension. The queries that are run within the table will generate the correct historical views, and can use the keys from both tables. The second common element that is used to generate historical information is called SQL, or Structured query language.
The goal of using SQL is to produce facts which are grouped together over time. The information is grouped based on iterations and the keys that the dimensions have moved through. Using SQL may not be helpful in a situation where the goal is to overlook historical changes which have been made to a table. It may also not be useful when all the facts that must be displayed must be connected to current information. The meta data can find the rows in the table and ignore certain types of data. The fact table will be connected to the historical keys. If the current dimension is constrained, and the fact table data is retrieved, the reporting results may not be correct.
Intricate SQL techniques can be utilized in order to gain the data on the current row. As the same time, you will still be able gather the fact table rows which are connected to the production key. The success of SCD 2 is dependent on abilities of the tool. The use of the meta data will also play a role in its success. Another method that is used to balance historical and current data is SCD 3. Specifically, SCD 3 can model a dimension table to grab the current and historical changes which have been made on a key. It will use two columns within the table, and while one of them will be for current data, the other will be used for historical data.
The fact tables are connected to a single row, and this solves the problem of using multiple keys which is found when using SCD 2. A database management system can be utilized to identify either the historical or current columns.
It may also be possible for other tools to use alternative techniques. Another technique that is used to manage historical and current information is SCD 2+. With this technique, you will simply manage two sets of tables, and one will deal with current facts while the other will deal with historical facts. If you want to use SCD 2+, you will need a tool that can use the dimension tables that will be used to issue reports.