Connection between Data Model and Data Warehouse
Data models are required to build data warehouses. The problem is that those who develop data warehouses need to show results quickly, while data models traditionally take a long time to build. Is it possible, then, to speed up the process?
The data in the data warehouse is the most atomic data the corporation has. The various summaries and aggregations are external to the data warehouse, found in such places as DSS applications, data marts, the ODS, and so on. These constantly changing forms of data are not part of the atomic data held in the warehouse.
In fact, the data model need only concern itself with basic, elemental data. It does not have to cover derived information such as regional weekly revenues, quarterly units sold, or regional monthly revenues; that data belongs outside the data warehouse. The data model therefore does not have to specify every permutation of the atomic data.
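To make the distinction concrete, the following is a minimal sketch in Python (the names Sale and regional_weekly_revenue are assumptions for illustration, not a prescribed design). It shows atomic sale records as they might sit in the warehouse, and a weekly regional revenue summary derived from them outside the warehouse, in a data mart or DSS application:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical atomic record as it would sit in the warehouse:
# one row per individual sale, with no summarization applied.
@dataclass(frozen=True)
class Sale:
    sale_id: str
    customer_id: str
    region: str
    sale_date: date
    amount: float

def regional_weekly_revenue(sales: list[Sale]) -> dict[tuple[str, int, int], float]:
    """Derive a (region, ISO year, ISO week) -> revenue summary.

    Summaries like this live in a data mart or DSS application,
    not in the atomic data model of the warehouse itself.
    """
    totals: dict[tuple[str, int, int], float] = defaultdict(float)
    for s in sales:
        iso = s.sale_date.isocalendar()
        totals[(s.region, iso[0], iso[1])] += s.amount
    return dict(totals)

if __name__ == "__main__":
    atomic = [
        Sale("S1", "C100", "EMEA", date(2024, 3, 4), 120.0),
        Sale("S2", "C101", "EMEA", date(2024, 3, 6), 80.0),
        Sale("S3", "C102", "APAC", date(2024, 3, 5), 200.0),
    ]
    print(regional_weekly_revenue(atomic))
```

Because every such summary can be recomputed from the atomic records, the data model is free to ignore them.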
What's more, the atomic data in the data warehouse is remarkably stable; it hardly ever changes. It is the derived data outside the warehouse that changes. As a result, the data model for the data warehouse is both small and stable.
The attributes of the data in the data warehouse should describe their subjects in a way that can be interpreted broadly and generally. The model has to be far-reaching, representing many streams and classes of data. If a subject area called Customer is modeled properly to the data warehouse standard, it should include attributes for all sorts of customers: past, present, and future. The data model should carry attributes that record when a person became a customer, when a person was last a customer, and whether that person was ever a customer. All of this belongs in the Customer subject area.
By placing in the subject area all the attributes that may be needed to classify a customer, the modeler has prepared that data for future contingencies. The DSS analyst can then use those attributes to look at past, potential, or future customers as well as present-day ones. The data model paves the way for this flexibility by placing the right attributes in the data warehouse's atomic data.
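A minimal sketch of such a Customer subject area follows (Python; the field names became_customer_on, last_customer_on, and ever_customer are assumptions for illustration, not a prescribed layout). It shows how carrying these attributes in the atomic data lets a DSS-style query distinguish past, present, and prospective customers:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical Customer subject area, modeled broadly enough to
# describe past, present, and prospective customers alike.
@dataclass
class Customer:
    customer_id: str
    name: str
    became_customer_on: Optional[date]  # when the person became a customer
    last_customer_on: Optional[date]    # when the person was last a customer
    ever_customer: bool                 # whether the person was ever a customer

def classify(c: Customer, as_of: date) -> str:
    """An illustrative DSS-style classification driven purely by stored attributes."""
    if c.became_customer_on and (c.last_customer_on is None or c.last_customer_on >= as_of):
        return "current customer"
    if c.ever_customer:
        return "past customer"
    return "prospective customer"
```

The point is not the particular fields, but that the atomic data already carries enough attributes for classifications that no one has asked for yet.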
To take another example of placing numerous attributes within atomic data, the record for a part could carry all kinds of information, even information that current requirements do not yet need. It can include such attributes as part number, technical description, drawing number, engineering specification, raw goods, precious goods, replenishment categories, weight, length, accounting cost basis, bill of material to, bill of material from, store number, assembly identification number, packaging information, and so on.
Some of these attributes may seem extraneous to the vast majority of work done in production control processing. By including all of them in the data model's part data, however, the road has been paved for forms of processing that are unknown at present but may well arise one day.
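As a sketch of what this breadth of attributes might look like (Python; the field names are assumptions for illustration and most are optional), a Part record could be declared with every one of these attributes present, including those that today's production control processing never reads:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical Part record carrying attributes well beyond what
# day-to-day production control processing currently needs.
@dataclass
class Part:
    part_number: str
    technical_description: str
    drawing_number: Optional[str] = None
    engineering_specification: Optional[str] = None
    raw_goods: Optional[str] = None
    precious_goods: Optional[str] = None
    replenishment_category: Optional[str] = None
    weight: Optional[float] = None
    length: Optional[float] = None
    accounting_cost_basis: Optional[float] = None
    bill_of_material_to: Optional[str] = None
    bill_of_material_from: Optional[str] = None
    store_number: Optional[str] = None
    assembly_id: Optional[str] = None
    packaging_info: Optional[str] = None
```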
In other words, the data warehouse's data model should include as many classifications of data as possible and should never exclude a reasonable one. By taking care of this at the outset, the data modeler sets the stage for the multitude of requirements the data warehouse must satisfy.
Thus, from the standpoint of data modeling, the most atomic data should be modeled with the greatest interpretive latitude. Creating such a model is not difficult, and it can represent a company's simplest data.
Once all of this has been defined in the data model, the data warehouse is prepared to take on many different requirements. Modeling atomic data and attaching the attributes that allow it to be stretched in any direction is, in the end, not such a difficult task.