The data warehouse and the OLTP data base are both relational databases. However, the objectives of both these databases are different.
The OLTP database records transactions in real time and aims to automate clerical data entry processes of a business entity. Addition, modification and deletion of data in the OLTP database is essential and the semantics of the application used in the front end impact on the organization of the data in the database.
The data warehouse on the other hand does not cater to real time operational requirements of the enterprise. It is more a storehouse of current and historical data and may also contain data extracted from external data sources.
The differences between these two relational databases, is tabulated below for information.
Data warehouse database | OLTP database |
Designed for analysis of business measures by categories and attributes | Designed for real time business operations. |
Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. | Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table. |
Loaded with consistent, valid data; requires no real time validation | Optimized for validation of incoming data during transactions; uses validation data tables. |
Supports few concurrent users relative to OLTP | Supports thousands of concurrent users. |
However, the data warehouse supports OLTP system by providing a place for the latter to offload data as it accumulates and by providing services which would otherwise degrade the performance of the database.
Objectives of a Data warehouse and Data flow
The primary objective of data warehousing is to provide a consolidated, flexible meaningful data repository to the end user for reporting and analysis. All other objectives of Data warehousing are derived from this primary objective. The data flow in the warehouse also is determined by the objectives of data warehousing.
The data in a data warehouse is extracted from a variety of sources. OLTP databases, historical repositories and external data sources offload their data into the data warehouse. Achieving a constant and efficient connection to the data source is one of the objectives of data warehousing. This process is known as Data Source Interaction.
The data extracted from diverse sources will have to be checked for integrity and will have to be cleaned and then loaded into the warehouse for meaningful analysis. Therefore, harnessing efficient data cleaning and loading technologies (ETL—Extraction, Transformation and Loading) to the warehousing system will be another objective of the data warehouse. This process is known as Data Transformation service or Data preparation and staging.
The cleaned and stored data will have to be partitioned, summarized and stored for efficient query and analysis. Creating of subject oriented data marts, dimensional models of data and use of data mining technologies would follow, as the next objective of data warehousing. This process is called Data Storage.
Finally tools necessary for query, analysis and reporting on data would have to be built into the system to the process to deliver a rich end user experience. This process is known as Data Presentation.
Users need to understand what rules applied while cleaning and transforming data before storage. This information needs to be stored separately in a relational database called Metadata.
Metadata is “data about data”. Mapping rules and the maps between the data sources and the warehouse; Translation, transformation and cleaning rules; date and time stamps, system of origin, type of filtering, matching; Pre-calculated or derived fields and rules thereof are all stored in this database. In addition the metadata database contains a description of the data in the data warehouse; the navigation paths and rules for browsing the data in the data warehouse; the data directory; the list of pre-designed and built in queries available to the users.