This tutorial starts with an introduction to data warehousing, the definition of OLAP, the difference between a data warehouse and an OLTP database, the objectives of data warehousing, and data flow.
Computerization of business processes, technological advances in the transmission and storage of data, and powerful database management tools have opened up new possibilities for data manipulation and analysis. Business managers are eager to explore repositories of current and historical data to identify trends and patterns in the warp and woof of business. They hope to mine this data and use it to make intelligent business decisions. In this context, industries are increasingly focusing on data warehousing, Online Analytical Processing (OLAP) and related technologies.
What’s the Difference?
‘Data warehouse’ and ‘OLAP’ are terms that are often used interchangeably. In fact, they refer to two different components of a decision support system. A data warehouse holds the historical data of the organization, stored for end-user analysis, while OLAP is a technology that enables a data warehouse to be used effectively for online analysis through complex analytical queries. The differences between a data warehouse and OLAP are tabulated below for ease of understanding:
| Data Warehouse | Online Analytical Processing (OLAP) |
|---|---|
| Data from different data sources is stored in a relational database for end-user analysis. | A tool to evaluate and analyze the data in the data warehouse using analytical queries. |
| Data is organized in summarized, aggregated, subject-oriented, non-volatile patterns. | A tool that helps organize data in the data warehouse using multidimensional models of data aggregation and summarization. |
| A consolidated, flexible collection of data that supports analysis of data but does not itself support online analysis of data. | Supports the data analyst in real time and enables online analysis of data with speed and flexibility. |
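The OLAP side of this comparison, which organizes data through "multidimensional models of data aggregation and summarization", can be illustrated with a small in-memory sketch. Assume a handful of sales facts described by three dimensions. The sketch pre-computes totals for every combination of dimensions, similar to what SQL's `CUBE` operator produces, so that any summary cell can be looked up instantly instead of being recomputed from raw rows. The fact rows, the dimension names and the functions `cube` and `slice_cube` are hypothetical illustrations. They do not show how any particular OLAP product stores its data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales amount)
FACTS = [
    ("North", "Laptop", 2022, 1200.0),
    ("North", "Phone", 2022, 800.0),
    ("South", "Laptop", 2022, 950.0),
    ("South", "Laptop", 2023, 1100.0),
    ("North", "Phone", 2023, 700.0),
]
DIMENSIONS = ("region", "product", "year")


def cube(facts, dims=DIMENSIONS):
    """Pre-aggregate the measure over every subset of dimensions (like SQL CUBE).

    Each key is a tuple of (dimension, value) pairs in DIMENSIONS order;
    the empty tuple holds the grand total.
    """
    totals = defaultdict(float)
    for row in facts:
        *coords, amount = row
        # Add this row's amount to every aggregate cell it contributes to.
        for r in range(len(dims) + 1):
            for subset in combinations(range(len(dims)), r):
                key = tuple((dims[i], coords[i]) for i in subset)
                totals[key] += amount
    return dict(totals)


def slice_cube(cube_totals, **fixed):
    """Look up one pre-computed cell, e.g. slice_cube(c, region="North")."""
    key = tuple((d, fixed[d]) for d in DIMENSIONS if d in fixed)
    return cube_totals.get(key, 0.0)


if __name__ == "__main__":
    c = cube(FACTS)
    print(slice_cube(c))                                # grand total
    print(slice_cube(c, region="North"))                # roll-up by region
    print(slice_cube(c, product="Laptop", year=2022))   # slice on two dimensions
```

Pre-computing every combination trades storage for query speed. That trade-off is what allows an OLAP layer to answer summary questions quickly, while the warehouse itself keeps the detailed rows.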
What is Data Warehousing?
‘Data warehousing’ is a collection of decision support technologies that enable the knowledge worker, the statistician, the business manager and the executive to process the information contained in a data warehouse meaningfully and to make well-informed decisions based on the outputs.
The data warehousing system includes back-end tools for extracting, cleaning and loading data from Online Transaction Processing (OLTP) databases and historical repositories of data. It also comprises the data storage area, composed of the data warehouse, the data marts and the data store. It provides tools such as OLAP for organizing, partitioning and summarizing data in the data warehouse and data marts. Finally, it contains front-end tools for data mining, querying and reporting.
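The back-end flow described above (extract from an OLTP source, clean, load into the warehouse, then summarize into a data mart) can be sketched with Python's built-in `sqlite3` module. In this sketch, two in-memory databases stand in for the OLTP system and the warehouse. The table names, the sample rows, the country-code mapping and the function names are all hypothetical, chosen only to make each stage visible.

```python
import sqlite3

# Hypothetical mapping used to standardize inconsistent country codes.
COUNTRY_CODES = {"us": "US", "usa": "US", "de": "DE"}


def build_demo_oltp():
    """Create an in-memory stand-in for an OLTP orders table with messy rows."""
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
        "country TEXT, amount REAL, order_date TEXT)"
    )
    oltp.executemany(
        "INSERT INTO orders (customer, country, amount, order_date) VALUES (?, ?, ?, ?)",
        [
            (" alice ", "us", 120.0, "2023-01-05"),
            ("Bob", "USA", 80.0, "2023-01-06"),
            ("Carol", "de", None, "2023-01-06"),  # missing amount
            ("Dave", "DE", 200.0, "2023-02-01"),
        ],
    )
    return oltp


def extract(oltp):
    """Extract: read raw rows from the transactional source."""
    return oltp.execute(
        "SELECT customer, country, amount, order_date FROM orders"
    ).fetchall()


def clean(rows):
    """Clean: drop incomplete rows and standardize names, codes and dates."""
    cleaned = []
    for customer, country, amount, order_date in rows:
        if amount is None:
            continue  # reject rows that cannot be analyzed
        code = COUNTRY_CODES.get(country.strip().lower(), "UNKNOWN")
        # Keep only year-month from the date for monthly reporting.
        cleaned.append((customer.strip().title(), code, amount, order_date[:7]))
    return cleaned


def load(warehouse, rows):
    """Load: append cleaned rows into the warehouse fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer TEXT, country TEXT, amount REAL, month TEXT)"
    )
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)


def build_mart(warehouse):
    """Derive a small summarized data mart: total sales per country and month."""
    warehouse.execute(
        "CREATE TABLE sales_mart AS SELECT country, month, SUM(amount) AS total "
        "FROM sales_fact GROUP BY country, month"
    )
    return warehouse.execute(
        "SELECT country, month, total FROM sales_mart ORDER BY country, month"
    ).fetchall()


def run_pipeline():
    """Run extract -> clean -> load -> data mart on fresh in-memory databases."""
    oltp = build_demo_oltp()
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, clean(extract(oltp)))
    return build_mart(warehouse)


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

In a real deployment, each stage would usually be a separate tool or job running on a schedule. Keeping the stages as separate functions here mirrors that separation.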
It is important to distinguish between a “Data warehouse” and “Data warehousing”.
A ‘data warehouse’ is a component of the data warehousing system. It is a facility that provides a consolidated, flexible and accessible collection of data for end-user reporting and analysis.
A data warehouse has been defined by Bill Inmon (considered one of the founders of the data warehouse concept) as a “subject-oriented, integrated, time-varying, non-volatile collection of data that is primarily used in organizational decision making.”
- The data in a data warehouse is organized by subject area (for example, customers, products or sales), and hence it is “subject-oriented”.
- Consistent naming conventions, units of measurement, classifications and so on are applied across the data warehouse, providing a consolidated, enterprise-wide view of the data; it is therefore described as “integrated”.
- Once loaded, the data can only be read. Users cannot change it, and this makes it “non-volatile”.
- Finally, data is stored for long periods, typically measured in years, and bears a date and time stamp; it is therefore described as “time-variant”.
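The four properties listed above can be made concrete with a toy store. Every fact carries a subject area (subject-oriented) and is normalized to common units on the way in (integrated). Facts are frozen once loaded, and the store offers no update or delete operation (non-volatile). Each fact is stamped with a snapshot date, so past states can be queried (time-variant). The class, its methods and the unit table are hypothetical. This is a minimal sketch of the ideas under those assumptions, not a design for a real warehouse.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical conversion factors for sources that report in different units.
UNIT_FACTORS = {"units": 1, "thousands": 1000}


@dataclass(frozen=True)  # frozen: a loaded fact cannot be modified (non-volatile)
class Fact:
    subject: str   # subject area the fact belongs to (subject-oriented)
    key: str       # e.g. a product or customer identifier
    value: float   # measure, always stored in base units (integrated)
    as_of: date    # snapshot date (time-variant)


class AppendOnlyWarehouse:
    """Toy store: facts can be appended and read, never updated or deleted."""

    def __init__(self):
        self._facts = []

    def load(self, subject, key, value, as_of, unit="units"):
        # Integrate: normalize the subject name and the unit before storing.
        fact = Fact(subject.strip().lower(), key, value * UNIT_FACTORS[unit], as_of)
        self._facts.append(fact)
        return fact

    def history(self, subject, key):
        """All snapshots for one key, oldest first, as an immutable tuple."""
        matches = (f for f in self._facts if f.subject == subject and f.key == key)
        return tuple(sorted(matches, key=lambda f: f.as_of))

    def value_as_of(self, subject, key, when):
        """Latest value recorded on or before `when`, or None if there is none."""
        past = [f for f in self.history(subject, key) if f.as_of <= when]
        return past[-1].value if past else None


if __name__ == "__main__":
    wh = AppendOnlyWarehouse()
    # Two sources: one reports in thousands with a capitalized subject name.
    wh.load("Inventory", "widget", 1.5, date(2023, 1, 31), unit="thousands")
    wh.load("inventory", "widget", 1200, date(2023, 2, 28))
    print(wh.value_as_of("inventory", "widget", date(2023, 2, 15)))
    print(len(wh.history("inventory", "widget")))
```

A new observation is appended as a new snapshot instead of overwriting the old value. This is why the store can answer "what was the value on a given date?", a question an OLTP table that updates rows in place cannot answer.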
Ralph Kimball, regarded as a co-founder of the data warehousing concept, has defined the data warehouse as “a copy of transaction data specifically structured for query and analysis”.
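Kimball is widely associated with dimensional modelling. In this approach, transaction data is "structured for query and analysis" as a star schema: a central fact table of measures surrounded by dimension tables that describe them. The sketch below builds a tiny star schema with Python's `sqlite3` module and runs a typical analytical query that joins the facts to their dimensions and groups the results. All table names, columns and sample rows are hypothetical.

```python
import sqlite3


def build_star(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            amount REAL
        );
        INSERT INTO dim_date VALUES (20230105, 2023, 1), (20230210, 2023, 2);
        INSERT INTO dim_product VALUES
            (1, 'Laptop', 'Computers'),
            (2, 'Mouse', 'Accessories'),
            (3, 'Monitor', 'Computers');
        INSERT INTO fact_sales VALUES
            (20230105, 1, 2, 2000.0),
            (20230105, 2, 5, 100.0),
            (20230210, 3, 1, 300.0),
            (20230210, 1, 1, 1000.0);
        """
    )


def sales_by_category_month(conn):
    """Join facts to their dimensions and summarize sales by category and month."""
    return conn.execute(
        """
        SELECT p.category, d.month, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in sales_by_category_month(conn):
        print(row)
```

Descriptive attributes such as product category live in small dimension tables rather than being repeated in every transaction row. This keeps analytical queries short and lets new groupings be added without touching the fact table.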
Each definition highlights different features of the data warehouse. Inmon's definition focuses on the structure and organization of the data, while Kimball's focuses on how the data is used. A complete list of the features of a data warehouse would therefore include the aspects highlighted in both definitions.