This tutorial covers the OLAP solutions used by data warehouses and introduces data warehouse design.
Defining OLAP Solutions
The data warehouse offloads data from a multitude of sources. The cleaned, validated and loaded data is voluminous and daunting. This data needs to be organized, categorized and arranged in meaningful order for analytical purposes. OLAP solutions are specifically designed to cater to this need.
OLAP solutions used by Data warehouses are:
Multidimensional views of data. Data in the data warehouse is organized into subject-oriented categories and tables. Fact tables are constructed and linked to various dimension tables in star or snowflake schemas, or combinations of them, to form multidimensional views of data. Cubes are built using these multidimensional schemas, which makes rapid browsing and querying possible. These views are independent of the way in which data is stored in the data warehouse.
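The star schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (fact_sales, dim_product, dim_date), the columns and the sample rows are all hypothetical and do not come from any particular warehouse.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        -- The fact table holds the measure (amount) plus one key per dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL);
        INSERT INTO dim_product VALUES
            (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Editor', 'Software');
        INSERT INTO dim_date VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
        INSERT INTO fact_sales VALUES
            (1, 1, 1000.0), (2, 1, 50.0), (3, 1, 200.0), (1, 2, 1500.0), (3, 2, 300.0);
        """
    )
    return conn


def sales_by_category_and_quarter(conn):
    """Join the fact table to both dimensions and total the measure."""
    return conn.execute(
        """
        SELECT p.category, d.quarter, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
        """
    ).fetchall()


conn = build_star_schema()
for category, quarter, total in sales_by_category_and_quarter(conn):
    print(category, quarter, total)
```

Each row of the result corresponds to one cell of a two-dimensional view (category by quarter). A cube generalizes the same idea to more dimensions.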
Interactive query and analysis of data is another OLAP solution. It enables users to drill down, roll up and slice data over multiple passes. Users can drill down to successively lower levels of detail or roll up to higher levels of summarization and aggregation.
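As a rough sketch of these operations, the pure-Python example below treats roll-up and drill-down as the same aggregation at coarser or finer levels, and a slice as fixing one dimension to a single value. The records, field names and helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (year, quarter, region, amount).
FIELDS = ("year", "quarter", "region", "amount")
SALES = [
    (2023, "Q1", "North", 100),
    (2023, "Q1", "South", 80),
    (2023, "Q2", "North", 120),
    (2023, "Q2", "South", 90),
    (2024, "Q1", "North", 130),
]


def aggregate(records, levels):
    """Sum the amount, grouped by the given dimension levels."""
    indexes = [FIELDS.index(level) for level in levels]
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[i] for i in indexes)
        totals[key] += record[FIELDS.index("amount")]
    return dict(totals)


def slice_by(records, field, value):
    """Keep only the records in which one dimension has a fixed value."""
    i = FIELDS.index(field)
    return [record for record in records if record[i] == value]


# Rolled up: one total per year.
by_year = aggregate(SALES, ["year"])
# Drilled down: the same totals broken out by quarter.
by_quarter = aggregate(SALES, ["year", "quarter"])
# Sliced: only the North region, then rolled up by year.
north_by_year = aggregate(slice_by(SALES, "region", "North"), ["year"])
```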
Analytical modeling is provided by a calculation engine that derives ratios, variances and similar figures from measurements and numerical data across many dimensions.
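To make the idea of derived measures concrete, here is a minimal sketch that computes a variance against budget and a share-of-total ratio for each cell of a two-dimensional data set. The measure names, the (product, region) keys and the figures are assumptions made up for this example.

```python
# Hypothetical measures keyed by (product, region): (actual, budget).
MEASURES = {
    ("Laptop", "North"): (1200.0, 1000.0),
    ("Laptop", "South"): (800.0, 1000.0),
    ("Mouse", "North"): (300.0, 250.0),
}


def variance_report(measures):
    """Derive variance and ratio figures for every cell."""
    total_actual = sum(actual for actual, _budget in measures.values())
    report = {}
    for key, (actual, budget) in measures.items():
        report[key] = {
            # Absolute difference between actual and budgeted values.
            "variance": actual - budget,
            # The same difference as a percentage of budget.
            "variance_pct": (actual - budget) * 100 / budget,
            # This cell's contribution to the grand total.
            "share_of_total": actual / total_actual,
        }
    return report


report = variance_report(MEASURES)
```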
Functional models for forecasting, trend analysis and similar tasks are made available through OLAP. They support users in data analysis.
Graphical OLAP tools are used to display data in 2D or 3D cross tabs, charts and graphs, with easy pivoting of axes. This is important for users who need to analyze data from different perspectives, where the analysis of one perspective leads to business questions that need to be examined from other perspectives.
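Pivoting a cross tab amounts to swapping which dimension runs along the rows and which along the columns. The sketch below shows this with plain dictionaries. The record layout and the cross_tab helper are hypothetical, and a real OLAP front end would render the result graphically rather than as nested dictionaries.

```python
from collections import defaultdict

# Hypothetical records: (region, product, units sold).
RECORDS = [
    ("North", "Laptop", 5),
    ("North", "Mouse", 9),
    ("South", "Laptop", 3),
    ("North", "Laptop", 2),
]


def cross_tab(records, row_index, col_index, value_index=2):
    """Build a row -> column -> total table from flat records."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[row_index]][record[col_index]] += record[value_index]
    return {row: dict(columns) for row, columns in table.items()}


# Regions as rows, products as columns.
by_region = cross_tab(RECORDS, row_index=0, col_index=1)
# Pivoted: products as rows, regions as columns.
by_product = cross_tab(RECORDS, row_index=1, col_index=0)
```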
Rapid response to queries is a must in any analysis of data and is the measure of success for an OLAP tool. Nigel Pendse and Richard Creeth, authors of the OLAP Report, developed the FASMI (Fast Analysis of Shared Multidimensional Information) test to judge whether or not an application qualifies as an OLAP tool. Their contention was that an OLAP tool should provide fast browsing capabilities (responses in under five seconds), should contain analytical tools for both the developer and the end user, should handle the security requirements of sharing confidential information, and should present data multidimensionally.
A multidimensional data storage engine stores data in arrays. These arrays are logical representations of the business dimensions.
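One way to picture array-based storage is a dense cube held in a single flat list, where the position of each cell is computed from the positions of its dimension members. The Cube class below is a simplified, hypothetical sketch of that idea. Production engines add sparsity handling, compression and precomputed aggregates, none of which is shown here.

```python
class Cube:
    """A dense cube stored as one flat list, addressed by dimension members."""

    def __init__(self, dimensions):
        # dimensions: mapping of dimension name -> ordered list of members.
        self.names = list(dimensions)
        self.members = [list(dimensions[name]) for name in self.names]
        # For each dimension, map a member to its position along that axis.
        self.positions = [
            {member: i for i, member in enumerate(members)}
            for members in self.members
        ]
        size = 1
        for members in self.members:
            size *= len(members)
        self.cells = [0.0] * size

    def _offset(self, coords):
        """Row-major offset of the cell at the given member coordinates."""
        offset = 0
        for positions, members, member in zip(self.positions, self.members, coords):
            offset = offset * len(members) + positions[member]
        return offset

    def set(self, coords, value):
        self.cells[self._offset(coords)] = value

    def get(self, coords):
        return self.cells[self._offset(coords)]


cube = Cube({
    "product": ["Laptop", "Mouse"],
    "region": ["North", "South"],
    "quarter": ["Q1", "Q2"],
})
cube.set(("Mouse", "South", "Q1"), 42.0)
```

Because every cell has a fixed offset, a lookup by business coordinates needs no searching or joining, which is part of what makes this layout attractive for fast browsing.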
Understanding Data Warehouse design
At the highest level, construction of the data warehouse is a business project in itself. The enterprise needs to ask itself certain fundamental questions before actually embarking on the process of designing the data warehouse. It must begin with a conviction that a data warehouse would really help its business and that the return on investment will make it worthwhile.
The general questions to be asked may be as follows:
- Do we need a data warehouse?
- How will it help the business?
- What will it mean in terms of cost?
- What are the current data analysis methodologies being adopted?
- In what way are they deficient?
- Will setting up the data warehouse help in reducing these deficiencies?
- What kind of reporting and analysis do we really want?
- What is it that we are getting now?
- Will such data analysis make the business more efficient?
- Will it help the business improve its services and customer relations?
Once the above questions have been answered, the organization needs to examine other crucial issues that will determine the warp and woof of the data warehouse that is being set up.
- What are the kinds of data that are being generated by the enterprise?
- What kinds of data storage technologies are currently being used to back up and store historical data?
- What other external sources of information do we need to tap to make the data in the data warehouse meaningful for analysis?
- What kind of hardware and software will be required to set up this data warehouse?
- Who will be the personnel to handle the process of creating the data warehouse?
- Which departments will benefit from the data being created?
- Will the data warehouse be scalable?
- How will it connect to the different data sources for data?
- How will we ensure that quality data is generated?
- What kinds of tools will be deployed to support end user needs for reports and analytics?
The answers that emerge from these questions will form a set of business requirements. These requirements will determine the kind of data warehouse that is ultimately set up in the enterprise. The first step is to define the global parameters that will shape the design of the data warehouse. The design can follow a top-down approach, as recommended by Bill Inmon, or a bottom-up approach, as recommended by Ralph Kimball. It can be a combination of the two, called the hybrid approach, or it can be a federated approach. Let us have a brief look at what these different approaches mean.