Online Analytical Processing (OLAP) presents data as cubes, dimensions, hierarchies and measures, objects that let users navigate a complex set of data intuitively. In this context, consistent response times for each view or slice of data are important, so the mode of storing and retrieving data became a key tenet of storage design.
Early OLAP technology focused on specialized, non-relational storage models as the only possible mode for OLAP; this technology was called Multidimensional OLAP (MOLAP). Later, vendors discovered that relational database structures (star and snowflake schemas) helped with indexing and the storage of aggregates, and that relational database management systems could therefore be used for OLAP. These vendors called their storage technology Relational OLAP (ROLAP).
MOLAP implementations usually outperform ROLAP technology but have problems with scalability. ROLAP is scalable and can leverage the strengths of relational database technology.
Hybrid OLAP (HOLAP) is an effort to harness the best features of both ROLAP and MOLAP and provide the user with superior performance and scalability.
Microsoft SQL Server 2000 Analysis Services leads the market in the flexibility of its storage options. The OLAP administrator can choose between MOLAP, ROLAP and HOLAP; the underlying data model is entirely invisible to the client application, and the end user perceives only cubes. The integration of Analysis Services with relational databases is superior in that it maintains strong links among the source data, the OLAP multidimensional metadata and the aggregations themselves by linking the graphical design tools and wizards directly to OLE DB. When a ROLAP data model is defined, all the relational database structures are defined, populated and maintained automatically, so the developer is not burdened with defining relational database structures or with managing complex queries across multiple tables and servers.
The goal of the Analysis Services storage engine is to improve ease of use, so that applications using the database technology can be deployed widely and the database becomes completely transparent to the database administrator. Ease of use is fostered by the following features:
- Standard operations can be performed by end users themselves, freeing the database administrator for other tasks. Branch offices, mobile units and desktop users can now access Analysis Services in a variety of ways to analyze data.
- Transparent server configuration, the database consistency checker (DBCC), index statistics and database backups all contribute to ease of use.
- Streamlined and simplified configuration options adapt automatically to the specific needs of the environment.
Organizations with expanding businesses also find Microsoft SQL Server 2000 Analysis Services useful, as it delivers a single database engine that scales from a laptop computer to terabyte-size symmetric multiprocessing (SMP) clusters while maintaining the security and reliability demanded by mission-critical business systems. The features that make it scalable are:
1. A new disk format and storage subsystem that provide scalable storage for small and large databases alike.
2. Redesigned utilities that support terabyte-size databases efficiently.
3. Large memory support to reduce the need for frequent disk access.
4. Dynamic row-level locking to allow increased concurrency, especially for online transaction processing applications.
5. Unicode support to allow for multinational applications.
Reliability is ensured by replacing complex data structures and algorithms with simple structures that scale better and avoid concurrency issues. Analysis Services dispenses with the need to run a DBCC check before every backup, and DBCC itself runs significantly faster as a result.
One factor that affects cube storage is sparsity. In the OLAP context, sparsity means that a large proportion of the cells at the intersections of dimension members hold no data; missing or invalid data values create sparsity in the OLAP data model. For example, a company may not sell every product in every region, so no values would appear at the intersections where a product is not sold in a particular region. In the worst case, an OLAP product would nonetheless store an empty value at each such cell. Analysis Services gets around this problem innovatively by not allocating storage space to empty cells. As a result, both the MOLAP and ROLAP implementations manage storage requirements extremely well and create OLAP data models smaller than the source data. Data compression is employed, and a sophisticated algorithm designs efficient summary aggregations to minimize storage without sacrificing speed.
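The effect of skipping empty cells can be illustrated with a minimal sketch. This is not the Analysis Services internal format, just an assumed dictionary-of-coordinates model showing why empty intersections cost nothing:

```python
# Illustrative sketch (NOT the Analysis Services storage engine): a sparse
# cube kept as a dictionary keyed by (product, region) coordinates, so
# empty intersections consume no storage at all.

class SparseCube:
    """Stores only non-empty cells; missing intersections cost nothing."""

    def __init__(self):
        self._cells = {}  # (product, region) -> sales measure

    def write(self, product, region, sales):
        if sales:  # skip empty/zero values entirely
            self._cells[(product, region)] = sales

    def read(self, product, region):
        # An empty cell simply reads back as 0; no slot was ever allocated.
        return self._cells.get((product, region), 0)

    def stored_cells(self):
        return len(self._cells)

cube = SparseCube()
cube.write("Widget", "North", 120)
cube.write("Widget", "South", 0)   # product not sold here: nothing stored
cube.write("Gadget", "North", 75)

print(cube.stored_cells())          # → 2 (not one cell per intersection)
print(cube.read("Widget", "South")) # → 0
```

A dense implementation would reserve a slot for every product-region pair; the sparse one grows only with the data that actually exists.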
While some OLAP vendors frequently highlight data sparsity as a deciding factor among OLAP architectures, the differences between vendor implementations of sparsity management are minor compared to the more significant data explosion caused by pre-calculating too many aggregates. Analysis Services gets around this problem by determining which aggregations provide the greatest performance improvements, and the Storage Design Wizard allows the database administrator to trade off query speed against the disk space required to store aggregations.
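The speed-versus-space trade-off can be sketched as a simple greedy selection. The candidate names, benefit figures and the greedy rule below are all hypothetical; Analysis Services' actual aggregation-design algorithm is more sophisticated:

```python
# Hypothetical sketch of the aggregation trade-off: given candidate
# aggregations with an estimated query-time benefit and a disk cost,
# pick greedily by benefit per megabyte until the administrator's
# space budget is spent.

def design_aggregations(candidates, space_budget_mb):
    """candidates: list of (name, benefit_seconds, cost_mb) tuples."""
    chosen, used = [], 0.0
    # Rank by benefit density: seconds saved per megabyte of disk.
    for name, benefit, cost in sorted(
            candidates, key=lambda c: c[1] / c[2], reverse=True):
        if used + cost <= space_budget_mb:
            chosen.append(name)
            used += cost
    return chosen, used

candidates = [
    ("by_month_region", 40.0, 10.0),  # big win, modest cost
    ("by_day_store",    15.0, 60.0),  # costly, small win
    ("by_quarter",       8.0,  1.0),  # cheap win
]
print(design_aggregations(candidates, space_budget_mb=20.0))
# → (['by_quarter', 'by_month_region'], 11.0)
```

Raising the space budget admits more aggregations (faster queries, more disk); lowering it does the reverse, which is exactly the dial the Storage Design Wizard exposes.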
To ensure that the OLAP models conform to actual usage patterns, Analysis Services can optionally log the queries sent to the server and then fine-tune the set of aggregations based on these logs. The Usage-Based Optimization Wizard allows the database administrator to direct Analysis Services to create a new set of aggregations for all queries that take longer than 'n' seconds to answer.
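The selection step behind that wizard amounts to filtering the query log by a response-time threshold. The log format below is an assumption for illustration, not the wizard's actual log schema:

```python
# Sketch of usage-based tuning, assuming a simple query log of
# (query_text, response_seconds) pairs. Every query slower than the
# administrator's threshold becomes a candidate for new aggregations.

def slow_queries(query_log, threshold_seconds):
    return [q for q, secs in query_log if secs > threshold_seconds]

log = [
    ("sales by month",        0.4),
    ("sales by day by store", 7.2),
    ("returns by region",     3.1),
]
print(slow_queries(log, threshold_seconds=2.0))
# → ['sales by day by store', 'returns by region']
```

The two slow queries are then fed back into aggregation design, so the cube is tuned to the workload it actually receives rather than to a guessed one.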
Optimal design and storage of the many and varied cubes used for data analysis is critical in large-scale data warehouses. While these cubes are being designed, Analysis Services prompts the user to make important decisions about the storage mode and the level of aggregation. Since cubes contain large numbers of aggregations and hold data in multidimensional structures, they require substantial storage. Storage and aggregation options can be set using the Storage Design Wizard or the Usage-Based Optimization Wizard, and an explicit filter can be defined to restrict the source data that is read into the cube under any of the storage modes available in Analysis Services.
Relational OLAP stores fact data and aggregations in the relational database server. Multidimensional OLAP stores fact data and aggregations on the Analysis server in a multidimensional format. Hybrid OLAP stores fact data in the relational database server, while aggregations are stored on the Analysis server in an optimized multidimensional format.
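The three modes differ only in where facts and aggregations live, which can be restated as a small lookup table (an illustrative summary of the text above, not a product API):

```python
# Where each storage mode keeps its data, per the description above.
STORAGE_MODES = {
    #  mode      (fact data location,       aggregation location)
    "ROLAP": ("relational database",    "relational database"),
    "MOLAP": ("multidimensional store", "multidimensional store"),
    "HOLAP": ("relational database",    "multidimensional store"),
}

def describe(mode):
    facts, aggs = STORAGE_MODES[mode]
    return f"{mode}: facts in the {facts}, aggregations in the {aggs}"

print(describe("HOLAP"))
# → HOLAP: facts in the relational database, aggregations in the multidimensional store
```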
MOLAP storage takes up more space than HOLAP or ROLAP because MOLAP storage contains its own copy of the original fact data and dimensions; the aggregation calculations further increase the disk space required. Linked cubes use the aggregations of their source cube (which is stored on another Analysis server) and have no data storage requirements of their own. Real-time cubes use the relational database for real-time OLAP and require no additional storage space, since all aggregations are performed in real time. Virtual cubes likewise require no storage space, as they draw their fact data and aggregations from source cubes that already exist on the Analysis server to present a combined view of the data to the user.