MSAS – Introducing Dimension and Cube Processing – IT Training and Consulting

The Analysis server is like a black box. Everything that happens in the server is recorded, but the end user sees a multidimensional cube created out of the data in the data warehouse and has very little taste of what really goes on behind the scenes. In this section we will be examining this black box to understand how the Analysis server processes Dimensions and cubes. We will also learn how to optimize cube processing and how to troubleshoot cube processing.

Processing Dimensions: Introduction

Dimensions are building blocks on which the cubes are constructed. When a dimension is processed the Analysis server creates an SQL statement to extract the necessary information from the data warehouse dimension table. At least one or two columns are extracted for each level in the hierarchy. If the member key column and the Member name column properties are the same, the Analysis service extracts only one of them, else it extracts both. One row is extracted for each distinct level key and the rows are sorted using the Member key Column property for the levels. For instance the user wants to process a region dimension containing the following rows and columns.

Region_id	Sales_region	Sales_city	Sales_country
5	Canada West	Vancouver	Canada
6	Canada West	Victoria	Canada
2	Mexico Central	Mexico City	Mexico
4	Mexico West	Guadalajara	Mexico
1	Central West	San Francisco	USA
3	South West	Los Angeles	USA
7	South West	San Diego	USA
8	South West	Coronado	USA
9	South West	National City	USA
10	South West	Lincoln Acres	USA

The data in the warehouse is organized from a relational database perspective. The user however wants the data from a hierarchical perspective. Analysis Services extracts data from the warehouse by creating a unique path that contains a component number for each level of the hierarchy. The Region dimension contains three levels. The region_id and the Sales_region belong to the same level and the Sales_city and Sales_country have one level each.

When the data is retrieved from the data warehouse, the Analysis services begins by creating the All Level member of the dimension.:-0-> 0-> 0. The server then creates the path for the next row which contains the members for Sales_country, Sales_city and Sales_region. In other words the server creates four members for the first row from the relational dimension.:- An ALL Level member and the members Canada West, Vancouver and Canada. Only three members are created for the rows that follow. At the end of the task the server would have created a unique path for each member of the dimension. The path contains the genealogy of each member. The Region dimension, then contains, the member names and paths shown below.

Complete member name	Member path	Member ID
[North America]	0->0->0	1
[North America].[Canada]	1->0->0	2
[North America].[Canada].[Canada West]	1->1->1	3
[North America].[Canada].[Canada West].[Vancouver]	1->1->1	4
[North America].[Mexico]	2->0->0	5
[North America].[Mexico].[Mexico Central]	2->1->0	6

Note that the path is created after sorting out the children of the member by using the value of the Member Key column property, regardless of the value of the Order By property. Also note that a separate ID for each member is created and the Order By property is taken into consideration while creating the ID. However the sorting is not on alphabetical order. The sequence number of the ID matches the sequence of the path numbers by and large and sorting is done on names by default. But the Analysis server follows the logic of the hierarchy order rather than the alphabetical order. If the region id of a member(whose name is lower on the alphabetical order) is higher than the region id of another member (whose name is higher on the alphabetical order) the former is placed above the latter. For instance if the region id of San Francisco(1) is higher than Coronado(8), then San Francisco will be placed above Coronado in the hierarchical order. The Multidimensional Expressions (MDX) query retrieves data in the hierarchical order.

The Analysis server then combines the paths from all the members and creates a map for the dimension. The dimension map allows the Analysis server to slice and dice hierarchies very quickly.

Processing Cubes: Introduction

The Analysis service is extremely efficient in creating relatively small cubes from massive data tables in the data warehouse. A conceptual understanding of how the process works would be useful in designing the cube and in retrieving data for reporting.

As stated earlier, cubes contain dimensions combined with one or more measures. These dimensions form a structure or organization for the data values in the cube. The dimensions used in the cube must be processed before the cube can be processed.

SQL statement is executed by Analysis services when a cube is processed. This statement retrieves values from the fact table. The columns retrieved completely identify each member and all the measures used. A compound path is then, created for each row. Let us look at the rows of a sample cube:

Year	Quarter	Month	Sales_country	Sales_region	Sales_City	Unit_sales
1998	1	1	Canada	Canada West	Vancouver	320
1998	1	1	Canada	Canada West	Victoria	300
1998	2	2	Mexico	Mexico Central	Mexico City	234
1998	1	3	Mexico	Mexico West	Guadalajara	567
1998	4	2	USA	Central West	San Francisco	876
1998	3	4	USA	South West	Los Angeles	234
1998	2	1	USA	South West	San Diego	213
1998	1	5	USA	South West	Coronado	345
1998	4	1	USA	South West	National City	987
1998	5	2	USA	South West	Lincoln Acres	231

The dimensions in the row are Region and Time. The server finds the leaf level member path for each dimension and combines the paths to create a row. Schematically the path would read something like this 1->1->1-> .1->1->1. This internal path of the cube is a number generated consisting of one sub number for each level of the dimension used in the cube. The dimensions in the above cube contain three levels each. Therefore it has a path containing 6 numbers one for each of the three levels of each dimension. The more dimensions a cube has the more numbers that are generated for the levels in the dimensions. The more the information stored in the cube.

Analysis Service creates a data file to store the cells of a cube. When a single row is processed, a single row is extracted from the fact table and the path for the leaf level cell is calculated. The server then checks to see whether a leaf level cell with that path already exists. If it exists, the server adds a new measure to the one already in the cell. If the cell does not exist, then, the server creates a new leaf level cell storing both the path and the value of the measure. This process is not done for ROLAP and HOLAP storage.

Next the Analysis server creates cells for the aggregations designed by the user. If the cell exists the measures are added to it, if not a new cell is created and the measures are stored in it. This completes the processing for a single row of the cube. This process is repeated until all rows are done.

During this process Analysis Server also creates a number of index files to facilitate rapid retrieval of the values. Once all the values have been accumulated the Process Log window announces the successful completion of the processing.

[catlist id=181].

Processing Dimensions: Introduction

Processing Cubes: Introduction

Related posts

MSAS – Browsing the Dependency Network

MSAS – Browsing the Dependency Network

MSAS – Building a Relational Decision Tree Model

MSAS – Building a Relational Decision Tree Model

MSAS – Introduction to Data Mining

MSAS – Introduction to Data Mining