Processing Dimensions: Introduction
Dimensions are building blocks on which the cubes are constructed. When a dimension is processed the Analysis server creates an SQL statement to extract the necessary information from the data warehouse dimension table. At least one or two columns are extracted for each level in the hierarchy. If the member key column and the Member name column properties are the same, the Analysis service extracts only one of them, else it extracts both. One row is extracted for each distinct level key and the rows are sorted using the Member key Column property for the levels. For instance the user wants to process a region dimension containing the following rows and columns.
Region_id |
Sales_region |
Sales_city |
Sales_country |
5 |
Canada West |
Vancouver |
Canada |
6 |
Canada West |
Victoria |
Canada |
2 |
Mexico Central |
Mexico City |
Mexico |
4 |
Mexico West |
Guadalajara |
Mexico |
1 |
Central West |
San Francisco |
USA |
3 |
South West |
Los Angeles |
USA |
7 |
South West |
San Diego |
USA |
8 |
South West |
Coronado |
USA |
9 |
South West |
National City |
USA |
10 |
South West |
Lincoln Acres |
USA |
The data in the warehouse is organized from a relational database perspective. The user however wants the data from a hierarchical perspective. Analysis Services extracts data from the warehouse by creating a unique path that contains a component number for each level of the hierarchy. The Region dimension contains three levels. The region_id and the Sales_region belong to the same level and the Sales_city and Sales_country have one level each.
When the data is retrieved from the data warehouse, the Analysis services begins by creating the All Level member of the dimension.:-0-> 0-> 0. The server then creates the path for the next row which contains the members for Sales_country, Sales_city and Sales_region. In other words the server creates four members for the first row from the relational dimension.:- An ALL Level member and the members Canada West, Vancouver and Canada. Only three members are created for the rows that follow. At the end of the task the server would have created a unique path for each member of the dimension. The path contains the genealogy of each member. The Region dimension, then contains, the member names and paths shown below.
Complete member name |
Member path |
Member ID |
[North America] |
0->0->0 |
1 |
[North America].[Canada] |
1->0->0 |
2 |
[North America].[Canada].[Canada West] |
1->1->1 |
3 |
[North America].[Canada].[Canada West].[Vancouver] |
1->1->1 |
4 |
[North America].[Mexico] |
2->0->0 |
5 |
[North America].[Mexico].[Mexico Central] |
2->1->0 |
6 |
Note that the path is created after sorting out the children of the member by using the value of the Member Key column property, regardless of the value of the Order By property. Also note that a separate ID for each member is created and the Order By property is taken into consideration while creating the ID. However the sorting is not on alphabetical order. The sequence number of the ID matches the sequence of the path numbers by and large and sorting is done on names by default. But the Analysis server follows the logic of the hierarchy order rather than the alphabetical order. If the region id of a member(whose name is lower on the alphabetical order) is higher than the region id of another member (whose name is higher on the alphabetical order) the former is placed above the latter. For instance if the region id of San Francisco(1) is higher than Coronado(8), then San Francisco will be placed above Coronado in the hierarchical order. The Multidimensional Expressions (MDX) query retrieves data in the hierarchical order.
The Analysis server then combines the paths from all the members and creates a map for the dimension. The dimension map allows the Analysis server to slice and dice hierarchies very quickly.
Processing Cubes: Introduction
The Analysis service is extremely efficient in creating relatively small cubes from massive data tables in the data warehouse. A conceptual understanding of how the process works would be useful in designing the cube and in retrieving data for reporting.
As stated earlier, cubes contain dimensions combined with one or more measures. These dimensions form a structure or organization for the data values in the cube. The dimensions used in the cube must be processed before the cube can be processed.
SQL statement is executed by Analysis services when a cube is processed. This statement retrieves values from the fact table. The columns retrieved completely identify each member and all the measures used. A compound path is then, created for each row. Let us look at the rows of a sample cube:
Year |
Quarter |
Month |
Sales_country |
Sales_region |
Sales_City |
Unit_sales |
1998 |
1 |
1 |
Canada |
Canada West |
Vancouver |
320 |
1998 |
1 |
1 |
Canada |
Canada West |
Victoria |
300 |
1998 |
2 |
2 |
Mexico |
Mexico Central |
Mexico City |
234 |
1998 |
1 |
3 |
Mexico |
Mexico West |
Guadalajara |
567 |
1998 |
4 |
2 |
USA |
Central West |
San Francisco |
876 |
1998 |
3 |
4 |
USA |
South West |
Los Angeles |
234 |
1998 |
2 |
1 |
USA |
South West |
San Diego |
213 |
1998 |
1 |
5 |
USA |
South West |
Coronado |
345 |
1998 |
4 |
1 |
USA |
South West |
National City |
987 |
1998 |
5 |
2 |
USA |
South West |
Lincoln Acres |
231 |
The dimensions in the row are Region and Time. The server finds the leaf level member path for each dimension and combines the paths to create a row. Schematically the path would read something like this 1->1->1-> .1->1->1. This internal path of the cube is a number generated consisting of one sub number for each level of the dimension used in the cube. The dimensions in the above cube contain three levels each. Therefore it has a path containing 6 numbers one for each of the three levels of each dimension. The more dimensions a cube has the more numbers that are generated for the levels in the dimensions. The more the information stored in the cube.
Analysis Service creates a data file to store the cells of a cube. When a single row is processed, a single row is extracted from the fact table and the path for the leaf level cell is calculated. The server then checks to see whether a leaf level cell with that path already exists. If it exists, the server adds a new measure to the one already in the cell. If the cell does not exist, then, the server creates a new leaf level cell storing both the path and the value of the measure. This process is not done for ROLAP and HOLAP storage.
Next the Analysis server creates cells for the aggregations designed by the user. If the cell exists the measures are added to it, if not a new cell is created and the measures are stored in it. This completes the processing for a single row of the cube. This process is repeated until all rows are done.
During this process Analysis Server also creates a number of index files to facilitate rapid retrieval of the values. Once all the values have been accumulated the Process Log window announces the successful completion of the processing.