Metadata is a description of data. It provides a consistent way to describe data structures. It is used to describe data as it is being transformed and gives a clear explanation of the meaning of different fields, measures, levels and dimensions. All releases of Microsoft development and data management tools support and allow access to the Repository. The storage of data about data in a central location has some advantages.
Reusability: With the storage at a central location the data can be used and reused.
Dependency tracking: the repository stores information about different objects and the relationships between stored objects. Hence tracking of information regarding relationships is possible.
Tool interoperability : The tools can be used in different systems
Data resource management: Global meta data is stored in the repository for the enterprise data warehouse. It contains a resource library of available services and components, which makes the data easier to manage and more efficient.
Team development: The repository stores meta data about a project’s source control and enables the management of concurrent activities on different versions and configurations of project design and development.
There are two type of metadata in the Analysis services system—Technical metadata and Business metadata.
Technical metadata describes data in a clear and unambiguous way. It contains names of fields, tables, databases, levels, hierarchies and so on. In short it is information about the technical aspects of data.
Business metadata describes data to non technical users so that they can understand the information they are viewing. This includes descriptions of fields, tables, databases and so on.
The Repository technology is the core of SQL Server 2000 data management services. The Repository consists of Information models, the Repository engine, XML interchange, and extensibility.
The Microsoft Repository is a storage place for data about data. It is organized around a set of objects called information models. An information model is a template for a particular kind of data. Information models define meta data in terms of object types and their relationships. The information models have a hierarchical relationship with more specific models inheriting characteristics from their parent models. The repository information helps users assess the impact of changes, aids them in tracking down problems and understand the data they use to make mission critical decisions. The repository is therefore an important part of the data warehousing strategy.
The root model from which all other information models are derived is the Unified Modeling Language(UML). Though users can define their own information models for describing data Microsoft created the Open Information Model(OIM) as a standard specification for storing information about systems and warehouses.
The information model for COM is different from information models of other data types. The relationships between models are specified by the generic model. Microsoft uses the Extensible Markup language (XML) to integrate support for exchanging information in the Repository.
The Database information model is the basic model that stores database information. It stores metadata about data sources and data destinations. This model is derived form the Unified Modeling Language Information model.
- The SQL Server Information Model, the Oracle Information model and the DB2 Information Model are used to store information specific to the data base system. These are derived from the Database Information model.
- The Database Transformation Information Model is used to store information about data transformations. The model is derived from the Database Information model.
- The Data Transformation Services Information model stores data transformation information specific to Microsoft’s Data Transformation Model. This is also derived from the Database Information model.
- The OLAP Information Model is the basic model for multidimensional data structures. This model is also derived from the Database Transformation Information Model.
- The Microsoft OLAP Information Model stores multidimensional data information that is specific to Microsoft OLAP services. This model is derived from the OLAP Information Model.
The repository is physically located in a set of tables in the database. The default location of the repository is the msdb database.
The Repository Engine handles meta data storage and retrieval. It uses the information model and stores meta data instances described in the Information model in the SQL tables and columns. Caching is used to optimize access to meta data. Versioning is an important feature introduced by Microsoft. This implies that changes are captured on the run and the repository maintains a history of the updates. This process allows the user query the changes over time.
XML Interchange help users easily import information into the repository from custom sources or other tools. The Extensibility feature of the repository lets users add information to the models to cover topics specific to their organization or tool. The Microsoft Repository SDK enables the extension of information models that the repository uses. Users can even store information about locations that are not part of the OIM, and the model can be extended to add a new object or relationships to track the information. New properties can be added to track information for existing objects.
Integration with the Microsoft Data-Warehousing Framework
The repository is a component of the Microsoft Data warehousing framework and server strategy. It adds value to the warehousing package and can be accessed by clicking the Meta data node under the Data Transformation services. Thereafter, it is possible to explore the relational schema information and package and lineage information. Relationships between columns and packages can be explored and updated.
Data Transformation Services is the Extraction, transformation and loading tool of SQL Server 2000. The flexibility of this tool helps users populate the warehouse and save packages directly into the Repository. The lineage feature of DTS requires the Repository as it tracks how data in the warehouse was calculated and when it entered the warehouse. Data can be saved into the repository by saving the DTS information into the SQL Server Repository. The Advanced tab on the package properties can be used to set the scanning options and thereby call the OLE DB scanner to load all source and target catalogs into the Repository. If the scanning options are not set, DTS creates DTS Local catalogs as reference for all source and target catalogs and this will make the location of the databases inaccessible.
Saving DTS transformations into the repository has its own problems. Choosing a query as a transformation source, results in the source becoming an object that is not part of the OLE DB imported data. Connection to the original source also becomes difficult. Users need to use a script to perform a simple transformation and choose the source columns explicitly. In this case all transformation data is captured. However, the problem of choosing a query as a transformation source can be solved by writing a program to resolve the references in a repository or by using a custom model along with DTS model to store the source and target mappings.
Versioning is used for the package object in DTS. However, it replicates all subordinate objects with each save. This enables the user to go back to any version of a package and see exactly how the data was transformed. This feature is very important from the point of view of tracking down problems related with data months or years after it has entered the warehouse. The versioning works well if appropriate scanning options to import the relational schemas are used.
OLAP Services provides multidimensional analysis for the data warehouse. By using this facility in the Repository SDK, all definitions for the OLAP data can be imported into the repository. SQL Server 2000, OLAP services stores meta data directly in the repository. Connections to the measures and dimensions will have to be made manually or programmatically based on the data available in the repository model.
The English Query lets the user define a semantic model for a database and then translated the English phrases into SQL. The Repository SDK is provided with an utility to import data from the English Query into the repository. English Query also has an inbuilt utility to import and export models from the repository.
OLE DB (relational schemas) are imported into the repository to provide a base set of information to begin the documentation of the warehouse. Database schemas can be imported from any OLE DB or ODBC compliant data source. The import can be run from the Enterprise manager by right clicking the Metadata tab under data transformations services and selecting ‘Import Metadata’. Full versioning is used to load the data and the changes can be tracked over time. The versioning feature enables the preservation of descriptive information, comments and rescanning the catalog does not impact on the information.