XML Data Types in SQL Server 2005
In this tutorial you will learn about XML Data Types, Query and Inexes in SQL Server 2005 – XML Data type, XML Typing, Advantages of storing XML in databases, Untyped XML data, Using Typed XML, Managing XML indexes, XQuery and XQuery syntax.
XML Data type
As stated above the XML data type is a new introduction in SQL Server 2005. It enhances the developer’s ability to work with XML within the framework of T-SQL. This data type can be used just like any other data type, be it a typing of variables or columns. A number of methods have been defined for this data type which enables optimal use of the XQuery to check data existence or individual values. XML Data can be queried or modified.
XML Typing
When schemas are registered in SQL Server 2005 database, it has the added advantage of constraining and typing XML. This reduces storage size and provides a mechanism for validation.
The process of creating typed XML involves the creation of schema collections within the database. This is accomplished using CREATE XML SCHEMA COLLECTION statement. Each schema within the collection describes a namespace.
One of the most important attributes of the xml schema is the Location element and the LocationID attribute of the element. This attribute causes the Schema to validate any XML and it must be present in the schema and must have an integer value if the schema is to be executed error free.
XML data type has five methods. Four of this use XQuery syntax for performance of repetitive tasks and the last method use an extension of XQuery to perform simple updates.
The Exist Method checks for the existence of data within XML. This method uses the Instructions column defined within the XML to select records that have instructions at the Location with a LocationID. The root element forms the main element node and Location is its child, while LocationID is an attribute of Location. The path is represented as parent/child[attribute]. The definition of the namespace therefore, results in a path which is a part of XPath and XQuery specifications.
The Value Method can be used to extract a single int or varchar value from the XML. It has two arguments—the XQuery expression and the T-SQL data type(except xml, user defined data type, image, text, ntext and timestamp).
The Query method allows the execution of the XQuery against the XML data. It does not return a single value but a set of nodes of the XML data.
The Modify method allows the modification of XML data with the XML data type without rewriting the structure.
The last of the methods is the Nodes method which allows the shredding of XML data into relational data.
Advantages of storing XML in databases
XML support in the database brings with it the advantage of being able to translate relational data into XML and vice versa. It follows the distributed applications can be built with ease. Structured and semi-structured data can be stored in a single location; variable content can be defined within a relational model and the most suitable model can be selected for the applications requirements while taking advantage of the optimized data storage and querying environment of SQL Server 2005.
The InfoSet of an XML document is stored in an efficient format internally. The data stored is similar to the original xml data except that the white space, order of attributes, namespace prefixes and xml declaration are not retained. A number of functionalities are inbuilt. Columns defined as xml can now be indexed using XML indexes and full text indexes. This improves querying. The XQuery-based data retrieval methods can be used such as value and exist methods. Data modification based on XQuery can be used to perform updates on the XML data.
Untyped XML data
Untyped xml data can be defined in an XML column or variable. This can be done by assigning a string(varchar, nvarchar, text or ntext) value that contains a well formed XML document or fragment. The string value can be cast implicitly using the Transact SQL CAST function or can be cast explicitly using the CONVERT function.
Using Typed XML
Typed XML can be used if a XML column or variable is declared with the associated XML schema collection that contains the schema to be used to validate the XML.
The column or variable can be assigned in the same way as it is assigned for an untyped XML, but it must conform to the schema associated with the column or variable and must be declared within the target namespace.
The typing can limit the values to a single document or fragments of multiple documents. This can be done by specifying the CONTENT and DOCUMENT in the definition. While CONTENT allows fragments, DOCUMENT allows only single document values. The default value is CONTENT.
Using XML Schemas
XML schemas are to be registered in the database if typed XML has to be used. The schema specifies the namespace for the XML document and defines the elements and attributes to be included in them. The CREATE XML SCHEMA COLLECTION statement is used to create the schema. The syntax would read as under
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
CREATE XML SCHEMA COLLECTION sql_identifier AS Expression ;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
The sql_identifier is a valid Transact SQL identifier for the XML schema collection and Expression is an XML value containing one or more XML schema documents.
A query in the sys.xml_schema_collections displays the information about the schema collections. Individual XML namespaces can also be retrieved by querying sys.xml_schema_namespaces in the catalog view. The components defined in the view can be queried in the sys.xml_components catalog view.
An XML schema collection can be modified by using the ALTER XML SCHEMA COLLECTION statement. This statement can be used to add or remove schemas from the schema collection. A schema can be deleted from the schema collection by using DROP XML SCHEMA COLLECTION statement.
Managing XML indexes
Query performance can be improved on XML columns by creating an xml index subject to a number of conditions:
1. A clustered primary key must exist in the index. XML indexes prevent the clustered primary key of the table from modification if they exist in the table.
.
2. Only one primary XML index can be created per xml column.
.
3. Secondary XML indexes for PATH, PROPERTY, and VALUE queries can be created.
.
4. An XML and non XML index cannot exist on the same table with the same name.
.
5. IGNORE_DUP_KEY and ONLINE statements of CREATE INDEX and CREATE XML statements are not available for xml indexes.
.
6. An XML index on an xml type column cannot be created in a view or table-valued variable with XML type columns or on xml type variables.
.
7. An XML index must be dropped on an XML column type before the ALTER TABLE statement is used to change the index from typed to untyped or vice versa.
.
8. XML index requires that ARITHABORT is turned on before the index is created or during data modification operations (INSERT, UPDATE and DELETE).
The CREATE PRIMARY XML INDEX is used to create primary XML index on an xml column. The PATH index is used if queries have to be executed to retrieve data from the xml column by specifying the path and value. A secondary PROPERTY index is used to retrieve node values from the xml column by specifying a path. The Value index is to be used if queries have to be executed to retrieve data in the xml column by specifying an imprecise path.
The ALTER INDEX statement is used to alter the table index and rebuild it. The DROP INDEX can be used to drop an index.
XQuery
The language used for querying XML data is known as XQuery. The syntax includes and extends XPath 2.0 expressions. Complex queries can be performed against an xml data source. The methods provided by the data type retrieve data from the xml and such data can be updated by specifying an XQuery expression. The XQuery is modeled on the W3C XQuery 1.0 language specification.
XQuery syntax
As stated earlier the XQuery contains two main sections-the prolog section for declaration of namespaces and importing of schemas
declare namespace awi =
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
"http://schemas.exforsys.com/Invoices"; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
/awi:InvoiceList/awi:Invoice[@InvoiceNo=1000] ;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
and a body for specifying data to be retrieved.
The expressions used can be simple path expressions (/InvoiceList/Invoice) or complex expressions (/InvoiceList/Invoice[@InvoiceNo=1000]) to generate an xml result. The XPath describes the location of the node in an XML document and are absolute or relative.
The language specification includes FOR, LET, ORDER BY, WHERE and RETURN statements. These are known as FLOWR statements. A sample syntax for the FOR statement would be as under.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
for $i in /InvoiceList/Invoice/Items/ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Item[../../@InvoiceNo=1000] ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
return $i;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
The various methods used in the xml query have already been detailed and are merely mentioned here. XQuery has six methods—Value, exist, modify, insert, delete and Nodes. Sample syntax for some of these queries would be as under:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Value ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
SELECT xmlCol.value(‘declare default namespace = ;;;;;;;;;;;;;;;;;;;;;;;
"http://schemas.exforsys.com/InvoiceList"; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
/InvoiceList/Invoice/@InvoiceNo)[1]’, ‘int’) ;;;;;;;;;;;;;;;;;;;;;;;;;;;
Query ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
SELECT xmlCol.query(‘declare default namespace = ;;;;;;;;;;;;;;;;;;;;;;;
"http://schemas.exforsys.com/InvoiceList"; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
< InvoiceNumbers > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
{ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
for $i in /InvoiceList/Invoice ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
return < InvoiceNo > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
{number( $ i / @ InvoiceNo)} ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
< /InvoiceNo > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
} ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
< /InvoiceNumbers >’) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Exist ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
SELECT xmlCol.exist(‘declare default namespace = ;;;;;;;;;;;;;;;;;;;;;;;
"http://schemas.exforsys.com/InvoiceList"; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
/InvoiceList/Invoice[@InvoiceNo=1000]’);;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
In the next tutorial we shall be looking at the Service Broker and how it has proved to be a great asset to the SQL Server 2005 developer!