XML – Elements, Attributes, Entities
In this tutorial you will learn about elements, the anatomy of tags, tag naming rules, invalid and valid tags, root and child elements, attributes, when to use attributes, entities, character data sections, comments and processing instructions.
Elements
Elements are the basic building blocks of XML.
An element may contain
– Other elements
– Character data
– Character references
– Entity references
– Comments
These are collectively known as element content.
Ex: <student>Mason Hill</student>
An element consists of three parts
1. Opening tag: <student>
2. Content (description): Mason Hill
3. Closing tag: </student>
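The three parts can be inspected with a parser. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module, neither of which is part of the original tutorial; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Parse the one-element document from the example above.
student = ET.fromstring("<student>Mason Hill</student>")

tag_name = student.tag   # the name shared by the opening and closing tags
content = student.text   # the character data between the two tags

print(tag_name)  # student
print(content)   # Mason Hill
```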
Anatomy of tags
Every element must have an opening tag and a closing tag. The opening tag is the element name written between a less-than sign (<) and a greater-than sign (>), for example <student>. The closing tag is written as a less-than sign followed by a forward slash (</), then the element name and a greater-than sign, for example </student>.
The data between the opening and closing tags of an element is its content.
For example,
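<student>Nick Price</student>
Here Nick Price is the content of the element. Most browsers collapse whitespace between the tags when they display a document, so
<student> Nick Price </student>
is displayed the same as
<student>
Nick Price
</student>
An XML parser, however, passes that whitespace on to the application as part of the content.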
Note: Unlike HTML, XML does not allow unclosed single tags (like <br> in HTML). Every element must be closed, either with a closing tag or with the empty-element form <br/>.
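How a parser, as opposed to a browser display, treats whitespace and unclosed tags can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree module; the is_well_formed helper and the <note> sample are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

compact = ET.fromstring("<student>Nick Price</student>")
spread = ET.fromstring("<student>\nNick Price\n</student>")

# The parser hands the line breaks to the application unchanged ...
raw_differs = compact.text != spread.text
# ... so the application strips them if it wants the two forms to match.
same_after_strip = compact.text == spread.text.strip()

# An HTML-style unclosed tag is rejected; the empty-element form is accepted.
unclosed_ok = is_well_formed("<note>one<br>two</note>")
closed_ok = is_well_formed("<note>one<br/>two</note>")

print(raw_differs, same_after_strip)  # True True
print(unclosed_ok, closed_ok)         # False True
```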
Tag naming rules
- XML names must begin with
– a letter,
– an underscore (_), or
– a colon (:).
- The remaining characters may be any valid name characters: the preceding plus digits, hyphens (-) and full stops (.).
- The colon character should not be used, except as a namespace delimiter.
- XML names are not limited to ASCII characters; ideographic characters can also be used.
- A name may not begin with the string "xml", "XML", or any other upper-case or lower-case combination of these letters, because such names are reserved.
Based on the above rules, here are some examples.
Invalid tags
<.stock></.stock> (the name begins with a full stop)
<product1></ product1> (there is a space between </ and the name)
<product^stock></product^stock> (^ is not a valid name character)
Valid tags
<_stock></_stock>
<product1></product1>
<product-stock></product-stock>
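The valid and invalid examples above can be run through a real parser. This is a rough sketch assuming Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper, and it tests whether the whole snippet is well-formed rather than checking the naming rules one by one.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

invalid_examples = [
    "<.stock></.stock>",                # name starts with a full stop
    "<product1></ product1>",           # space after "</"
    "<product^stock></product^stock>",  # "^" is not a name character
]
valid_examples = [
    "<_stock></_stock>",
    "<product1></product1>",
    "<product-stock></product-stock>",
]

invalid_results = [is_well_formed(x) for x in invalid_examples]
valid_results = [is_well_formed(x) for x in valid_examples]

print(invalid_results)  # [False, False, False]
print(valid_results)    # [True, True, True]
```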
Root and child elements
The root element is the first element in a document and it contains all other elements. In the following example student is the root element, and the elements contained within it (name, roll-number) are its child elements.
<student>
  <name>
    Bill Gates
  </name>
  <roll-number>
    55
  </roll-number>
</student>
In XML, tags cannot overlap. The opening and closing tags of a child element must both be inside its parent element. The following example is not allowed because the name and roll-number tags overlap.
<student>
<name>
<roll-number>
Jason
</name>
</roll-number>
</student>
The proper format is as follows
<student>
  <name>
    Jason
  </name>
  <roll-number>
  </roll-number>
</student>
The root element is also called the document element. A document has exactly one root element, and all other elements lie within it.
NOTE: A tag can be empty, i.e. contain no data, like the roll-number tag in the above example. Such tags are called EMPTY ELEMENTS and may also be written in the short form <roll-number/>.
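The nesting rule can be tried out with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module and writes the two student examples on one line each; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

overlapping = "<student><name><roll-number>Jason</name></roll-number></student>"
nested = "<student><name>Jason</name><roll-number></roll-number></student>"

# Overlapping tags make the document ill-formed, so parsing fails.
try:
    ET.fromstring(overlapping)
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False

# Properly nested tags parse into a root element with two children.
root = ET.fromstring(nested)
child_tags = [child.tag for child in root]

# An empty element has no character data; ElementTree reports it as None.
roll_text = root.find("roll-number").text

print(overlap_accepted)  # False
print(root.tag)          # student
print(child_tags)        # ['name', 'roll-number']
print(roll_text)         # None
```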
Attributes
Attributes give information about elements. They can be specified only in the start tag of an element, and their values must always be enclosed in quotation marks, either double or single. This is unlike HTML, where attribute values may also be left unquoted.
Syntax: <tag attribute="value">description</tag>
Example:
<problem size="huge" cause="unknown" solution="run away"/>
If elements are the "nouns" of XML, then attributes are its "adjectives". An element can have zero, one or more attributes, but an attribute name can appear only once within an element.
Bad: <Test name="John" name="Doe"/>
Good: <Test first="John" last="Doe"/>
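Reading attributes, and the rule that an attribute name may appear only once, can be illustrated as follows. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

problem = ET.fromstring(
    '<problem size="huge" cause="unknown" solution="run away"/>'
)

# Attributes are exposed as a mapping from attribute name to value.
attributes = dict(problem.attrib)
size = problem.get("size")

# Repeating an attribute name in one start tag is an error.
bad_ok = is_well_formed('<Test name="John" name="Doe"/>')
good_ok = is_well_formed('<Test first="John" last="Doe"/>')

print(size)             # huge
print(bad_ok, good_ok)  # False True
```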
The million dollar question
When Do I use Attributes?
Unfortunately there is no definite answer to this question, and there are many conflicting views on the use of attributes. A widely accepted view is that attributes hold metadata, i.e. data about data, and in such scenarios the use of attributes is recommended. For example, the lang attribute describes the language of the content of the element.
Entities
Entity references are placeholders for characters that are otherwise reserved in the language or that may be misinterpreted. For example, the less-than (<) and greater-than (>) symbols are reserved for delimiting tags. If the content of an element itself contains one of these symbols, the data would be misinterpreted. To avoid this, entities are used. The ampersand (&) symbol is reserved to indicate the start of an entity reference, and a semicolon (;) ends it.
The predefined entities are as follows
&lt; | LESS THAN (<) |
&gt; | GREATER THAN (>) |
&amp; | AMPERSAND (&) |
&quot; | QUOTATION MARK (") |
&apos; | APOSTROPHE (') |
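A parser replaces the predefined entity references with the characters they stand for, and the reverse step (escaping) is needed when writing XML. The sketch below assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the <note> and <quote> samples are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: the parser turns entity references back into plain characters.
note = ET.fromstring("<note>5 &lt; 7 &amp; 7 &gt; 5</note>")
decoded = note.text

quote = ET.fromstring('<quote text="&quot;hi&quot; &apos;there&apos;"/>')
decoded_attr = quote.get("text")

# Writing: escape() replaces &, < and > so the text is safe as element content.
escaped = escape("5 < 7 & 7 > 5")

print(decoded)       # 5 < 7 & 7 > 5
print(decoded_attr)  # "hi" 'there'
print(escaped)       # 5 &lt; 7 &amp; 7 &gt; 5
```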
Character data sections
Character data (CDATA) sections contain raw text that the XML parser does not interpret as markup, so characters such as < and & can appear in them unescaped.
Syntax: <![CDATA[ raw data ]]>
Example:
<book ISIN="INB101235647">
  <author>
    Kacey Price
    <![CDATA[ Kacey has also authored "Complete Reference" series]]>
  </author>
</book>
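The effect of a CDATA section is easiest to see when the raw data looks like markup. The following sketch assumes Python's standard xml.etree.ElementTree module; the <b> tag and the ampersand inside the CDATA section are invented for the demonstration and are not part of the book example.

```python
import xml.etree.ElementTree as ET

doc = "<author>Kacey Price <![CDATA[wrote <b>bold</b> & more]]></author>"
author = ET.fromstring(doc)

# The CDATA content is delivered as ordinary text, exactly as written.
text = author.text
# The <b> inside the CDATA section was not parsed as a child element.
child_count = len(author)

print(text)         # Kacey Price wrote <b>bold</b> & more
print(child_count)  # 0
```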
Comments
Comments are enclosed in <!-- and -->, for example <!-- Comments -->
Example:
<!-- This is start of second child element -->
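Whether an application sees comments depends on the parser it uses. As a rough illustration, assuming Python's standard library, xml.dom.minidom keeps comments as nodes while xml.etree.ElementTree drops them by default; the <students> wrapper is a hypothetical document made up for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom, Node

doc = (
    "<students>"
    "<!--This is start of second child element-->"
    "<student/>"
    "</students>"
)

# minidom keeps the comment as the first child node of the root element.
first = minidom.parseString(doc).documentElement.firstChild
is_comment = first.nodeType == Node.COMMENT_NODE
comment_text = first.data

# ElementTree's default parser discards comments, leaving only <student/>.
element_children = [child.tag for child in ET.fromstring(doc)]

print(is_comment)        # True
print(comment_text)      # This is start of second child element
print(element_children)  # ['student']
```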
Processing instructions
Processing instructions are used to pass information to applications, which use this information to carry out special tasks.
Syntax: <?target instructions?>
Example
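<?xml version="1.0" encoding="ISO-8859-1"?>
NOTE: Here the version attribute specifies the version of XML being used, while encoding tells the parser which character encoding the document uses. Strictly speaking, the XML specification calls this line the XML declaration rather than a processing instruction, although it uses the same <? ... ?> syntax.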
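The sketch below shows the encoding declaration at work and how an application might read a processing instruction. It assumes Python's standard xml.etree.ElementTree and xml.dom.minidom modules; the xml-stylesheet instruction, the file name style.css and the element names are hypothetical examples, not taken from the tutorial.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom, Node

# The declared encoding tells the parser how to decode the document's bytes.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><name>René</name>'.encode(
    "iso-8859-1"
)
name = ET.fromstring(raw).text

# A processing instruction is exposed as a node with a target and its data.
styled = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<root/>"
)
pi = minidom.parseString(styled).firstChild
is_pi = pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE
target, data = pi.target, pi.data

print(name)    # René
print(is_pi)   # True
print(target)  # xml-stylesheet
print(data)    # type="text/css" href="style.css"
```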
Figure: An XML document displayed in IE 5.0 or above.