This page presents a (not too technical) introduction to XML for the newcomer to the subject.
Contents:
- Why is XML so important?
- What is XML?
- Where is it used?
- What benefits does it bring?
- Who is backing it?
- How do you use it?
- What do you need to know?
- How do I find out more?
After the incredible phenomenon of the World Wide Web, it is hard to believe that there is an even bigger revolution following on its heels. In the heyday of the radio, few people stopped to think that the radio only handled sound and didn't handle vision. Today, however, television has a far bigger impact than radio.
So what is the missing ingredient in the case of the World Wide Web? The answer is data. Sure, you can add data to web pages using the web page standard of HTML. Furthermore, you can make the meaning of data clear to the person who reads the web page. But the weakness is that HTML is not a suitable language for making data meaningful to computer programs. This is a serious shortcoming because the whole business world (banking, insurance, retail, etc.) is dependent on computer programs interpreting data. The standard that is going to allow business to send data over the Internet in a way that computer programs can understand is XML.
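XML lets you wrap a piece of data in tags that say what the data is. For example, you can write
<author>John Triance</author>
to indicate that "John Triance" is an author. This is known as an XML element. You create an XML document by nesting a number of elements. For example: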
<HyperGlossary>
  <author>John Triance</author>
  <sections>
    <section>Learn how to</section>
    <section>Terms A-Z</section>
    <section>Concepts</section>
    <section>Training</section>
  </sections>
  <url>http://www.hyperglossary.co.uk</url>
</HyperGlossary>
XML differs from HTML in that it is extensible. So whereas the user of HTML has a fixed set of tags (<H1>, <IMG>, etc.) available, the user of XML can invent whatever tags make sense for his or her application area.
As in HTML you can also have attributes. So for example you could write
<url alt="http://www.hyperglossary.com">http://www.hyperglossary.co.uk</url>
to indicate that the HyperGlossary has an alternative URL.
The major purpose of XML is to communicate data. For any application you need agreement on a standard set of tags, along with the rules for their use. Once this is agreed, the data that is to be sent from one point to another is "wrapped" in the appropriate tags and sent over the Internet or any other network. XML is generally accepted as the standard mechanism for sending data between different types of computer system.
A major benefit of XML is that it is self-documenting: an application receiving XML data can "understand" the data and therefore process it easily. Unlike HTML, the data is not locked into a particular display format. So, for example, if the data is to be displayed in a browser, it is possible to allow the user to choose which data he or she sees. By this mechanism you could have a web site that displays each page at a level to suit each user (e.g. a beginner's format or an advanced user's format).
XML is used anywhere that data is transmitted from one computer to another.
XML is backed by all the major players. For example, IBM, Microsoft, Oracle and Sun are all very active in providing support and advocating its use.
XML is a mechanism for storing data. So what do you want to do with data? Create it, validate it, transmit it, view it, process it and store it.
XML files can be created by using an editor, by using an XML generator, by converting another XML file using XSL, and by programming.
Using an Editor: Since XML is stored in text files it can be edited by any text editor (Notepad, TextPad, vi, etc.). If you use one of these then you are responsible for making sure that all the rules of XML are followed: matching opening tags with closing tags, and so on. Because this is tedious and error-prone, you might prefer to use an XML-aware editor that enforces the rules for you. Examples of such editors are XML Pro, XMetaL and XML Spy.
Using an XML Generator: The source of data for an XML file is often a database. The major database systems provide (or are in the process of providing) generators which extract data from databases and wrap it in appropriate XML tags. For example the web wizard in SQL Server has been enhanced to produce XML as well as HTML.
Converting from another XML file using XSL: The Extensible Stylesheet Language (XSL) can be used to extract data from one XML file and insert it into another. Thus you can extract the parts you are interested in and alter the format of the XML file to whatever structure you like.
By programming: You can create XML from any programming language. However, to assist you in processing the XML document, the Document Object Model (DOM) has been specified. This provides a set of functions that automate many of the steps in creating XML and help you navigate around the document that you are working on.
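As an illustration of the programming route, here is a minimal sketch that builds the HyperGlossary document shown earlier through DOM calls. It assumes Python and its standard xml.dom.minidom module, which is only one of many DOM implementations; the function name build_hyperglossary is made up for this example.

```python
from xml.dom.minidom import Document


def build_hyperglossary():
    """Build the HyperGlossary example document using DOM calls.

    The function name is hypothetical; the element names and values
    come from the example earlier on this page.
    """
    doc = Document()
    root = doc.createElement("HyperGlossary")
    doc.appendChild(root)

    # <author>John Triance</author>
    author = doc.createElement("author")
    author.appendChild(doc.createTextNode("John Triance"))
    root.appendChild(author)

    # <sections> containing one <section> per title
    sections = doc.createElement("sections")
    root.appendChild(sections)
    for title in ("Learn how to", "Terms A-Z", "Concepts", "Training"):
        section = doc.createElement("section")
        section.appendChild(doc.createTextNode(title))
        sections.appendChild(section)

    # <url alt="...">...</url>
    url = doc.createElement("url")
    url.setAttribute("alt", "http://www.hyperglossary.com")
    url.appendChild(doc.createTextNode("http://www.hyperglossary.co.uk"))
    root.appendChild(url)
    return doc


if __name__ == "__main__":
    # The DOM takes care of matching tags and quoting attribute values.
    print(build_hyperglossary().toprettyxml(indent="  "))
```

Because the DOM writes the tags for you, the output is well formed without any manual checking of opening and closing tags.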
How do you check that an XML document has been correctly constructed? To make XML easier to process, a number of simple rules are rigidly enforced. The main ones are:
- every opening tag has a matching closing tag - e.g. <author>...</author>
- XML is case sensitive so, for example, you cannot use </Author> to close the <author> element
- attribute values must all be enclosed in quotation marks or apostrophes (" or ')
- tags must be properly nested - so <a><b>....</b></a> is fine, but <a><b>....</a></b> is not
If a document follows all these rules it is called well formed. XML documents must be well formed. This is different from HTML, where you can get away with breaking all these rules. If an HTML document is well formed it is known as XHTML. Since it satisfies the XML rules it can then be processed by the various XML tools.
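The rules above can be tried out with any XML parser. The following is a small sketch, assuming Python and its standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule; the labels are only for the printout.
samples = {
    "matching tags": "<author>John Triance</author>",
    "wrong case in closing tag": "<author>John Triance</Author>",
    "unquoted attribute value": "<url alt=http://www.hyperglossary.com>x</url>",
    "proper nesting": "<a><b>text</b></a>",
    "improper nesting": "<a><b>text</a></b>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

This only tests for well-formedness. It says nothing about whether the document is valid against a specification.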
There is also a concept of valid XML. XML is valid if it conforms to its specification. A specification is required for each XML format that you use. In the specification you would for example (taking the case of the HyperGlossary document above) state that the HyperGlossary element may consist of an author element, a sections element and a url element. You would also specify that the sections element consists of one or more section elements. You would also indicate which attributes were permitted in each element.
The standard mechanism for creating this specification is a Document Type Definition (DTD). DTDs have certain problems associated with them, not least of which is that they do not conform to the rules of XML. You thus have to learn different rules of construction for DTDs, and any software that processes them has the overhead of handling the different format. To tackle this and other limitations, XML Schemas are being developed, but they are not yet standardised.
Since XML is such a precisely defined language, a number of XML parsers have been produced. These are programs which read an XML file and check that it is well formed. Some parsers are also capable of checking whether the XML is valid. Thus the parser takes care of processing the XML and DTD and presents the information to your program in a more manageable form.
Some XML editors use parsers to ensure that all XML that is created is well formed and valid.
Since XML is stored in a text file, transmission across the Internet is not a problem. It can be transmitted using the standard protocols, HTTP and FTP, that are used for other files such as web pages. For example, you can put XML pages on a web site, and if the browser requests, say, "http://www.hyperglossary.co.uk/xml/terms.xml", the web server sends the XML file to the browser. When it arrives, of course, the web browser needs to know what to do with it. If you have an XML-enabled browser (such as Internet Explorer V5), try clicking on the link to see what happens. The browser displays the file as XML because there is no stylesheet to tell it how to display it.
Unlike HTML, XML does not contain information about how to display it. So without further information a browser can only display it as XML. Internet Explorer V5 does go a step further and lays it out in a way that makes its structure clear and furthermore allows you to click on any element to hide or reveal its sub-elements.
If you want to present the data in the XML file in the form of a web page you can convert it into HTML. This can be done using XSL. Recall that XSL can convert XML files from one format to another, so what you do is convert the XML file into XHTML (which, you will recall, is HTML that is well formed) and pass it to the browser.
Other ways of presenting XML in a browser are to use the DOM (discussed below), to use cascading style sheets (interface not fully developed), or to use another aspect of XSL which displays the data directly without going through HTML (not standardised yet).
In many cases you will want to read the XML into a program for processing. As mentioned earlier you could read the XML directly into the program as a text file and disentangle it yourself but there are some programming interfaces that make it easier for you. The two most popular ones are DOM and SAX.
The DOM (Document Object Model), which we mentioned in the context of creating XML, can also be used to read existing XML files, navigate through them and extract whatever data you need. Interfaces to the DOM exist in JavaScript, VBScript, Java and various other languages. If you are running the script in a browser then you could display the data you extract from the XML document in the browser window using standard browser scripting methods. If you are running the script on a server you could do whatever you like with the data: store it in a spreadsheet, write it to a database, or send it to another computer.
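To make this concrete, here is a short sketch of reading the HyperGlossary example through the DOM. It assumes Python's standard xml.dom.minidom module; the helper text_of and the variable names are invented for this example.

```python
from xml.dom.minidom import parseString

# The example document from earlier on this page, with the alt attribute added.
HYPERGLOSSARY_XML = """<HyperGlossary>
  <author>John Triance</author>
  <sections>
    <section>Learn how to</section>
    <section>Terms A-Z</section>
    <section>Concepts</section>
    <section>Training</section>
  </sections>
  <url alt="http://www.hyperglossary.com">http://www.hyperglossary.co.uk</url>
</HyperGlossary>"""


def text_of(element):
    """Join the text nodes that sit directly inside an element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


doc = parseString(HYPERGLOSSARY_XML)

# Navigate by element name and pull out the data we need.
author = text_of(doc.getElementsByTagName("author")[0])
section_titles = [text_of(s) for s in doc.getElementsByTagName("section")]
alt_url = doc.getElementsByTagName("url")[0].getAttribute("alt")

print(author, section_titles, alt_url)
```

The DOM holds the whole document in memory as a tree, which is convenient for navigation but can be costly for very large files.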
SAX stands for Simple API for XML. It can be used from various programming languages through a parser that supports SAX. The parser passes through the XML file and notifies the program when certain events happen, e.g. when it encounters a start tag or an end tag. You write code that gets executed when each event occurs. This code is given access to the part of the XML file that relates to that event.
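The event-driven style might look like the following sketch, which assumes Python's standard xml.sax module. The class name SectionCollector and the function collect_sections are hypothetical; the three handler methods are the callbacks that the parser invokes.

```python
import xml.sax


class SectionCollector(xml.sax.ContentHandler):
    """Collect the text of every <section> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_section = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called by the parser each time it meets a start tag.
        if name == "section":
            self._in_section = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_section:
            self._buffer.append(content)

    def endElement(self, name):
        # Called by the parser each time it meets an end tag.
        if name == "section":
            self.titles.append("".join(self._buffer))
            self._in_section = False


def collect_sections(xml_text):
    """Parse the text with SAX and return the list of section titles."""
    handler = SectionCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


if __name__ == "__main__":
    sample = "<sections><section>Concepts</section><section>Training</section></sections>"
    print(collect_sections(sample))
```

Unlike the DOM, nothing is kept in memory beyond what the handler chooses to store, which is why SAX tends to suit large files.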
XML is better suited to transmitting data than storing it. Databases on the other hand are designed to safeguard data over long periods, allow efficient concurrent access from multiple users while all the time guaranteeing integrity. XML was not designed for this and we should resist the temptation to push the latest hot technology beyond what it is good at.
So the usual model is that data continues to be stored in databases and XML is used as the carrier for data between databases, and between databases and other software components (such as browsers). To transfer data from a database to XML the options are:
To transfer data from XML to the database you would write a program that processes the XML file (using DOM or SAX) and updates the database in the normal way.
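A program of this kind could be sketched as follows. This is only an illustration: it assumes Python with the standard xml.dom.minidom and sqlite3 modules, an in-memory database, and a made-up table called sections with a single title column. A real application would use its own database and schema.

```python
import sqlite3
from xml.dom.minidom import parseString


def load_sections(xml_text, connection):
    """Insert the text of each <section> element into the sections table.

    The table and column names are assumptions made for this sketch.
    Returns the number of rows inserted.
    """
    doc = parseString(xml_text)
    titles = []
    for element in doc.getElementsByTagName("section"):
        # Join the element's text nodes to get its content.
        title = "".join(
            node.data for node in element.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        titles.append((title,))
    # The connection context manager commits on success
    # and rolls back if an error occurs.
    with connection:
        connection.executemany("INSERT INTO sections (title) VALUES (?)", titles)
    return len(titles)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sections (title TEXT)")
inserted = load_sections(
    "<sections><section>Concepts</section><section>Training</section></sections>",
    connection,
)
rows = [row[0] for row in connection.execute("SELECT title FROM sections ORDER BY rowid")]
print(inserted, rows)
```

The XML is used only as the carrier here: once the values are extracted, the database is updated with ordinary SQL.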
For those who do want to store the data in XML format, database systems are coming up with ways to oblige. Some even support concurrent access to the XML documents by multiple users. These features are being provided with relational databases and with object databases.
The first thing to note is that learning XML itself is the easy bit. If all you are doing is specifying XML files then you would also need to learn DTDs or XML Schemas. If you are going to process them then you would need to learn some combination of XSL, DOM and SAX. For more information see how to learn XML.
If you want training or consultancy then contact us.
If you want to do more research then a set of links is provided.