Copyright 2000-2001
Prodavise-mail us

A LOOK AT XML (Extensible Markup Language)

HTML is the HyperText Markup Language standardized by W3C (World Wide Web Consortium) for storing and exchanging documents on the World Wide Web. HTML was designed to be simple enough to support ease of authoring Web pages, rich enough to support multimedia embedding in documents and flexible enough to support hypertext linking.

HTML is based on SGML, the Standard Generalized Markup Language standardized by ISO for defining and using portable document formats. SGML was designed to be formal enough to allow proofs of document validity, structured enough to handle complex documents and extensible enough to support management of large information repositories. W3C's SGML working group, whose present efforts are described on its activity page, is attempting to standardize the delivery (in Web documents) of self-describing data structures of arbitrary depth and complexity. To that end, the group is simplifying SGML for use with the Web (and Web technologies such as Java).

XML (Extensible Markup Language) is a simplified (but strict) subset of SGML that maintains the SGML features of validation, structure and extensibility. XML is a standardized text format designed specifically for transmitting structured data to Web applications. In addition, XML's goals of being easier to learn, use and implement than full SGML will have clear benefits for World Wide Web users, making it easier to define and validate document types, to author and manage SGML-defined documents and to transmit and share them across the Web.

The Extensible Markup Language specification describes XML documents, a class of data objects stored on computers, and partially describes the behavior of the XML processor programs used to read XML documents and provide access to their content and structure. XML allows generic SGML to be served, received and processed on the Web in a manner similar to what is done with HTML today. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

XML documents are composed of entities, which are storage units containing text and/or binary data. Text is composed of character streams that form both the document character data and the document markup. Markup describes the document's storage layout and logical structure. XML also provides a markup mechanism to impose constraints on the storage layout and logical structure of documents.

XML and SGML

XML, like SGML, is a meta-language for describing the markup of different types of documents. However, its specification is 26 pages (versus 500 for SGML!). The W3C hopes that offering a simplified version of SGML will make implementing SGML much more palatable to vendors of Web authoring and browsing tools. XML is not a replacement for SGML. Many features of SGML were left out to keep XML simple. Current SGML users may choose XML for network delivery, and since XML is a valid subset of SGML, the translation from SGML to XML is straightforward. XML was also developed as an easy on-ramp to SGML for people who are not yet using it.

To simplify SGML, the W3C working group dropped support for certain features that put a heavy processing burden on SGML client software. For example, a well-formed XML document is unambiguous, so a browser or editor can read the tags and create a tree of the hierarchical structure without having to read its document type definition. XML also does not allow markup minimization, requires that empty elements be self-identifying and does not support several other complex SGML standard features.
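
As a minimal sketch of that point, the following Python snippet uses the standard library's xml.etree.ElementTree module to build a tree from a well-formed document that carries no document type definition. The sample document and its element names (memo, to, body, date, attachment) are made up for illustration and do not come from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: no DOCTYPE, no DTD, just well-formed markup.
# <attachment .../> is an empty element that identifies itself as empty.
DOC = """<memo priority="high">
  <to>Sales team</to>
  <body>Prices change on <date>2001-03-01</date>.</body>
  <attachment href="prices.xml"/>
</memo>"""


def outline(element, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))

# SGML-style markup minimization (here, an omitted </to> end tag) is not
# well-formed XML, so the parser rejects the document.
try:
    ET.fromstring("<memo><to>Sales team</memo>")
    minimized_ok = True
except ET.ParseError:
    minimized_ok = False
print("minimized markup accepted:", minimized_ok)
```

The parser never consults a DTD here: the tags alone are enough to recover the hierarchy, and the document with the omitted end tag is refused rather than guessed at.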

XML and HTML

XML is not a replacement for HTML, either. HTML is a useful tool for storing and exchanging small hypermedia documents across the Internet. Furthermore, it is easy to generate HTML documents on the fly from XML (or SGML) documents. XML is designed to complement HTML by enabling different kinds of data to be exchanged over the Web. For example, current limitations in World Wide Web technologies do not allow the extensibility, structure and data checking necessary for large-scale commercial Web publishing. Jon Bosak's excellent paper "XML, Java and the Future of the Web" explains how XML can enable advanced Web applications, allowing Java applets to embed powerful, automatable data manipulation facilities directly into Web clients.
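
To illustrate generating HTML on the fly from XML, here is a small sketch in Python using only the standard library. The catalog vocabulary (catalog, product, name, price) and the function name catalog_to_html are assumptions invented for this example. A real site might well use a stylesheet language for this step instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data; the element and attribute names are illustrative.
CATALOG = """<catalog>
  <product><name>Widget</name><price currency="USD">9.50</price></product>
  <product><name>Gadget</name><price currency="USD">24.00</price></product>
</catalog>"""


def catalog_to_html(xml_text):
    """Build an HTML list from the catalog and return it as a string."""
    source = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for product in source.findall("product"):
        price = product.find("price")
        li = ET.SubElement(ul, "li")
        li.text = f"{product.findtext('name')}: {price.text} {price.get('currency')}"
    # method="html" serializes the tree using HTML syntax conventions.
    return ET.tostring(ul, encoding="unicode", method="html")


html = catalog_to_html(CATALOG)
print(html)
```

The XML stays the master copy of the data. The HTML is a disposable view that can be regenerated whenever the data or the desired layout changes.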

Unlike HTML, which has a fixed (though ever-changing) set of tags, XML lets you define your own tags and attributes. Support for XML by the Internet community would open up vast new possibilities for Internet publishing. Instead of shoehorning all documents into HTML or having to invent a browser to handle non-HTML documents, XML would enable a wide array of documents with user-defined tagsets to be handled by generic Web application software. As Tim Bray pointed out, "XML allows us to finally get off the HTML treadmill."
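
The idea of generic software handling user-defined tagsets can be sketched as follows. The two vocabularies (a recipe and an invoice) and the helper tag_counts are hypothetical. The point is only that one function, written with no knowledge of either tagset, processes both.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical vocabularies, each with tags and attributes of its own.
RECIPE = (
    '<recipe serves="4">'
    '<ingredient amount="200" unit="g">flour</ingredient>'
    '<ingredient amount="300" unit="ml">milk</ingredient>'
    "<step>Whisk.</step>"
    "</recipe>"
)
INVOICE = (
    '<invoice number="17">'
    "<customer>ACME</customer>"
    '<line qty="3">bolts</line>'
    "</invoice>"
)


def tag_counts(xml_text):
    """Count element names in any well-formed document, whatever its tagset."""
    return Counter(el.tag for el in ET.fromstring(xml_text).iter())


recipe_counts = tag_counts(RECIPE)
invoice_counts = tag_counts(INVOICE)
print(dict(recipe_counts))
print(dict(invoice_counts))
```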

XML and Java

Presently, an author can create rich documents with an application and then use a Java applet viewer to attach those documents to Web pages. As long as the browsers continue to provide only crude formatting, such measures are unfortunate but inevitable, much in the same way people use desktop publishing applications to get better typography than can be done with off-the-shelf word processors.

But there is no reason why the concept of a "basic Web page" needs to be limited to a single tag set! The appeal of the Web is its simple hypertext scheme, which provides a simple, unambiguous method of pointing to files with unique names. Although it is handy that HTML is also simple, the success of word processors has demonstrated that consumers can cope with multiple document types. When XML becomes more widespread, Web authoring tools will become much more flexible in handling basic document constructs. WordPerfect and Word will export directly into XML, using the style names as tags instead of filtering everything into 90 (or however many currently exist) predefined tags.

In such a brave new World Wide Web, Java's role will be to do interesting things with the content, such as mediation between formats, computation and event handling, automation of tasks and dynamic content, presentation of different views to different viewers and even intelligent filtering of content. As XML specification co-editor Tim Bray succinctly put it, "XML gives Java something to chew on."

ABOUT XSLT (Extensible Stylesheet Language Transformations)

XSLT is a language for transforming XML documents into other documents. XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary. XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather, it is designed primarily for the kinds of transformation that are needed when XSLT is used as part of XSL.

WHAT IS XSLT?

So you've decided to use XML to capture your information in a robust, hierarchical fashion. You are creating your structured data, maybe even using CSS (Cascading Stylesheets) to render your documents as they exist today and you are building a collection of information following a consistent model. Congratulations! Now what are you going to do? The information you've described is probably in a hierarchy that is organized in a way important to you because you are the one creating it. This hierarchy you've chosen may not be the hierarchy that makes sense for those processes or other people that need to work with your information. Or, perhaps you have your own reasons to reorganize your information in different ways, with different orientations to fulfill your own different requirements, to emphasize your information differently for your colleagues, to hide certain private information from your customers etc.

Or, perhaps, you unfortunately haven't been able to create your information in the best structure for all your needs. To accommodate how your information is presented, you may have been restricting how your information has been and is being created, making it difficult to relate or pull together while being forced to think of a single end result at all times. In short, the way you have chosen to capture or describe your information isn't always the way you want to present or use your information.

WELCOME TO XSLT!

The XSLT (Extensible Stylesheet Language Transformations) W3C Recommendation describes a transformation vocabulary used to specify how to create new structured information from existing XML documents. Unlike with a programming language, you don't need to be a programmer to successfully describe how to transform your information. XSLT implements transformations "by example", not just "by program logic", and builds in support for the kinds of transformation typically needed to present information. Your objective, as an XSLT stylesheet writer, is to give an XSLT engine examples of how each of the constructs in your information is supposed to be structured once it has been transformed. You create these examples as "templates" of the result and you tell the engine when these templates get added to the result tree your transformation is creating. Your stylesheet templates can include instructions to the XSLT engine to hunt down information anywhere in your input XML file, or in many XML input files, to fill in the holes in your template where your own information belongs.
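
To make the idea of templates with holes more concrete, here is a sketch of a small XSLT 1.0 stylesheet for a made-up catalog vocabulary (catalog, product, name, price are assumptions, not part of any standard). It is wrapped in Python only so it can be checked with the standard library. Python's standard library does not include an XSLT engine, so the snippet merely confirms that the stylesheet is well-formed XML and lists its template patterns. Actually applying it to a source document requires an XSLT 1.0 processor.

```python
import xml.etree.ElementTree as ET

# The namespace that identifies XSLT instructions.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet for an invented <catalog>/<product> vocabulary.
# Each xsl:template is an "example" of the result. xsl:apply-templates and
# xsl:value-of mark the holes the engine fills in from the source document.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <html><body><ul><xsl:apply-templates select="product"/></ul></body></html>
  </xsl:template>
  <xsl:template match="product">
    <li><xsl:value-of select="name"/>: <xsl:value-of select="price"/></li>
  </xsl:template>
</xsl:stylesheet>"""

# A stylesheet is itself an XML document, so a plain XML parser can read it.
sheet = ET.fromstring(STYLESHEET)
patterns = [t.get("match") for t in sheet.findall(f"{{{XSL_NS}}}template")]
print(patterns)
```

The first template describes the overall shape of the result for the catalog element, and the second describes what each product should look like once transformed. Neither says how to loop over the input. Matching source nodes to templates is the engine's job.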

When using XSL (Extensible Stylesheet Language) formatting objects to present your information, your stylesheet objective is to transform your information into a hierarchy composed exclusively of the XSL formatting object vocabulary. A rendering engine then takes this result hierarchy and interprets the semantics of the XSL vocabulary to produce your desired rendition, all without using a single construct of your own vocabulary, because you've transformed your own information into rendering information. With XSLT you are not restricted to presenting your information in the same order you created it and you are not required to present all of your information. Also, you can traverse your source XML multiple times if you need to reuse your information more than once in a single result, perhaps to simultaneously create tables of contents and internal cross references described abstractly in your raw structured content.

The XSLT recommendation describes how XSLT engines can choose to support different ways to serialize the result tree, which is built from the templates you have added in combination with the information you glean from your source tree. Your result can be realized using XML syntax conventions, HTML syntax conventions (with or without CSS) or just simple text. Think of the possibilities: with XSLT you can take your structured XML information and synthesize new instances for your colleagues and customers to use, build HTML/CSS web pages from data stores, feed other systems with flat text representations of your data, create operating system scripts and so on. This gives you the freedom to organize the information the way you want from the beginning, to best meet your own business requirements and still fulfill your obligations and desires to utilize your information in a myriad of ways downstream. Discover how XSLT can be used today with your information!
