XML Scripting and XSLT

This article is old and is being consolidated into the book.
Please refer to the corresponding chapter(s) therein.
If the chapters or sections are not completed yet, you can use this article.
Refer to the book's examples, as they are tested against the latest code.

Table of Contents

SAX-Style XML Processing
» Namespace
» SAX 2.0 Events
DOM Processing
» Namespaces
» Create XML from DOM
» List and Navigate DOM
XSLT
Java XML Parsers
Summary
Code Listings


By James Jianbo Huang February 2002


Abstract: XML scripting is one of the main motivations for JudoScript itself. JudoScript supports SAX and DOM programming and XSL transformations. In the do..as xml event-driven statement, SAX events, including tags, are denoted as labels followed by handler code. JudoScript extends SAX with a text tag, a compound event that covers the opening tag, the closing tag and the text enclosed between them; the opening tag's attributes are accessible, too. If a tag has mixed content, you have the option to copy or ignore all the embedded tags. For each event, the built-in variable $_ represents the tag or the event. XML data can be read into a DOM with the do..as dom expression, which returns an org.w3c.dom.Document object. You can use a function variable as a node filter with the Java DOM traversal API. To create a DOM, use the system function createDom(). The xslt statement applies XSL transformations; it can also copy XML documents, writing the result to files or DOMs.

1. SAX-Style XML Processing

SAX is, indeed, a simple API for XML programming. Each tag, including text, becomes an event that application code may choose to respond to. JudoScript goes one step further by also delivering the text enclosed between a specific pair of tags as a single event. This saves you the trouble of implementing a state machine just to retrieve the text content of tags.

Listing 1. books.judo
 1: do $$local as xml
 2: {
 3: <book>:        print ($_.hardcover=='true')?"Hard":"Soft";
 4:                println '-cover Book ------------';
 5: <date>TEXT :   println ' Date: ', $_, ' (', $_.type, ')';
 6: <title>TEXT :  println ' Title: ', $_;
 7: <author>TEXT : println 'Author: ', $_;
 8: <isbn>TEXT :   println ' ISBN: ', $_;
 9: }
10:
11: EndScript -------------------------------------------------------
12:
13: <booklist>
14:   <book>
15:     <title> UNIX in a Nutshell </title>
16:     <author> Daniel Gilly </author>
17:     <publisher> O'Reilly &amp; Associates, Inc. </publisher>
18:     <isbn> 1-56592-001-5 </isbn>
19:     <date type="first edition"> 1986-12 </date>
20:     <date type="second edition"> 1992-06 </date>
21:   </book>
22:   <book hardcover="true">
23:     <title> Advanced Programming in the UNIX Environment </title>
24:     <author> Richard Stevens </author>
25:     <publisher> Addison-Wesley </publisher>
26:     <isbn> 0-201-56317-7 </isbn>
27:     <date type="copyright"> 1993 </date>
28:     <date type="twelfth printing"> 1996-12 </date>
29:   </book>
30: </booklist>

Briefly on logistics. On line 1, $$local represents an input text stream for the data enclosed in this script after the EndScript marker, starting on the following line. The do..as xml statement can take input streams, readers, files and URLs. It takes some options; in this example we just accept its defaults.

Let us discuss regular XML tags first. In their handlers, $_ represents the current tag. It is a read-only struct whose members are the tag's attributes; see line 3. Its toString() method, which the print statement calls implicitly, reproduces the tag itself. Enumerating its attributes works differently: use the methods countAttrs(), getAttrName() and getAttrValue().

<some_tag>: for x from 0 to $_.countAttrs()-1 {
              println $_.getAttrName(x), ' => ', $_.getAttrValue(x);
            }

The closing tags can be handled the same way, except they do not have attributes.
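For readers coming from Java, a rough plain-Java counterpart of the attribute loop above uses the standard SAX API, where org.xml.sax.Attributes plays the role of countAttrs(), getAttrName() and getAttrValue(). This is a minimal sketch that assumes a JAXP-compatible parser such as the JDK's built-in one; the class name SaxAttributes and the method listAttributes() are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxAttributes {
    // Returns one "tag: name => value" line per attribute of every opening tag.
    static String listAttributes(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Valid indexes run from 0 to getLength()-1.
                // Closing tags arrive in endElement(), which has no Attributes parameter.
                for (int i = 0; i < attrs.getLength(); i++) {
                    out.append(qName).append(": ")
                       .append(attrs.getQName(i)).append(" => ")
                       .append(attrs.getValue(i)).append('\n');
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(listAttributes(
            "<book hardcover=\"true\"><date type=\"copyright\"> 1993 </date></book>"));
    }
}
```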

The named text tags, like those on lines 5 through 8 above, are represented by $_, which contains all the attributes of the opening tag; its toString(), however, returns the text, not the tags. The opening and closing tags can still be handled separately; a handler for the closing tag is called after the text handler. So this is the way to reproduce a text tag:

<some_tag>:     println $_;
<some_tag>TEXT: println $_;
</some_tag>:    println $_;
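To see what the text tag saves you, here is a sketch of collecting the text of every element with a given name using plain Java SAX, assuming a JAXP parser. The handler has to remember whether it is inside the element and keep buffering characters() calls, because a parser may split one run of text into several chunks. SaxTextTag and textOf() are hypothetical names. This sketch also appends the text of any nested child elements while dropping their tags, and it does not handle an element nested inside another element of the same name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTextTag {
    // Returns the trimmed text of every <tagName> element, in document order.
    static List<String> textOf(String xml, String tagName) throws Exception {
        List<String> results = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder buf;  // the "state": non-null only while inside <tagName>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tagName)) buf = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so keep appending.
                if (buf != null) buf.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tagName) && buf != null) {
                    results.add(buf.toString().trim());
                    buf = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<booklist><book><title> UNIX in a Nutshell </title></book>"
                   + "<book><title> Advanced Programming in the UNIX Environment </title></book></booklist>";
        System.out.println(textOf(xml, "title"));
    }
}
```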

In XML data, text may be enclosed within a tag along with other tags; such a parent tag is said to have mixed content. How, then, do you get the text of a mixed tag? JudoScript's SAX-style processing lets you either copy the embedded tags (the default) or ignore them. That is it; if more detailed, accurate handling is needed, switch to DOM. To ignore the embedded tags, put a minus sign - before the :. To copy them explicitly, put a plus sign + before the :.

<some_tag>TEXT-: println $_;

Anonymous tags, both opening and closing ones, can be handled by <>; unnamed text pieces can be handled by just :TEXT.

<>:   println $_;
:TEXT: println $_;

SAX also reports other events, as we shall see later, and JudoScript adds two more: BEFORE and AFTER.

Namespace

The options for do..as xml are passed on to the XML parser. The two flags used most often are namespace and validate; both are false by default.

Listing 2. namespaces.judo
 1: do $$local as xml with namespace, validate,
 2:                    xmlns:n="judoscript/xml/namespaces"
 3: {
 4: <n:article>:      println ' Date >> ', $_.date;
 5: <n:headline>TEXT: println 'Headline >> ', $_;
 6: <n:author>TEXT:   println ' Author >> ', $_;
 7: <n:body>TEXT:     println ' Body >> ', $_;
 8:
 9: ERROR:            println ' <ERROR>: ', $_.getMessage();
10: }
11:
12: EndScript ------------------------------------------------------
13: <?xml version="1.0"?>
14: <!DOCTYPE news:article [
15:   <!ELEMENT news:article (news:headline, news:author, news:body)>
16:   <!ATTLIST news:article date CDATA #REQUIRED>
17:   <!ELEMENT news:headline (#PCDATA)>
18:   <!ELEMENT news:author (#PCDATA)>
19:   <!ELEMENT news:body (#PCDATA)>
20: ]>
21: <news:article xmlns:news="judoscript/xml/namespaces" date="02-Dec-2000">
22:   <news:headline>SAX 2.0 Released</news:headline>
23:   <news:author>F. Bar</news:author>
24:   <news:body>SAX 2.0 has been released into the public domain</news:body>
25: </news:article>

On line 2, xmlns:n=... associates the prefix "n" with a URI; any occurrence of this prefix in the handler labels (lines 4 through 7) refers to this namespace. Note that the data uses a different prefix ("news"), but both prefixes are bound to the same namespace URI, so the tags match.
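In the Java SAX API, the counterpart of matching on the namespace rather than the prefix is to turn on namespace awareness and compare the uri argument of startElement(). A minimal sketch, assuming a JAXP parser; SaxNamespaces and elementsIn() are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxNamespaces {
    // Local names of all elements in namespace nsUri, whatever prefix the document uses.
    static List<String> elementsIn(String xml, String nsUri) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // off by default in JAXP
        List<String> names = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Compare the namespace URI, not the prefix in qName (e.g. "news:headline").
                if (nsUri.equals(uri)) names.add(localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<news:article xmlns:news=\"judoscript/xml/namespaces\" date=\"02-Dec-2000\">"
                   + "<news:headline>SAX 2.0 Released</news:headline></news:article>";
        System.out.println(elementsIn(xml, "judoscript/xml/namespaces"));
    }
}
```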

Line 9 shows one of the events other than tags and text: ERROR. This is a recoverable error, as opposed to a fatal error; the parser reports it and may then continue. With the "crimson" parser, for example, this handler is called for line 21 of the data, reporting that the prefix of "news:article" is not defined, even though the namespace is declared in that very tag. The "xerces" parser does not have this issue.
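Java SAX makes the same distinction through org.xml.sax.ErrorHandler: error() receives recoverable errors such as validity violations, while fatalError() receives well-formedness errors. The sketch below assumes a JAXP parser and uses the hypothetical names SaxErrors and validationErrors(). It collects the recoverable errors reported while validating against an internal DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxErrors {
    // Parses with DTD validation on and returns the recoverable errors reported.
    static List<String> validationErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        List<String> errors = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Recoverable (e.g. a validity error): record it; parsing continues.
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;  // e.g. malformed XML: stop, as DefaultHandler does by default
            }
        });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
        System.out.println(validationErrors(dtd + "<a><c/></a>"));
    }
}
```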

SAX 2.0 Events

SAX 2.0 has defined some more events that applications can handle. The following program demonstrates all the SAX 2.0 and JudoScript-defined events you can handle in do..as xml.

Listing 3. sax2_events.judo
do 'SAX2.0.xml' as xml with namespace, validate
{
:TEXT:         println ' TEXT: ', $_;
<>:            println '<', $_.getRaw():>14, '>: name =', $_.getName();
               println ' uri =', $_.getUri();
               println ' local=', $_.getLocal();
               for x from 0 to $_.countAttrs()-1 {
                 println ' Attribute: ',
                         $_.getAttrName(x), '=', $_.getAttrValue(x);
               }
:BEFORE:       println ' <BEFORE>.';
:AFTER:        println ' <AFTER>.';
:START_DOC:    println ' <START_DOC>.';
:END_DOC:      println ' <END_DOC>.';
:START_NS_MAP: println ' <START_NS_MAP>: ', $_.prefix, ' => ', $_.uri;
:END_NS_MAP:   println ' <END_NS_MAP>: ', $_;
:PI:           println ' <PI>: ', $_.instruction;
               println ' ', $_.data;
:COMMENT:      println ' <COMMENT>: ', $_;
:START_CDATA:  println ' <START_CDATA>.';
:END_CDATA:    println ' <END_CDATA>.';
:START_DTD:    println ' <START_DTD>: name=', $_.name,
                       ' publicID=', $_.publicID, ' systemID=', $_.systemID;
:END_DTD:      println ' <END_DTD>.';
:END_ENTITY:   println ' <END_ENTITY>: ', $_;
:ELEMENT_DECL: println ' <ELEMENT_DECL>: name=',$_.name,' model=',$_.model;
:ATTR_DECL:    println ' <ATTR_DECL>: element=', $_.element;
               println ' name =', $_.name;
               println ' type =', $_.type;
               println ' default=', $_.default;
               println ' value =', $_.value;
:ENTITY_DECL:  println ' <ENTITY_DECL>: name =', $_.name;
               println ' value =', $_.value;
:EXT_ENTITY_DECL: println '<EXT_ENTITY_DECL>: name =', $_.name;
                  println ' pubID =', $_.publicID;
                  println ' sysID =', $_.systemID;
:SKIPPED_ENTITY:  println ' <SKIPPED_ENTITY>: ', $_;
}

As the program shows, an event may carry no parameter, a string parameter, or a struct parameter with predefined members; all of those members are listed in the program. :START_DOC and :END_DOC are SAX events; :BEFORE and :AFTER are JudoScript events that happen before START_DOC and after END_DOC, respectively.
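In plain Java, several of these events (comments, CDATA boundaries, DTD boundaries and declarations) come from the SAX 2.0 extension interfaces LexicalHandler and DeclHandler, both of which org.xml.sax.ext.DefaultHandler2 implements. They are registered as parser properties rather than passed to parse(). This partial sketch assumes a JAXP parser that supports these two standard properties. SaxLexicalEvents is a hypothetical name, and the labels it logs only imitate the JudoScript ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class SaxLexicalEvents {
    // Returns a newline-separated log of selected SAX 2.0 events.
    static String events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startDocument() { log.add("START_DOC"); }
            @Override public void endDocument() { log.add("END_DOC"); }
            @Override public void processingInstruction(String target, String data) {
                log.add("PI " + target + " " + data);
            }
            @Override public void comment(char[] ch, int start, int length) {
                log.add("COMMENT " + new String(ch, start, length));
            }
            @Override public void startCDATA() { log.add("START_CDATA"); }
            @Override public void endCDATA() { log.add("END_CDATA"); }
            @Override public void startDTD(String name, String publicId, String systemId) {
                log.add("START_DTD " + name);
            }
            @Override public void endDTD() { log.add("END_DTD"); }
            @Override public void elementDecl(String name, String model) {
                log.add("ELEMENT_DECL " + name + " " + model);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Lexical and declaration events are opt-in: register the handler for them too.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return String.join("\n", log);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<?xml version=\"1.0\"?><!DOCTYPE a [<!ELEMENT a ANY>]>"
                + "<a><!--hi--><![CDATA[x]]><?pi data?></a>"));
    }
}
```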

Review Questions

How do you process an XML file on the hard drive or on the Internet using the do..as xml statement?
How do you use namespaces with the do..as xml statement? How do you make sure the tag prefixes in the code and in the data refer to the same namespace, and why is this necessary?
Write a program to count one particular tag and all the tags in an XML document.
For an opening tag, how do you get all its attributes?
For a text tag, can you get the attributes of the opening tag?
:PI stands for the "processing instruction" event. What are the values for this event (in $_)? What is the value for the :SKIPPED_ENTITY event?

2. DOM Processing

XML data can be read in or written out using a DOM object, which contains a tree of nodes representing the data. The key DOM types are two Java interfaces: org.w3c.dom.Document and org.w3c.dom.Node.

To process XML data with DOM, use do..as dom. It looks similar to do..as xml, with two differences: a) it has no body, and b) it is an expression that returns an instance of org.w3c.dom.Document.
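For comparison, reading a Document and writing it back out in plain Java goes through JAXP: DocumentBuilderFactory parses, and a Transformer created without a stylesheet (an identity transform) serializes. This sketch loosely mirrors Listing 4. The class ReadXml and its methods are hypothetical, and the commented-out setters are the JAXP switches analogous to the namespace and validate options.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReadXml {
    static Document parse(InputSource in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // factory.setNamespaceAware(true);
        // factory.setValidating(true);
        return factory.newDocumentBuilder().parse(in);
    }

    // Serializes a DOM with an identity Transformer (no stylesheet).
    static String write(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("Usage: java ReadXml file.xml");
            return;
        }
        // A file URI as the system ID lets relative references (e.g. a DTD) resolve.
        System.out.println(write(parse(new InputSource(new File(args[0]).toURI().toString()))));
    }
}
```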

Listing 4. read_xml.judo
1: if #args.length == 0 {
2:   println <err> 'Usage: java judo ', #prog, ' file.xml';
3:   exit 0;
4: }
5:
6: doc = do #args[0] as dom; // with namespace, validate
7: xslt copy doc to getOut();

In this program, line 6 reads the file or URL into a DOM, and line 7 uses xslt copy to write it to the system output. The document itself is a tree of org.w3c.dom.Node objects of various kinds. The following GUI program displays the tree visually.

Listing 5. XML2JTree.judo
  1: !JavaGuiClass #JTree, #JFrame, #DefaultMutableTreeNode, #DefaultTreeModel
  2: !JavaGuiClass #JScrollPane, #Dimension, #JPanel, #TreeSelectionModel
  3: !JavaGuiClass #BorderLayout, #Toolkit, #Color, #JOptionPane
  4: !JavaBaseClass #Node, #Document, #DocumentBuilderFactory
  5:
  6:
  7: if #args.length < 1 {
  8:   println <err> 'Usage: java judo ', #prog, ' filename.xml';
  9:   exit 0;
 10: }
 11:
 12: const #FRAME_WIDTH = 440;
 13: const #FRAME_HEIGHT = 280;
 14:
 15: showDetails = true;
 16: filename = #args[0];
 17:
 18: frame = javanew #JFrame("XML to JTree");
 19: frame.setBackground(#Color.lightGray);
 20: frame.getContentPane().setLayout(javanew #BorderLayout);
 21: {
 22:   local toolkit = #Toolkit.getDefaultToolkit();
 23:   local dim = toolkit.getScreenSize();
 24:   local screenHeight = dim.height;
 25:   local screenWidth = dim.width;
 26:
 27:   // Display in the middle of the screen
 28:   frame.setBounds( (screenWidth-#FRAME_WIDTH)/2,
 29:                    (screenHeight-#FRAME_HEIGHT)/2,
 30:                    #FRAME_WIDTH, #FRAME_HEIGHT );
 31: }
 32: guiEvents {
 33:   <frame : Window : windowClosing> : exit(0);
 34: }
 35:
 36: doc = null; // global scope
 37: {
 38:   doc = do filename as dom;
 39: catch:
 40:   #JOptionPane.showMessageDialog(frame, $_.message, "Exception",
 41:                                  #JOptionPane.WARNING_MESSAGE);
 42:   exit 0;
 43: }
 44:
 45: top = createTreeNode(doc.getDocumentElement(), showDetails );
 46: dtModel = javanew #DefaultTreeModel(top);
 47: jTree = javanew #JTree(dtModel);
 48:
 49: jTree.getSelectionModel().setSelectionMode(
 50:           #TreeSelectionModel.SINGLE_TREE_SELECTION);
 51: jTree.setShowsRootHandles(true);
 52: jTree.setEditable(false);
 53:
 54: // Create a new JScrollPane to override one of the methods.
 55: !JavaClass
 56: import java.awt.Dimension;
 57: import javax.swing.JTree;
 58:
 59: public class XmlTreePane extends javax.swing.JScrollPane
 60: {
 61:   int width, height;
 62:   public XmlTreePane(JTree tree, int width, int height) {
 63:     super(tree);
 64:     this.width = width;
 65:     this.height = height;
 66:   }
 67:   public Dimension getPreferredSize() {
 68:     return new Dimension( width-20, height-40 );
 69:   }
 70: };
 71:
 72: main = javanew #JPanel;
 73: jScroll = javanew XmlTreePane(jTree,#FRAME_WIDTH,#FRAME_HEIGHT);
 74:
 75: panel = javanew #JPanel;
 76: panel.setLayout(javanew #BorderLayout);
 77: panel.add("Center", jScroll);
 78: main.add("Center", panel);
 79: frame.getContentPane().add(main, #BorderLayout.CENTER);
 80: frame.validate();
 81: frame.setVisible(true);
 82:
 83: function createTreeNode root, showDetails
 84: {
 85:   type = getNodeType(root);
 86:   name = root.getNodeName();
 87:   value = root.getNodeValue();
 88:
 89:   if showDetails {
 90:     dmtNode = javanew #DefaultMutableTreeNode(
 91:                 "[" @ type @ "] --> " @ name@" = " @ value);
 92:   } else { // Special case for TEXT_NODE
 93:     dmtNode = javanew #DefaultMutableTreeNode(
 94:                 root.getNodeType()==#Node.TEXT_NODE ? value : name );
 95:   }
 96:
 97:   // Display the attributes if there are any
 98:   attribs = root.getAttributes();
 99:   if attribs != null && showDetails {
100:     for i from 0 to attribs.getLength()-1 {
101:       local attNode = attribs.item(i);
102:       local attName = attNode.getNodeName().trim();
103:       local attValue = attNode.getNodeValue().trim();
104:
105:       if attName.isNotEmpty() && attValue.isNotEmpty() {
106:         dmtNode.add(javanew #DefaultMutableTreeNode(
107:           "[Attribute] --> " @ attName @ "=\"" @ attValue @ "\"") );
108:       }
109:     }
110:   }
111:
112:   // If there are any children and they are non-null then recurse...
113:   if root.hasChildNodes() {
114:     childNodes = root.getChildNodes();
115:     if childNodes != null {
116:       for k from 0 to childNodes.getLength()-1 {
117:         local nd = childNodes.item(k);
118:         if nd != null {
119:           // A special case could be made for each Node type.
120:           if nd.getNodeType() == #Node.ELEMENT_NODE {
121:             dmtNode.add(createTreeNode(nd, showDetails));
122:           }
123:           elif nd.getNodeValue().isNotEmpty() { // the default
124:             dmtNode.add(createTreeNode(nd, showDetails));
125:           }
126:         }
127:       }
128:     }
129:   }
130:
131:   return dmtNode;
132: }
133:
134: function getNodeType node
135: {
136:   switch node.getNodeType() {
137:   case #Node.ELEMENT_NODE:                return "Element";
138:   case #Node.ATTRIBUTE_NODE:              return "Attribute";
139:   case #Node.TEXT_NODE:                   return "Text";
140:   case #Node.CDATA_SECTION_NODE:          return "CData section";
141:   case #Node.ENTITY_REFERENCE_NODE:       return "Entity reference";
142:   case #Node.ENTITY_NODE:                 return "Entity";
143:   case #Node.PROCESSING_INSTRUCTION_NODE: return "Processing instruction";
144:   case #Node.COMMENT_NODE:                return "Comment";
145:   case #Node.DOCUMENT_NODE:               return "Document";
146:   case #Node.DOCUMENT_TYPE_NODE:          return "Document type";
147:   case #Node.DOCUMENT_FRAGMENT_NODE:      return "Document fragment";
148:   case #Node.NOTATION_NODE:               return "Notation";
149:   default:                                return "Unknown";
150:   }
151: }

As with any GUI program, most of the code above builds the GUI. The interesting parts, as far as DOM is concerned, are lines 36 through 47 and the function createTreeNode() on lines 83 through 132. That function takes a node (here the document element), recursively traverses its subtree and builds a matching tree of DefaultMutableTreeNode objects, which line 46 wraps in a DefaultTreeModel for display in a Swing JTree component. You can customize it to display more detailed information about each node type.
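The recursion in createTreeNode() does not depend on Swing. The plain-Java sketch below walks a DOM the same way and prints an indented outline instead of building DefaultMutableTreeNode objects. The class and method names are hypothetical, and only a few node types are named. Skipping whitespace-only values is an assumption about what the listing's isNotEmpty() check is meant to filter out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeDump {
    // Appends one line per node, indented by depth (like showDetails = true).
    static void dump(Node node, String indent, StringBuilder out) {
        out.append(indent).append('[').append(typeName(node)).append("] ").append(node.getNodeName());
        if (node.getNodeValue() != null) out.append(" = ").append(node.getNodeValue().trim());
        out.append('\n');

        NamedNodeMap attrs = node.getAttributes();  // null unless node is an element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(indent).append("  [Attribute] ")
                   .append(a.getNodeName()).append("=\"").append(a.getNodeValue()).append("\"\n");
            }
        }

        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String value = child.getNodeValue();
            // Always recurse into elements; skip other nodes with null or blank values.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    || (value != null && !value.trim().isEmpty())) {
                dump(child, indent + "  ", out);
            }
        }
    }

    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CData section";
            case Node.PROCESSING_INSTRUCTION_NODE: return "Processing instruction";
            case Node.COMMENT_NODE:                return "Comment";
            default:                               return "Other";
        }
    }

    static String dump(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        dump(doc.getDocumentElement(), "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump("<a x=\"1\">\n  <b>hi</b>\n</a>"));
    }
}
```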

Namespaces

As with SAX, to enable namespaces when reading a DOM, specify the flag in the with clause.

Listing 6. dom_namespace.judo
doc = do $$local as dom with namespace;

ns_ex  = "judoscript/xml/dom_namespace";
$local = "a";
ns_c24 = "http://www.c24solutions.com";

// Get a list of Nodes by TagName Namespace
println "Elements in the '", ns_ex, "' namespace...";
nodelist = doc.getElementsByTagNameNS(ns_ex,"*");
for i from 0 to nodelist.getLength()-1 {
  var n = nodelist.item(i);
  println n.getNodeName();
}

// Use the "local name"
println nl, "Elements with a local name of '", $local, "'...";
nodelist = doc.getElementsByTagNameNS("*",$local);
for i from 0 to nodelist.getLength()-1 {
  var n = nodelist.item(i);
  println n.getNodeName();
}

// Get all nodes and look for specified Attributes...
println nl, "Attributes in the ", ns_c24, " namespace...";
nodelist = doc.getElementsByTagName("*");
for i from 0 to nodelist.getLength()-1 {
  if nodelist.item(i).instanceof(#Element) {
    // Save the text part
    var t = nodelist.item(i).getFirstChild();

    // Search for particular attributes, no wildcards here!
    var e = nodelist.item(i);
    var a = e.getAttributeNodeNS(ns_c24,"class");

    if a != null { // a is the attribute
      var val = a.getNodeValue();
      println "<", val, ">", t.getNodeValue(), "</", val, ">";
    }
  }
}

EndScript -----------------------------------------------------
<?xml version='1.0' encoding='utf-8'?>

<DOMExample>

  <section xmlns="judoscript/xml/dom_namespace">
    <title price="$49.95">XML with JudoScript</title>
    <chapter title="DOM Programming">
      <author title="Mr." name="James Huang"/>
    </chapter>
  </section>

  <order xmlns:html="http://www.c24solutions.com">
    <name html:class="H1">Vince Muller</name>
    <payment type="credit" html:class="H3">Paid</payment>
    <html:a href="/jsp/prebookings?order-ref=0527658">Check order</html:a>
    <date location="London" html:class="H3">2002-02-22</date>
  </order>

</DOMExample>

This program builds a namespace-aware DOM from the XML data attached at the end of the script and runs three tests that select nodes by namespace or name.
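The first and third tests map directly onto the DOM Level 2 methods getElementsByTagNameNS() and getAttributeNS(). Here is a plain-Java sketch, assuming a JAXP parser. The namespace-aware factory setting plays the role of with namespace, and DomNamespaces and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomNamespaces {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // the *NS methods need a namespace-aware DOM
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Space-separated names of all elements in namespace ns (first test).
    static String elementsIn(Document doc, String ns) {
        NodeList list = doc.getElementsByTagNameNS(ns, "*");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.getLength(); i++) {
            sb.append(list.item(i).getNodeName()).append(' ');
        }
        return sb.toString().trim();
    }

    // Wraps the text of each element carrying an ns:class attribute as <class>text</class> (third test).
    static String classTagged(Document doc, String ns) {
        NodeList all = doc.getElementsByTagName("*");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);  // "*" matches elements only
            if (e.hasAttributeNS(ns, "class")) {
                String cls = e.getAttributeNS(ns, "class");
                sb.append('<').append(cls).append('>').append(e.getTextContent())
                  .append("</").append(cls).append(">\n");
            }
        }
        return sb.toString();
    }
}
```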

Create XML from DOM

To create an XML file, you can simply write to a text file. Alternatively, build a DOM tree and write it to the file: use the system function createDom() to create an empty DOM, create and attach the nodes, and finally write it out with the xslt copy statement.
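The same steps look roughly like this in plain Java with JAXP. DocumentBuilder.newDocument() stands in for createDom(), and an identity Transformer stands in for xslt copy. The class name CreatePresident and its methods are hypothetical; the element names and values are copied from Listing 7.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CreatePresident {
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element person = doc.createElement("Person");
        Element firstName = doc.createElement("FirstName");
        firstName.appendChild(doc.createTextNode("Al"));
        person.appendChild(firstName);

        Element surname = doc.createElement("Surname");
        surname.appendChild(doc.createTextNode("Gore"));
        person.appendChild(surname);

        Element president = doc.createElement("President");
        president.setAttribute("Country", "Us");
        president.appendChild(person);

        doc.appendChild(president);  // a Document may have only one root element
        return doc;
    }

    // Serializes the DOM with an identity Transformer.
    static String toXml(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(build()));
    }
}
```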

Listing 7. create_president.judo
doc = createDom();

// Start with a "<Person>"
person = doc.createElement("Person");

// Create the "<FirstName>" element
firstName = doc.createElement("FirstName");

// Create a Text node "Al" and add it to the "FirstName" tag
firstName.appendChild( doc.createTextNode("Al") );

// Add the "<FirstName>" tag to "<Person>"
person.appendChild(firstName);

// Same as above
surname = doc.createElement("Surname");
surname.appendChild( doc.createTextNode("Gore") );
person.appendChild(surname);

president = doc.createElement("President");

// Set the "Country" attribute in "<President>"
president.setAttribute("Country","Us");
president.appendChild( person );

// Add everything to the XmlDocument (doc)
doc.appendChild( president );

// Write the DOM to stdout.
xslt copy doc to getOut();

See anything wrong with this program?

List and Navigate DOM

To traverse a DOM in Java, use the facilities in the package org.w3c.dom.traversal. They are available only when the document object implements org.w3c.dom.traversal.DocumentTraversal; as of this writing, the "xerces" parser appears to have better support for this feature. This interface has two methods, createNodeIterator() and createTreeWalker(), which create a node iterator or a tree walker for the DOM. Both take an org.w3c.dom.traversal.NodeFilter that decides which nodes the traversal visits. In JudoScript, these methods have been extended to also accept a JudoScript function variable as the node filter, as the following program does on lines 18 and 31. If you have a NodeFilter implementation, you can still pass it to these methods.

Listing 8. dom_traverse.judo
 1: !JavaBaseClass #NodeFilter
 2:
 3: doc = do $$local as dom;
 4:
 5: if ! doc.isSupported("Traversal","2.0") {
 6:   println <err> 'Traversal is not implemented by this XML parser.';
 7:   exit 0;
 8: }
 9:
10: /*
11:  * Use NodeIterator.
12:  */
13: println 'Members of the royal family with children...', nl;
14: filter = lambda n {
15:   return n.hasChildNodes() && n.getNodeName()=="Person"
16:          ? #NodeFilter.FILTER_ACCEPT : #NodeFilter.FILTER_SKIP;
17: };
18: iter = doc.createNodeIterator( doc, #NodeFilter.SHOW_ALL, filter, true );
19: while (n=iter.nextNode()) != null {
20:   println n.getAttribute("name"), " (", n.getAttribute("born"), ")";
21: }
22:
23: /*
24:  * Use TreeWalker
25:  */
26: println nl, 'Looking for Princess Anne...';
27: filter = lambda n {
28:   return n.getNodeName()=="Person"
29:          ? #NodeFilter.FILTER_ACCEPT : #NodeFilter.FILTER_SKIP;
30: };
31: walker = doc.createTreeWalker( doc,#NodeFilter.SHOW_ALL,filter,true );
32: while (n=walker.nextNode()) != null {
33:   name = n.getAttribute("name");
34:   if name.indexOf("Anne") >= 0 { break; }
35:   println 'Skipping ', name;
36: }
37:
38: // Store the Node so we can come back
39: anne = walker.getCurrentNode();
40: println 'Found "', anne.getAttribute("name"), '".', nl;
41:
42: walker.setCurrentNode(anne);
43: println 'PreviousSibling = "', walker.previousSibling().getAttribute('name'), '"';
44: walker.setCurrentNode(anne);
45: println '    NextSibling = "', walker.nextSibling().getAttribute('name'), '"';
46: walker.setCurrentNode(anne);
47: println '     firstChild = "', walker.firstChild().getAttribute('name'), '"';
48: walker.setCurrentNode(anne);
49: println '      LastChild = "', walker.lastChild().getAttribute('name'), '"';
50: walker.setCurrentNode(anne);
51: println '   PreviousNode = "', walker.previousNode().getAttribute('name'), '"';
52: walker.setCurrentNode(anne);
53: println '       NextNode = "', walker.nextNode().getAttribute('name'), '"';
54:
55: EndScript ----------------------------------------------------------
56: <FamilyTree name="The Royal Family">
57:   <Person born="1926" name="Queen Elizabeth II" spouse="Philip">
58:     <Person born="1948" name="Charles, Prince of Wales" spouse="Diana">
59:       <Person born="1982" name="Prince William"/>
60:       <Person born="1984" name="Prince Henry of Wales"/>
61:     </Person>
62:     <Person born="1950" name="Anne, Princess Royal"
63:             spouse="Mark" spouse2="Tim">
64:       <Person born="1977" name="Peter Phillips"/>
65:       <Person born="1981" name="Zara Phillips"/>
66:     </Person>
67:     <Person born="1960" name="Andrew, Duke of York" spouse="Sarah">
68:       <Person born="1988" name="Princess Beatrice of York"/>
69:       <Person born="1990" name="Princess Eugenie of York"/>
70:     </Person>
71:     <Person born="1964" name="Edward, Earl of Wessex"
72:             spouse="Sophie"/>
73:   </Person>
74: </FamilyTree>

This is the result of the run:

Members of the royal family with children...

Queen Elizabeth II (1926)
Charles, Prince of Wales (1948)
Anne, Princess Royal (1950)
Andrew, Duke of York (1960)

Looking for Princess Anne...
Skipping Queen Elizabeth II
Skipping Charles, Prince of Wales
Skipping Prince William
Skipping Prince Henry of Wales
Found "Anne, Princess Royal".

PreviousSibling = "Charles, Prince of Wales"
    NextSibling = "Andrew, Duke of York"
     firstChild = "Peter Phillips"
      LastChild = "Zara Phillips"
   PreviousNode = "Prince Henry of Wales"
       NextNode = "Peter Phillips"
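The NodeIterator half of Listing 8 translates almost one-to-one into plain Java. NodeFilter has a single method, acceptNode(), so a Java 8 lambda can implement it. This sketch assumes the JDK's built-in parser, whose DOM implements DocumentTraversal; DomTraverse and parents() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

public class DomTraverse {
    // "name (born)" lines for every <Person> that has children, in document order.
    static String parents(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("Traversal is not implemented by this parser");
        }
        NodeFilter filter = n -> (n.hasChildNodes() && "Person".equals(n.getNodeName()))
                ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;
        NodeIterator iter = ((DocumentTraversal) doc)
                .createNodeIterator(doc, NodeFilter.SHOW_ALL, filter, true);
        StringBuilder sb = new StringBuilder();
        for (Node n = iter.nextNode(); n != null; n = iter.nextNode()) {
            Element e = (Element) n;  // the filter only accepts <Person> elements
            sb.append(e.getAttribute("name")).append(" (")
              .append(e.getAttribute("born")).append(")\n");
        }
        return sb.toString();
    }
}
```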

3. XSLT

Applying XSLT is easy in JudoScript with the xslt statement. Besides XSL transformations, it can write out XML documents and select parts of documents with XPath expressions. This is how to apply a stylesheet:

Listing 9. xslt.judo
xslt 'calls.xsl' on 'calls.xml' to 'out_calls.html';

File "calls.xml" is:

<?xml version="1.0"?>
<!DOCTYPE PHONE_RECORDS SYSTEM "calls.dtd">

<PHONE_RECORDS>

  <!-- Call Record 1 -->
  <CALL>
    <FROM>703-433-5678</FROM>
    <DATE>5/5/2000</DATE>
    <TIME HOUR="19" MINUTE="32"/>
    <DESTINATION STATE="California" CITY="Sunnyvale"
     COUNTRY="US">510-798-8390</DESTINATION>
    <DURATION HOURS="1" MINUTES="15"/>
  </CALL>

  <!-- Call Record 2 -->
  <CALL>
    <FROM>703-373-2318</FROM>
    <DATE>5/15/2000</DATE>
    <TIME HOUR="20" MINUTE="12"/>
    <DESTINATION CITY="Birmingham"
     COUNTRY="UK">44-121-738-4294</DESTINATION>
    <DURATION HOURS="0" MINUTES="34"/>
  </CALL>

</PHONE_RECORDS>

This is the DTD file:

<!ELEMENT PHONE_RECORDS ( CALL )* >
<!ELEMENT CALL ( FROM, DATE, TIME, DESTINATION, DURATION, CALL_PROMOTION? ) >
<!ELEMENT FROM ( #PCDATA ) >
<!ELEMENT DATE ( #PCDATA ) >
<!ELEMENT TIME EMPTY >
<!ATTLIST TIME
  HOUR NMTOKEN #REQUIRED
  MINUTE NMTOKEN #REQUIRED
>
<!ELEMENT DESTINATION ( #PCDATA ) >
<!ATTLIST DESTINATION
  CITY NMTOKEN #REQUIRED
  COUNTRY NMTOKEN #REQUIRED
  STATE NMTOKEN #IMPLIED
>
<!ELEMENT DURATION ( #PCDATA ) >
<!ATTLIST DURATION
  HOURS NMTOKEN #REQUIRED
  MINUTES NMTOKEN #REQUIRED
>

And "calls.xsl" is:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="PHONE_RECORDS">
  <html><head><title>Phone Listing</title></head>
  <body><h1>Phone Call Records</h1>
  <table border="1">
    <th>Item</th>
    <th>Source Number</th>
    <th>Destination Number</th>
    <th>Date (MM/DD/YY)</th>

  <xsl:apply-templates/>

  </table>
  </body></html>
</xsl:template>

<xsl:template match="CALL">
  <tr>
  <td><xsl:number/></td>
  <td><xsl:value-of select="FROM"/></td>
  <td><xsl:value-of select="DESTINATION"/></td>
  <td><xsl:value-of select="DATE"/></td>
  </tr>
</xsl:template>

</xsl:stylesheet>
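For comparison, the plain-Java JAXP version of Listing 9 creates a Transformer from the stylesheet and runs it on the input. This is a minimal sketch; ApplyXslt and transform() are hypothetical names, and main() assumes the three files sit in the current directory.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ApplyXslt {
    // Applies an XSL stylesheet (given as text) to XML (given as text).
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // File-based counterpart of: xslt 'calls.xsl' on 'calls.xml' to 'out_calls.html';
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("calls.xsl")));
        t.transform(new StreamSource(new File("calls.xml")),
                    new StreamResult(new File("out_calls.html")));
    }
}
```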

The input XML file and the XSL file can be any valid XML source, such as files, URLs, input streams, or DOM documents. If the input is not a file or URL, you may want to specify its system ID so that referenced documents such as DTDs can be resolved by the parser:

Listing 10. xslt_local_src.judo
xslt 'calls.xsl' on $$local systemID 'calls.xml' to 'out_calls.html';

EndScript ------------------
<?xml version="1.0"?>
<!DOCTYPE PHONE_RECORDS SYSTEM "calls.dtd">

<PHONE_RECORDS>

  <!-- Call Record 1 -->
  <CALL>
    <FROM>703-433-5678</FROM>
    <DATE>5/5/2000</DATE>
    <TIME HOUR="19" MINUTE="32"/>
    <DESTINATION STATE="California" CITY="Sunnyvale"
     COUNTRY="US">510-798-8390</DESTINATION>
    <DURATION HOURS="1" MINUTES="15"/>
  </CALL>

  <!-- Call Record 2 -->
  <CALL>
    <FROM>703-373-2318</FROM>
    <DATE>5/15/2000</DATE>
    <TIME HOUR="20" MINUTE="12"/>
    <DESTINATION CITY="Birmingham"
     COUNTRY="UK">44-121-738-4294</DESTINATION>
    <DURATION HOURS="0" MINUTES="34"/>
  </CALL>

</PHONE_RECORDS>

The XSL or XML data can also be text within the script. Because a plain string is interpreted as a file name or URL, call the string's getReader() method to turn its content into a Reader:

Listing 11. xslt_text_src.judo
data = [[*
<?xml version="1.0"?>
<!DOCTYPE PHONE_RECORDS SYSTEM "calls.dtd">

<PHONE_RECORDS>

  <!-- Call Record 1 -->
  <CALL>
    <FROM>703-433-5678</FROM>
    <DATE>5/5/2000</DATE>
    <TIME HOUR="19" MINUTE="32"/>
    <DESTINATION STATE="California" CITY="Sunnyvale"
     COUNTRY="US">510-798-8390</DESTINATION>
    <DURATION HOURS="1" MINUTES="15"/>
  </CALL>

  <!-- Call Record 2 -->
  <CALL>
    <FROM>703-373-2318</FROM>
    <DATE>5/15/2000</DATE>
    <TIME HOUR="20" MINUTE="12"/>
    <DESTINATION CITY="Birmingham"
     COUNTRY="UK">44-121-738-4294</DESTINATION>
    <DURATION HOURS="0" MINUTES="34"/>
  </CALL>

</PHONE_RECORDS>
*]];

xslt 'calls.xsl' on data.getReader() systemID 'calls.xml' to 'out_calls.html';
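In JAXP terms, a Reader plus a system ID is a javax.xml.transform.stream.StreamSource built with the two-argument constructor StreamSource(Reader, String). The system ID is what lets a relative DOCTYPE reference such as "calls.dtd" resolve. The sketch below copies in-memory XML through an identity transform. XsltSystemId and copy() are hypothetical names, and it assumes the parser is allowed to load external DTDs, which is the usual JDK default.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSystemId {
    // Copies XML held in memory. systemId tells the parser where relative
    // references, such as the DTD named in DOCTYPE, are resolved from.
    static String copy(String xml, String systemId) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml), systemId),
                           new StreamResult(out));
        return out.toString();
    }
}
```

Without the system ID, a relative DTD reference would typically be looked up relative to the current working directory instead.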

The XSL or XML data can be DOM documents. You can also output the resultant tree as a DOM node.

Listing 12. xslt_from_dom.judo
1: doc = do 'calls.xml' as dom;
2: xslt 'calls.xsl' on doc to 'out_calls1.html';
3:
4: doc = do 'calls.xsl' as dom with namespace;
5: xslt doc on 'calls.xml' to 'out_calls2.html';
6:
7: xslt doc on 'calls.xml' as dom;
8: xslt copy $_ to 'out_calls3.html';

On line 2, we do a simple XSL transformation but take a DOM document as the XML data source. On line 5, a DOM document is used for the XSL. Line 7 does the same as line 5, except that it produces the result as a DOM node, available as $_. Line 8 shows how to copy a DOM.
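With JAXP, the same combinations are expressed with javax.xml.transform.dom.DOMSource and DOMResult. The sketch below assumes a JAXP processor; XsltDom and its methods are hypothetical names. As far as we understand, a stylesheet supplied as a DOM should be parsed namespace-aware so that the processor can recognize the xsl: elements, which is why parse() turns that setting on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XsltDom {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // needed for a stylesheet DOM
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Stylesheet and input are both DOMs; the result is returned as a new DOM.
    static Document transform(Document xsl, Document xml) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(new DOMSource(xsl));
        DOMResult result = new DOMResult();  // no node given: the transformer creates a Document
        t.transform(new DOMSource(xml), result);
        return (Document) result.getNode();
    }
}
```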

The following example shows how to run a query using XPath expressions in JudoScript.

Listing 13. xslt_query.judo
xslt xpath('/PHONE_RECORDS/CALL[1]/DESTINATION') on 'calls.xml' to 'result.xml'
     outputProperties( 'omit-xml-declaration' = 'yes' );

The xpath() operand tells xslt to treat the text as an XPath expression and to query the XML data to produce a list of nodes. This example also uses the outputProperties clause. xslt accepts the output property names defined in javax.xml.transform.OutputKeys, which are:

cdata-section-elements
doctype-public
doctype-system
encoding
indent
media-type
method
omit-xml-declaration
standalone
version

where boolean options take "yes" or "no" as values.
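Here is a plain-Java sketch of the same query, assuming JAXP. javax.xml.xpath evaluates the expression to a NodeList, and an identity Transformer serializes each node. OutputKeys.OMIT_XML_DECLARATION, whose value is the string "omit-xml-declaration", is set to "yes". XPathQuery and query() are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathQuery {
    // Serializes every node selected by expr, without an XML declaration.
    static String query(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        for (int i = 0; i < nodes.getLength(); i++) {
            identity.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PHONE_RECORDS><CALL><DESTINATION CITY=\"Sunnyvale\">510-798-8390</DESTINATION></CALL>"
                   + "<CALL><DESTINATION CITY=\"Birmingham\">44-121-738-4294</DESTINATION></CALL></PHONE_RECORDS>";
        System.out.println(query(xml, "/PHONE_RECORDS/CALL[1]/DESTINATION"));
    }
}
```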

To pass parameter values to the XSL, use the parameters clause like this:

Listing 14. xslt_param.judo
xslt 'calls_param.xsl' on 'calls.xml' to getOut()
     parameters( pageTitle = 'Welcome to Online Phone Listings' );

First of all, note that the output is written to the system output of the language engine, usually the same as System.out. The XSL takes a parameter:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:param name="pageTitle"/>
<xsl:template match="PHONE_RECORDS">
  <html><head><title><xsl:value-of select="$pageTitle"/></title></head>
  <body><h1><xsl:value-of select="$pageTitle"/></h1>
  <table border="1">
    <th>Item</th>
    <th>Source Number</th>
    <th>Destination Number</th>
    <th>Date (MM/DD/YY)</th>

  <xsl:apply-templates/>

  </table>
  </body></html>
</xsl:template>

<xsl:template match="CALL">
  <tr>
  <td><xsl:number/></td>
  <td><xsl:value-of select="FROM"/></td>
  <td><xsl:value-of select="DESTINATION"/></td>
  <td><xsl:value-of select="DATE"/></td>
  </tr>
</xsl:template>

</xsl:stylesheet>
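The JAXP counterpart of the parameters clause is Transformer.setParameter(), called before transform(). This is a minimal sketch; XsltParam and transform() are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltParam {
    // Runs xsl on xml with the stylesheet parameter pageTitle set.
    static String transform(String xsl, String xml, String pageTitle) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setParameter("pageTitle", pageTitle);  // matches <xsl:param name="pageTitle"/>
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```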

4. Java XML Parsers

Currently the two most popular Java XML parsers are "crimson" and "xerces". Sun's JAXP 1.1 reference implementation includes and uses the "crimson" parser, in the file "crimson.jar". We found "xerces" more comfortable to use because of its better support for the latest features, so "xerces" is recommended at this point.

To use "xerces" with JDK 1.3, download the package from http://www.apache.org, unpack it, and put "xerces.jar" into the classpath; make sure it comes before any class libraries from Sun, such as "j2ee.jar".

5. Summary

JudoScript supports processing XML data in SAX or DOM style, with options for namespaces, validation and other features. SAX programming is done through the do..as xml event-driven statement; it supports all the SAX 2.0 events and also has a "text" tag feature that lets the text enclosed between a pair of tags be processed together with the attributes of the opening tag. If a tag has mixed content, its text can be obtained by either copying all the embedded tags or ignoring them. Each event is specified as a label followed by handler code. In the handler, the tag itself is represented by the built-in variable $_, whose attributes can be accessed as struct members. For specific events (such as PI, ENTITY_DECL, etc.), the members are predefined. Anonymous tags and unnamed text can be handled by these special labels: <> and :TEXT.

XML data can be read into a DOM by the do..as dom expression, which returns an org.w3c.dom.Document object. You can process the DOM with all the Java DOM APIs, such as searching and traversal. When using the Java DOM traversal API, JudoScript function variables, in addition to org.w3c.dom.traversal.NodeFilter objects, can be passed as node filters to the createNodeIterator() and createTreeWalker() methods of the DocumentTraversal interface.

To create XML files via DOMs, use the system function createDom() to create an empty DOM, then build a node tree by calling its methods to create various kinds of nodes and attach them to each other.

XSL transformation is done with the xslt statement. It takes the XSL and an XML data source, each of which can be a file, URL, input stream, reader or DOM, and produces output to files, output streams, writers or other DOMs. It can copy an XML document to the output, which is useful for writing DOM objects, and can run queries using XPath expressions. You can optionally set output properties and XSL parameter values.

Currently the two most popular Java XML parsers are "crimson" and "xerces". Sun's JAXP 1.1 reference implementation includes and uses the "crimson" parser, in the file "crimson.jar". We found "xerces" more comfortable to use because of its better support for the latest features, so "xerces" is recommended at this point. For XSLT, Apache's Xalan has been used and tested.

Copyright © 2001-2005 JudoScript.COM. All Rights Reserved.