This article is old and is being consolidated into the book.
Please refer to the corresponding chapter(s) therein. If those chapters or
sections are not completed yet, you can use this article. Refer to the
examples, as they are tested against the latest code.
Abstract
XML scripting is one of the main motivations for JudoScript itself. JudoScript supports
SAX and DOM programming and XSL transformation. In the do..as xml
event-driven statement, SAX events, including tags, are denoted as labels,
followed by handler code. JudoScript extends SAX with a text tag, a
compound tag that covers the opening tag, the closing tag and the text enclosed
between them. The opening tag's attributes are accessible, too. If a tag has mixed
content, you have the option to copy or ignore all the embedded tags. For
each event, the built-in variable $_ represents the tag or the
event. XML data can be read into a DOM by the do..as dom expression,
which returns an org.w3c.dom.Document object. You can use a
function variable as a node filter when using the Java DOM traversal API. To
create a DOM, use the system function createDom(). The xslt
statement applies XSL transformations; it can also copy XML documents, and it
writes its output to files or DOMs.
SAX is, indeed, a simple API for XML programming. Each tag, including text,
becomes an event that application code may choose to respond to. JudoScript goes one
step further by supporting the text enclosed between specific tags as a
single event. This saves you the trouble of implementing state machines just to
retrieve the text content of tags.
Briefly on logistics: on line 1, $$local represents an input text
stream for the data enclosed in this script after the EndScript
marker, starting on the following line. The do..as xml statement
can take input streams, readers, files and URLs as its source. It takes some options;
in this example we just accept the defaults.
Let us discuss regular XML tags first. In their handlers, $_
represents the current tag. It is a read-only struct whose members
are the tag attributes; see line 3. Its toString() method,
which is implicitly called by the print statement, reproduces the tag itself.
Enumerating its attributes works differently, though; use these methods:
countAttrs(), getAttrName() and getAttrValue().
<some_tag>: for x from 0 to $_.countAttrs()-1 {   // 'to' includes its upper bound
              println $_.getAttrName(x), ' => ', $_.getAttrValue(x);
            }
The closing tags can be handled the same way, except they do not have
attributes.
The named text tags, like lines 5 through 8 above, are represented by
$_, which contains all the attributes of the opening tag,
but whose toString() returns the text, not the tags. The opening
and closing tags can be handled separately; if the closing tag is
handled, its handler is called after the text handler. So, to reproduce
a text tag, handle the opening tag, the text tag and the closing tag,
printing $_ in each handler.
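For comparison, here is roughly what a text tag spares you in plain Java SAX (JAXP). The handler below is a minimal sketch whose class and method names (TextTagHandler, run()) are made up for illustration. It has to remember whether it is inside the element, buffer the characters() callbacks itself (SAX may split text into several chunks), and copy the opening tag's attributes with Attributes.getLength(), getQName() and getValue(), the counterparts of countAttrs(), getAttrName() and getAttrValue(). It does not handle an element nested inside another element of the same name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: emulates a "text tag" for one element name in plain SAX.
class TextTagHandler extends DefaultHandler {
    private final String tagName;
    private StringBuilder text;           // non-null only while inside <tagName>
    private Map<String, String> attrs;    // attributes of the current opening tag
    final List<String> results = new ArrayList<>();

    TextTagHandler(String tagName) { this.tagName = tagName; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!qName.equals(tagName)) return;
        text = new StringBuilder();        // state: entered the element
        attrs = new LinkedHashMap<>();
        // Attributes are indexed 0 .. getLength()-1.
        for (int i = 0; i < atts.getLength(); i++) {
            attrs.put(atts.getQName(i), atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);   // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null && qName.equals(tagName)) {
            results.add(attrs + " " + text);  // state: left the element
            text = null;
            attrs = null;
        }
    }

    static List<String> run(String xml, String tagName) throws Exception {
        TextTagHandler handler = new TextTagHandler(tagName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<news><headline lang=\"en\">SAX 2.0 Released</headline></news>";
        run(xml, "headline").forEach(System.out::println);
    }
}
```

Because characters() also fires inside nested elements, the buffer keeps the character data of any embedded tags but not their markup.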
In XML data, text content may be enclosed within a tag along with other
tags; the parent tag, in this case, is said to have mixed
content. So how do you get the text of a tag with mixed content? The JudoScript SAX-style
processing allows you either to copy the embedded tags (the
default) or to ignore them. That is all it offers; if more detailed, accurate
handling is needed, switch to DOM. To ignore the embedded tags,
put a minus sign - before the :. To explicitly
request copying the embedded tags, put a plus sign + before
the :.
<some_tag>TEXT-: println $_;
Anonymous tags, both opening and closing ones, can be handled by
<>; unnamed text pieces can be handled by just :TEXT.
<>: println $_;
:TEXT: println $_;
SAX also reports other events, as we shall see later, and JudoScript adds two
more: the BEFORE and AFTER events.
The options for do..as xml are passed on to the XML parser.
At this stage, the two most commonly used flags are namespace
and validate. Both are false by default.
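In plain JAXP, these two flags correspond to SAXParserFactory.setNamespaceAware() and setValidating(), which are likewise off by default. The sketch below, with the hypothetical class name NamespaceMatcher, turns namespace processing on and then matches elements by namespace URI and local name, so the prefix used in the data does not matter; validation stays off because its sample data has no DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: finds <headline> elements in one namespace, whatever prefix the data uses.
class NamespaceMatcher extends DefaultHandler {
    static final String NS = "judoscript/xml/namespaces";
    final List<String> matched = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With namespace processing on, the parser fills in uri and localName.
        if (NS.equals(uri) && "headline".equals(localName)) {
            matched.add(qName);   // the prefixed name as written in the data
        }
    }

    static List<String> find(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // counterpart of "with namespace"
        factory.setValidating(false);      // counterpart of "validate"; would need a DTD
        NamespaceMatcher handler = new NamespaceMatcher();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.matched;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<news:article xmlns:news=\"judoscript/xml/namespaces\">"
                   + "<news:headline>SAX 2.0 Released</news:headline>"
                   + "</news:article>";
        System.out.println(find(xml));
    }
}
```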
1: do $$local as xml with namespace, validate,
2: xmlns:n="judoscript/xml/namespaces"
3: {
4: <n:article>: println ' Date >> ', $_.date;
5: TEXT<n:headline>: println 'Headline >> ', $_;
6: TEXT<n:author>: println ' Author >> ', $_;
7: TEXT<n:body>: println ' Body >> ', $_;
8:
9: ERROR: println ' <ERROR>: ', $_.getMessage();
10: }
11:
12: EndScript ------------------------------------------------------
13: <?xml version="1.0"?>
14: <!DOCTYPE news:article [
15: <!ELEMENT news:article (news:headline, news:author, news:body)>
16: <!ATTLIST news:article date CDATA #REQUIRED>
17: <!ELEMENT news:headline (#PCDATA)>
18: <!ELEMENT news:author (#PCDATA)>
19: <!ELEMENT news:body (#PCDATA)>
20: ]>
21: <news:article xmlns:news="judoscript/xml/namespaces" date="02-Dec-2000">
22: <news:headline>SAX 2.0 Released</news:headline>
23: <news:author>F. Bar</news:author>
24: <news:body>SAX 2.0 has been released into the public domain</news:body>
25: </news:article>
On line 2, xmlns:n=... associates the prefix "n" with
a URI; any occurrence of this prefix in the handler labels (lines 4
through 7) refers to this namespace. Note that the data uses a different prefix
("news"), but both prefixes are bound to the same namespace URI, hence
the tags match.
Line 9 shows one of the events other than tags and text: ERROR.
This is a recoverable error, as opposed to a fatal error; the parser reports
it and can continue processing. With the "crimson" parser,
for example, this handler is called for line 21 of the data, complaining that the
prefix of "news:article" is not defined, even though that very tag defines the
namespace. The "xerces" parser does not have this issue.
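Plain Java SAX makes the same distinction through the error() and fatalError() callbacks of org.xml.sax.ErrorHandler, which DefaultHandler implements. In this minimal sketch (ErrorCounter and its sample data are hypothetical), a validating parser reports a validity error, a missing required child element, through error() and keeps parsing; a well-formedness error goes to fatalError(), whose DefaultHandler implementation rethrows it and so ends the parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records recoverable errors instead of silently ignoring them.
class ErrorCounter extends DefaultHandler {
    final List<String> errors = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        // Recoverable (e.g. a validity error): note it; the parser keeps going.
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    // fatalError() is inherited: DefaultHandler rethrows the exception, ending the parse.

    static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);   // check the document against its DTD
        ErrorCounter handler = new ErrorCounter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed but invalid: the DTD requires a <b> inside <a>.
        String xml = "<?xml version=\"1.0\"?>\n"
                   + "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b (#PCDATA)>]>\n"
                   + "<a/>";
        validate(xml).forEach(System.out::println);
    }
}
```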
SAX 2.0 has defined some more events that applications can handle. The
following program demonstrates all the SAX 2.0 and JudoScript-defined events
you can handle in do..as xml.
As the program shows, events may have no parameters, a string parameter,
or a struct parameter with defined members, which are all listed here.
:START_DOC and :END_DOC are SAX events; :BEFORE
and :AFTER are JudoScript events, which happen before START_DOC and
after END_DOC, respectively.
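A rough plain-Java picture of where these events fall: startDocument() and endDocument() bracket everything a SAX parser reports, so in this sketch the JudoScript-only BEFORE and AFTER events are simply emulated by code run before and after the parse() call. processingInstruction(target, data) is the SAX callback that corresponds to :PI; note that the XML declaration itself is not reported as one. EventLogger and the sample data are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: logs the order in which SAX callbacks fire.
class EventLogger extends DefaultHandler {
    final List<String> log = new ArrayList<>();

    @Override public void startDocument() { log.add("START_DOC"); }
    @Override public void endDocument()   { log.add("END_DOC"); }

    @Override
    public void processingInstruction(String target, String data) {
        log.add("PI " + target + " " + data);   // a PI carries a target and its data
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        log.add("<" + qName + ">");
    }

    static List<String> run(String xml) throws Exception {
        EventLogger handler = new EventLogger();
        handler.log.add("BEFORE");   // emulated: runs before the parser reports anything
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        handler.log.add("AFTER");    // emulated: runs after END_DOC
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<?xml version=\"1.0\"?><?judo run=\"yes\"?><doc/>"));
    }
}
```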
Review Questions
How do you process an XML file on the hard drive or on the internet
using the do..as xml statement?
How do you use namespaces with the do..as xml statement?
How do you make sure the tag prefixes in the code and in the data
refer to the same namespace? Why is this necessary?
Write a program to count one particular tag and all the tags
in an XML document.
For an opening tag, how do you get all its attributes?
For a text tag, can you get the attributes of the opening tag?
:PI stands for the "processing instruction" event.
What are the values for this event (in $_)?
What is the value for the :SKIPPED_ENTITY event?
XML data can be read in or written out using a DOM object, which contains
a tree of nodes representing the data. The key DOM interfaces in Java are
org.w3c.dom.Document and org.w3c.dom.Node.
To process XML data with DOM, use do..as dom. It looks similar to
do..as xml, with two differences: a) it has no body, and b) it is
an expression that returns an instance of org.w3c.dom.Document.
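For comparison, a rough plain-Java (JAXP) counterpart of reading XML into a DOM and writing it back out: DocumentBuilder.parse() produces the org.w3c.dom.Document, and a Transformer obtained from newTransformer() without a stylesheet performs an identity copy, playing the role of xslt copy. This is a minimal sketch; the DomCopy class and its read()/write() methods are hypothetical, and it parses a string rather than a file to stay self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parse XML into a DOM, then serialize the DOM unchanged.
class DomCopy {
    static Document read(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String write(Document doc) throws Exception {
        // No stylesheet given: newTransformer() returns an identity transformer.
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        copier.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        copier.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write(read("<note><to>Tove</to></note>")));
    }
}
```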
1: if #args.length == 0 {
2: println <err> 'Usage: java judo ', #prog, ' file.xml';
3: exit 0;
4: }
5:
6: doc = do #args[0] as dom; // with namespace, validate
7: xslt copy doc to getOut();
In the JudoScript program above, line 6 reads in the file or URL as a DOM, and line 7
uses xslt copy to write it out to standard output. The document itself
is a tree of org.w3c.dom.Node objects of various kinds. The
following GUI program displays the tree visually.
1: !JavaGuiClass #JTree, #JFrame, #DefaultMutableTreeNode, #DefaultTreeModel
2: !JavaGuiClass #JScrollPane, #Dimension, #JPanel, #TreeSelectionModel
3: !JavaGuiClass #BorderLayout, #Toolkit, #Color, #JOptionPane
4: !JavaBaseClass #Node, #Document, #DocumentBuilderFactory
5:
6:
7: if #args.length < 1 {
8: println <err> 'Usage: java judo ', #prog, ' filename.xml';
9: exit 0;
10: }
11:
12: const #FRAME_WIDTH = 440;
13: const #FRAME_HEIGHT = 280;
14:
15: showDetails = true;
16: filename = #args[0];
17:
18: frame = javanew #JFrame("XML to JTree");
19: frame.setBackground(#Color.lightGray);
20: frame.getContentPane().setLayout(javanew #BorderLayout);
21: {
22: local toolkit = #Toolkit.getDefaultToolkit();
23: local dim = toolkit.getScreenSize();
24: local screenHeight = dim.height;
25: local screenWidth = dim.width;
26:
27: // Display in the middle of the screen
28: frame.setBounds( (screenWidth-#FRAME_WIDTH)/2,
29: (screenHeight-#FRAME_HEIGHT)/2,
30: #FRAME_WIDTH, #FRAME_HEIGHT );
31: }
32: guiEvents {
33: <frame : Window : windowClosing> : exit(0);
34: }
35:
36: doc = null; // global scope
37: {
38: doc = do filename as dom;
39: catch:
40: #JOptionPane.showMessageDialog(frame, $_.message, "Exception",
41: #JOptionPane.WARNING_MESSAGE);
42: exit 0;
43: }
44:
45: top = createTreeNode(doc.getDocumentElement(), showDetails );
46: dtModel = javanew #DefaultTreeModel(top);
47: jTree = javanew #JTree(dtModel);
48:
49: jTree.getSelectionModel().setSelectionMode(
50: #TreeSelectionModel.SINGLE_TREE_SELECTION);
51: jTree.setShowsRootHandles(true);
52: jTree.setEditable(false);
53:
54: // Create a new JScrollPane to override one of the methods.
55: !JavaClass
56: import java.awt.Dimension;
57: import javax.swing.JTree;
58:
59: public class XmlTreePane extends javax.swing.JScrollPane
60: {
61: int width, height;
62: public XmlTreePane(JTree tree, int width, int height) {
63: super(tree);
64: this.width = width;
65: this.height = height;
66: }
67: public Dimension getPreferredSize() {
68: return new Dimension( width-20, height-40 );
69: }
70: };
71:
72: main = javanew #JPanel;
73: jScroll = javanew XmlTreePane(jTree,#FRAME_WIDTH,#FRAME_HEIGHT);
74:
75: panel = javanew #JPanel;
76: panel.setLayout(javanew #BorderLayout);
77: panel.add("Center", jScroll);
78: main.add("Center", panel);
79: frame.getContentPane().add(main, #BorderLayout.CENTER);
80: frame.validate();
81: frame.setVisible(true);
82:
83: function createTreeNode root, showDetails
84: {
85: type = getNodeType(root);
86: name = root.getNodeName();
87: value = root.getNodeValue();
88:
89: if showDetails {
90: dmtNode = javanew #DefaultMutableTreeNode(
91: "[" @ type @ "] --> " @ name@" = " @ value);
92: } else { // Special case for TEXT_NODE
93: dmtNode = javanew #DefaultMutableTreeNode(
94: root.getNodeType()==#Node.TEXT_NODE ? value : name );
95: }
96:
97: // Display the attributes if there are any
98: attribs = root.getAttributes();
99: if attribs != null && showDetails {
100: for i from 0 to attribs.getLength()-1 {
101: local attNode = attribs.item(i);
102: local attName = attNode.getNodeName().trim();
103: local attValue = attNode.getNodeValue().trim();
104:
105: if attName.isNotEmpty() && attValue.isNotEmpty() {
106: dmtNode.add(javanew #DefaultMutableTreeNode(
107: "[Attribute] --> " @ attName @ "=\"" @ attValue @ "\"") );
108: }
109: }
110: }
111:
112: // If there are any children and they are non-null then recurse...
113: if root.hasChildNodes() {
114: childNodes = root.getChildNodes();
115: if childNodes != null {
116: for k from 0 to childNodes.getLength()-1 {
117: local nd = childNodes.item(k);
118: if nd != null {
119: // A special case could be made for each Node type.
120: if nd.getNodeType() == #Node.ELEMENT_NODE {
121: dmtNode.add(createTreeNode(nd, showDetails));
122: }
123: elif nd.getNodeValue().isNotEmpty() { // the default
124: dmtNode.add(createTreeNode(nd, showDetails));
125: }
126: }
127: }
128: }
129: }
130:
131: return dmtNode;
132: }
133:
134: function getNodeType node
135: {
136: switch node.getNodeType() {
137: case #Node.ELEMENT_NODE: return "Element";
138: case #Node.ATTRIBUTE_NODE: return "Attribute";
139: case #Node.TEXT_NODE: return "Text";
140: case #Node.CDATA_SECTION_NODE: return "CData section";
141: case #Node.ENTITY_REFERENCE_NODE: return "Entity reference";
142: case #Node.ENTITY_NODE: return "Entity";
143: case #Node.PROCESSING_INSTRUCTION_NODE:return "Processing instruction";
144: case #Node.COMMENT_NODE: return "Comment";
145: case #Node.DOCUMENT_NODE: return "Document";
146: case #Node.DOCUMENT_TYPE_NODE: return "Document type";
147: case #Node.DOCUMENT_FRAGMENT_NODE: return "Document fragment";
148: case #Node.NOTATION_NODE: return "Notation";
149: default: return "Unknown";
150: }
151: }
As is true of any GUI program, most of the code above is dedicated
to building the GUI. The only interesting parts (as far as DOM is concerned)
are lines 36 through 47 and the function createTreeNode()
on lines 83 through 132. That function takes a DOM node (here, the document
element), recursively traverses the node tree below it and builds a matching tree
of DefaultMutableTreeNode objects, which line 46 wraps in a DefaultTreeModel for
display in a Swing JTree component. You can customize it to display more detailed
information about each specific node type.
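Stripped of Swing, createTreeNode() is an ordinary recursive walk over getChildNodes(). The plain-Java sketch below (the TreeDump class is hypothetical) prints one indented line per node with a type label in the spirit of getNodeType(), lists attributes under their element, and skips whitespace-only text nodes; to stay short it names only a few node types.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: renders a DOM subtree as indented text, one node per line.
class TreeDump {
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CData section";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "Processing instruction";
            default:                               return "Other";
        }
    }

    static void dump(Node node, String indent, StringBuilder out) {
        out.append(indent).append('[').append(typeName(node)).append("] ").append(node.getNodeName());
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            out.append(" = ").append(node.getNodeValue());   // elements have no node value
        }
        out.append('\n');

        NamedNodeMap attrs = node.getAttributes();   // null for anything but elements
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(indent).append("  [Attribute] ")
                   .append(a.getNodeName()).append(" = ").append(a.getNodeValue()).append('\n');
            }
        }

        NodeList children = node.getChildNodes();
        for (int k = 0; k < children.getLength(); k++) {
            Node child = children.item(k);
            boolean blankText = child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty();
            if (!blankText) {
                dump(child, indent + "  ", out);   // recurse one level deeper
            }
        }
    }

    static String render(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        StringBuilder out = new StringBuilder();
        dump(root, "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render("<a x=\"1\">\n  <b>hi</b>\n</a>"));
    }
}
```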
!JavaBaseClass #Element

doc = do $$local as dom with namespace;

ns_ex = "judoscript/xml/dom_namespace";
$local = "a";
ns_c24 = "http://www.c24solutions.com";

// Get a list of Nodes by TagName Namespace
println "Elements in the '", ns_ex, "' namespace...";
nodelist = doc.getElementsByTagNameNS(ns_ex,"*");
for i from 0 to nodelist.getLength()-1 {
  var n = nodelist.item(i);
  println n.getNodeName();
}

// Use the "local name"
println nl, "Elements with a local name of '", $local, "'...";
nodelist = doc.getElementsByTagNameNS("*",$local);
for i from 0 to nodelist.getLength()-1 {
  var n = nodelist.item(i);
  println n.getNodeName();
}

// Get all nodes and look for specified Attributes...
println nl, "Attributes in the ", ns_c24, " namespace...";
nodelist = doc.getElementsByTagName("*");
for i from 0 to nodelist.getLength()-1 {
  if nodelist.item(i).instanceof(#Element) {
    // Save the text part
    var t = nodelist.item(i).getFirstChild();

    // Search for particular attributes, no wildcards here!
    var e = nodelist.item(i);
    var a = e.getAttributeNodeNS(ns_c24,"class");

    if a != null { // a is the attribute
      var val = a.getNodeValue();
      println "<", val, ">", t.getNodeValue(), "</", val, ">";
    }
  }
}

EndScript -----------------------------------------------------
<?xml version='1.0' encoding='utf-8'?>

<DOMExample>

  <section xmlns="judoscript/xml/dom_namespace">
    <title price="$49.95">XML with JudoScript</title>
    <chapter title="DOM Programming">
      <author title="Mr." name="James Huang"/>
    </chapter>
  </section>

  <order xmlns:html="http://www.c24solutions.com">
    <name html:class="H1">Vince Muller</name>
    <payment type="credit" html:class="H3">Paid</payment>
    <html:a href="/jsp/prebookings?order-ref=0527658">Check order</html:a>
    <date location="London" html:class="H3">2002-02-22</date>
  </order>

</DOMExample>
This program builds a namespace-aware DOM from the XML data attached at
the end of the script and conducts three tests that select nodes based
on their namespaces or names.
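The three tests map directly onto standard DOM calls, so a Java version looks much the same. One detail matters: getElementsByTagNameNS() and getAttributeNodeNS() only work as expected on a DOM built by a namespace-aware parser, which in JAXP means DocumentBuilderFactory.setNamespaceAware(true), the counterpart of with namespace. A minimal sketch using a trimmed version of the data above; NsQuery and its method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: namespace-based node selection with the standard DOM API.
class NsQuery {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // without this the *NS lookups find nothing useful
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Qualified names of elements matching (namespace, local name); "*" is a wildcard for either.
    static List<String> elementsIn(Document doc, String ns, String localName) {
        List<String> names = new ArrayList<>();
        NodeList list = doc.getElementsByTagNameNS(ns, localName);
        for (int i = 0; i < list.getLength(); i++) {
            names.add(list.item(i).getNodeName());
        }
        return names;
    }

    // Values of attribute {ns}attrName on any element (no wildcards for attributes).
    static List<String> attributeValues(Document doc, String ns, String attrName) {
        List<String> values = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");   // every element in document order
        for (int i = 0; i < all.getLength(); i++) {
            Attr a = ((Element) all.item(i)).getAttributeNodeNS(ns, attrName);
            if (a != null) values.add(a.getValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<DOMExample>"
            + "<section xmlns=\"judoscript/xml/dom_namespace\"><title>T</title></section>"
            + "<order xmlns:html=\"http://www.c24solutions.com\">"
            + "<name html:class=\"H1\">Vince Muller</name><html:a href=\"x\">Check order</html:a>"
            + "</order></DOMExample>");
        System.out.println(elementsIn(doc, "judoscript/xml/dom_namespace", "*"));
        System.out.println(elementsIn(doc, "*", "a"));
        System.out.println(attributeValues(doc, "http://www.c24solutions.com", "class"));
    }
}
```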
To create an XML file, you can simply write to a text file. Alternatively, you can
build a DOM tree and write it out to the file. To do so, use the system function
createDom() to create a DOM, create and attach the nodes,
and finally write it out with the xslt copy statement.
doc = createDom();

// Start with a "<Person>"
person = doc.createElement("Person");

// Create the "<FirstName>" element
firstName = doc.createElement("FirstName");

// Create a Text node "Al" and add it to the "FirstName" tag
firstName.appendChild( doc.createTextNode("Al") );

// Add the "<FirstName>" tag to "<Person>"
person.appendChild(firstName);

// Same as above
surname = doc.createElement("Surname");
surname.appendChild( doc.createTextNode("Gore") );
person.appendChild(surname);

president = doc.createElement("President");

// Set the "Country" attribute in "<President>"
president.setAttribute("Country","Us");
president.appendChild( person );

// Add everything to the document (doc)
doc.appendChild( president );

// Write the DOM to stdout.
xslt copy doc to getOut();
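The same construction in plain Java: DocumentBuilder.newDocument() stands in for createDom(), and the remaining calls are the same DOM methods the script uses. A minimal sketch; PresidentDom is a hypothetical name, and the result is serialized with an identity Transformer in place of xslt copy.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds <President Country="Us"><Person>...</Person></President> in memory.
class PresidentDom {
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element person = doc.createElement("Person");
        Element firstName = doc.createElement("FirstName");
        firstName.appendChild(doc.createTextNode("Al"));
        person.appendChild(firstName);

        Element surname = doc.createElement("Surname");
        surname.appendChild(doc.createTextNode("Gore"));
        person.appendChild(surname);

        Element president = doc.createElement("President");
        president.setAttribute("Country", "Us");
        president.appendChild(person);
        doc.appendChild(president);   // a Document may have only one element child
        return doc;
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(build()));
    }
}
```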
To traverse a DOM in Java, use the facilities defined in the package
org.w3c.dom.traversal. A document can be traversed with these facilities
only if the document object implements
org.w3c.dom.traversal.DocumentTraversal. As of this writing, the "xerces" parser
seems to have better support for this feature. This interface has two methods,
createNodeIterator() and createTreeWalker(), which
create a node iterator or a tree walker for the DOM. Both take an
org.w3c.dom.traversal.NodeFilter that decides which nodes
the traversal returns. In JudoScript, these methods have been extended to take a JudoScript
function variable for node filtering, as demonstrated in the following
program on lines 18 and 31. If you have a valid NodeFilter
implementation, you can still pass it to these methods.
1: !JavaBaseClass #NodeFilter
2:
3: doc = do $$local as dom;
4:
5: if ! doc.isSupported("Traversal","2.0") {
6: println <err> 'Traversal is not implemented by this XML parser.';
7: exit 0;
8: }
9:
10: /*
11: * Use NodeIterator.
12: */
13: println 'Members of the royal family with children...', nl;
14: filter = lambda n {
15: return n.hasChildNodes() && n.getNodeName()=="Person"
16: ? #NodeFilter.FILTER_ACCEPT : #NodeFilter.FILTER_SKIP;
17: };
18: iter = doc.createNodeIterator( doc, #NodeFilter.SHOW_ALL, filter, true );
19: while (n=iter.nextNode()) != null {
20: println n.getAttribute("name"), " (", n.getAttribute("born"), ")";
21: }
22:
23: /*
24: * Use TreeWalker
25: */
26: println nl, 'Looking for Princess Anne...';
27: filter = lambda n {
28: return n.getNodeName()=="Person"
29: ? #NodeFilter.FILTER_ACCEPT : #NodeFilter.FILTER_SKIP;
30: };
31: walker = doc.createTreeWalker( doc,#NodeFilter.SHOW_ALL,filter,true );
32: while (n=walker.nextNode()) != null {
33: name = n.getAttribute("name");
34: if name.indexOf("Anne") >= 0 { break; }
35: println 'Skipping ', name;
36: }
37:
38: // Store the Node so we can come back
39: anne = walker.getCurrentNode();
40: println 'Found "', anne.getAttribute("name"), '".', nl;
41:
42: walker.setCurrentNode(anne);
43: println 'PreviousSibling = "', walker.previousSibling().getAttribute('name'), '"';
44: walker.setCurrentNode(anne);
45: println ' NextSibling = "', walker.nextSibling().getAttribute('name'), '"';
46: walker.setCurrentNode(anne);
47: println ' firstChild = "', walker.firstChild().getAttribute('name'), '"';
48: walker.setCurrentNode(anne);
49: println ' LastChild = "', walker.lastChild().getAttribute('name'), '"';
50: walker.setCurrentNode(anne);
51: println ' PreviousNode = "', walker.previousNode().getAttribute('name'), '"';
52: walker.setCurrentNode(anne);
53: println ' NextNode = "', walker.nextNode().getAttribute('name'), '"';
54:
55: EndScript ----------------------------------------------------------
56: <FamilyTree name="The Royal Family">
57: <Person born="1926" name="Queen Elizabeth II" spouse="Phillip">
58: <Person born="1948" name="Charles, Prince of Wales" spouse="Diana">
59: <Person born="1982" name="Prince William"/>
60: <Person born="1984" name="Prince Henry of Wales"/>
61: </Person>
62: <Person born="1950" name="Anne, Princess Royal"
63: spouse="Mark" spouse2="Tim">
64: <Person born="1977" name="Peter Phillips"/>
65: <Person born="1981" name="Zara Phillips"/>
66: </Person>
67: <Person born="1960" name="Andrew, Duke of York" spouse="Sarah">
68: <Person born="1988" name="Princess Beatrice of York"/>
69: <Person born="1990" name="Princess Eugenie of York"/>
70: </Person>
71: <Person born="1964" name="Edward, Earl of Wessex"
72: spouse="Sophie"/>
73: </Person>
74: </FamilyTree>
This is the result of the run:
Members of the royal family with children...
Queen Elizabeth II (1926)
Charles, Prince of Wales (1948)
Anne, Princess Royal (1950)
Andrew, Duke of York (1960)
Looking for Princess Anne...
Skipping Queen Elizabeth II
Skipping Charles, Prince of Wales
Skipping Prince William
Skipping Prince Henry of Wales
Found "Anne, Princess Royal".
PreviousSibling = "Charles, Prince of Wales"
NextSibling = "Andrew, Duke of York"
firstChild = "Peter Phillips"
LastChild = "Zara Phillips"
PreviousNode = "Prince Henry of Wales"
NextNode = "Peter Phillips"
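In Java, the filter must be an org.w3c.dom.traversal.NodeFilter. Since that interface declares a single method, acceptNode(), a Java 8+ lambda can play the part of the JudoScript function variable. This is a minimal sketch, assuming the parser's Document implements DocumentTraversal (the parser bundled with current JDKs does); RoyalFilter, its methods and the trimmed family tree are hypothetical. It uses SHOW_ELEMENT rather than SHOW_ALL so the filter sees only elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper: NodeIterator and TreeWalker with lambda node filters.
class RoyalFilter {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Names of <Person> elements that have children, found with a NodeIterator.
    static List<String> parents(Document doc) {
        NodeFilter filter = n -> n.hasChildNodes() && n.getNodeName().equals("Person")
                ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;
        NodeIterator iter = ((DocumentTraversal) doc)
                .createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, filter, true);
        List<String> names = new ArrayList<>();
        for (Node n = iter.nextNode(); n != null; n = iter.nextNode()) {
            names.add(((Element) n).getAttribute("name"));
        }
        return names;
    }

    // With a TreeWalker: the next accepted sibling of the first Person whose name contains part.
    static String nextSiblingOf(Document doc, String part) {
        NodeFilter persons = n -> n.getNodeName().equals("Person")
                ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;
        TreeWalker walker = ((DocumentTraversal) doc)
                .createTreeWalker(doc, NodeFilter.SHOW_ELEMENT, persons, true);
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            if (((Element) n).getAttribute("name").contains(part)) {
                Node sibling = walker.nextSibling();   // null if there is none
                return sibling == null ? null : ((Element) sibling).getAttribute("name");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<FamilyTree>"
            + "<Person name=\"Queen Elizabeth II\">"
            + "<Person name=\"Anne, Princess Royal\"><Person name=\"Peter Phillips\"/></Person>"
            + "<Person name=\"Andrew, Duke of York\"/>"
            + "</Person></FamilyTree>");
        System.out.println(parents(doc));
        System.out.println(nextSiblingOf(doc, "Anne"));
    }
}
```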
Applying XSLT is easy in JudoScript with the xslt statement. In addition
to doing XSL transformation, it also writes out XML documents and selects
parts of documents with XPath expressions. In its basic form, the statement
names the XSL source, then the XML source after the keyword on, and then
the destination after to.
The input XML and the XSL can come from any valid XML source, such as
files, URLs, input streams, or DOM documents. If the input is not a file
or URL, you may want to specify its system ID so that referenced documents,
such as DTDs, can be resolved by the parser.
The XSL or XML data can also be text within the script. Since a string is
interpreted as a file name or URL, you must call the string's getReader()
method to turn the string into a Reader.
1: doc = do 'calls.xml' as dom;
2: xslt 'calls.xsl' on doc to 'out_calls1.html';
3:
4: doc = do 'calls.xsl' as dom;
5: xslt doc on 'calls.xml' to 'out_calls2.html';
6:
7: xslt doc on 'calls.xml' as dom;
8: xslt copy $_ to 'out_calls3.html';
On line 2, we do a simple XSLT but take a DOM document as the XML data source.
On line 5, a DOM document is used for the XSL. Line 7 does the same as line 5
except that it produces the result as a DOM, available as $_; line 8 shows how to
copy a DOM, writing that result to a file.
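The JAXP calls behind these variations are TransformerFactory.newTransformer(Source) for the stylesheet and Transformer.transform(Source, Result) for the data. A DOMSource lets either the XSL or the XML be a DOM, and a DOMResult collects the output as a DOM, as line 7 does. In this minimal sketch the tiny stylesheet, the sample data and the XsltDom class are made up; they are not the calls.xsl and calls.xml files used above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: XSLT with DOM input, stream or DOM stylesheet, and DOM output.
class XsltDom {
    // Made-up stylesheet: emits <calls count="N"/> where N is the number of CALL elements.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\"><calls count=\"{count(//CALL)}\"/></xsl:template>"
      + "</xsl:stylesheet>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);   // a stylesheet DOM must keep its xsl: namespace
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Applies the stylesheet (from any Source) to a DOM and returns the result as a DOM.
    static Node transform(Source stylesheet, Document data) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(stylesheet);
        DOMResult result = new DOMResult();   // no node given: a new Document is created
        t.transform(new DOMSource(data), result);
        return result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document data = parse("<PHONE_RECORDS><CALL/><CALL/></PHONE_RECORDS>");
        Document fromText = (Document) transform(new StreamSource(new StringReader(XSL)), data);
        Document fromDom = (Document) transform(new DOMSource(parse(XSL)), data);
        System.out.println(fromText.getDocumentElement().getAttribute("count"));
        System.out.println(fromDom.getDocumentElement().getAttribute("count"));
    }
}
```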
The following example shows how to run a query using XPath expressions in JudoScript.
xslt xpath('/PHONE_RECORDS/CALL[1]/DESTINATION') on 'calls.xml' to 'result.xml'
     outputProperties( 'omit-xml-declaration' = 'yes' );
The xpath tells xslt to interpret the text as an XPath
expression and query the XML data to produce a list of nodes. In this example
we also used the outputProperties clause. xslt takes the output
property names as defined in javax.xml.transform.OutputKeys, which
are:
cdata-section-elements
doctype-public
doctype-system
encoding
indent
media-type
method
omit-xml-declaration
standalone
version
where boolean options take "yes" or "no" as values.
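Both features have plain-JAXP counterparts: javax.xml.xpath evaluates an XPath expression to a NodeList, and Transformer.setOutputProperty() takes the same property names, available as constants in javax.xml.transform.OutputKeys. The sketch below selects /PHONE_RECORDS/CALL[1]/DESTINATION and serializes the matches without an XML declaration; the XPathQuery class and the sample data are hypothetical, and this is not necessarily how xslt implements xpath internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: runs an XPath query and serializes every matching node.
class XPathQuery {
    static String query(String xml, String expr) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);

        Transformer out = TransformerFactory.newInstance().newTransformer();
        // Same property name as 'omit-xml-declaration' in the outputProperties clause.
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter text = new StringWriter();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.transform(new DOMSource(nodes.item(i)), new StreamResult(text));
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PHONE_RECORDS>"
                   + "<CALL><DESTINATION>London</DESTINATION></CALL>"
                   + "<CALL><DESTINATION>Paris</DESTINATION></CALL>"
                   + "</PHONE_RECORDS>";
        System.out.println(query(xml, "/PHONE_RECORDS/CALL[1]/DESTINATION"));
    }
}
```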
To pass parameter values to the XSL, use the parameters clause.
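In plain JAXP, a parameter value is supplied with Transformer.setParameter(), which sets a top-level xsl:param of the stylesheet. A minimal sketch; the stylesheet, the parameter name greeting and the XslParams class are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: passes a parameter value into an XSL stylesheet.
class XslParams {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:param name=\"greeting\" select=\"'Hello'\"/>"   // default when not set
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"concat($greeting, ', ', /person/@name)\"/>"
      + "</xsl:template></xsl:stylesheet>";

    static String run(String xml, String greeting) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        if (greeting != null) {
            t.setParameter("greeting", greeting);   // overrides the xsl:param default
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<person name=\"Ann\"/>", "Hi"));
        System.out.println(run("<person name=\"Ann\"/>", null));
    }
}
```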
Currently the two most popular Java XML parsers are "crimson" and
"xerces". Sun's JAXP1.1 reference implementation includes and uses the "crimson"
parser in the file "crimson.jar". We found "xerces" more comfortable
to use because of its better support for the latest features. "Xerces" is recommended
at this point.
To use "xerces" with JDK1.3, download the package from
http://www.apache.org, unpack it, and put
"xerces.jar" into the classpath; make sure it comes before any class
libraries from Sun, such as "j2ee.jar".
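With JAXP, code asks a factory for a parser instead of naming one, so switching between parsers needs no code changes. The implementation is selected at run time, for instance through the javax.xml.parsers.SAXParserFactory system property or a provider jar on the class path, falling back to the platform default. This small sketch (ParserInfo is a hypothetical name) reports which factory classes are actually in effect.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: shows which JAXP implementations were picked up at run time.
class ParserInfo {
    static String describe() {
        return "SAX:  " + SAXParserFactory.newInstance().getClass().getName() + "\n"
             + "DOM:  " + DocumentBuilderFactory.newInstance().getClass().getName() + "\n"
             + "XSLT: " + TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // A specific SAX implementation can be forced on the command line, e.g.
        //   java -Djavax.xml.parsers.SAXParserFactory=<factory class name> ParserInfo
        System.out.println(describe());
    }
}
```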
JudoScript supports processing XML data in SAX or DOM style, with options for
namespace, validation and other features. SAX programming is done through
the do..as xml event-driven statement; it not only supports all
the SAX 2.0 events, but also has a "text" tag feature that lets the text
enclosed between a pair of tags be processed together with the attributes
of the opening tag. If a tag has mixed content, its text can be obtained
by either copying all the embedded tags or ignoring them. Each event is
specified as a label, followed by handler code. In the code, the tag
itself is represented by the built-in variable $_, whose
attributes can be accessed as struct members. For specific events
(such as PI, ENTITY_DECL, etc.), the members are
predefined. Anonymous tags and unnamed text can be handled by these
special labels: <> and :TEXT.
XML data can be read into a DOM by the do..as dom expression,
which returns an org.w3c.dom.Document object. You can
process the DOM with all the Java DOM APIs, such as searching and
traversal. When using the Java DOM traversal API, in addition to
org.w3c.dom.traversal.NodeFilter objects, JudoScript function variables
can be passed as node filters to the createNodeIterator()
and createTreeWalker() methods of the DocumentTraversal
interface.
To create XML files via DOMs, use system function createDom()
to create an empty DOM, and create a node tree by calling its methods to
create and attach various kinds of nodes to each other.
XSL transformation is done with the xslt statement. It takes the
XSL and an XML data source, each of which can be a file, URL, input stream,
reader or DOM, and produces output to files, output streams,
writers or other DOMs. It can copy the XML document to the output, which is useful
for writing DOM objects, and can do queries using XPath expressions. You
can optionally set output properties and XSL parameter values.
Currently the two most popular Java XML parsers are "crimson" and "xerces".
Sun's JAXP1.1 reference implementation includes and uses the "crimson" parser
in the file "crimson.jar". We found "xerces" more comfortable to use
because of its better support for the latest features. "Xerces" is recommended at
this point. For XSLT, Apache's Xalan has been used and tested.