This article shows how XML documents can be manipulated in Java applications in a strongly typed fashion. There isn’t a clear definition of a strongly typed language, but for the scope of this article, when I say strongly typed XML I mean it in the same way that Java is a strongly typed language. In other words, there are type checks at both compile time and runtime.
XMLBeans Status
XMLBeans was developed inside BEA’s WebLogic Workshop team to solve the XML-to-Java data-binding problem. It shipped with BEA WebLogic Workshop 8.1 and was later accepted as an incubated project in the Apache community. Since then, a version 1 release has been produced and a second tree has been added to the project for the next version. Currently the v1 tree is very stable (only bug fixes get checked in), while heavy work is done in the v2 tree, which tries to stay compatible with the v1 public interfaces as much as possible. In this article the examples for the beans and the schema type system are based on v1, while the streaming part is based on the new v2 code.
The Problem
Let’s start with a simple XML purchase order like the one below. Because we are talking about strongly typed XML, we also need to define the constraints on this kind of document; they are described in the schema that follows the instance.
Instance document:
<purchase-order xmlns="http://openuri.org/purchase-order">
  <date>2003-12-19T12:21:00</date>
  <item description="recliner">
    <price>759</price>
    <quantity>5</quantity>
  </item>
  <item description="pen">
    <price>3.99</price>
    <quantity>100</quantity>
  </item>
  <!-- These are gifts -->
  <item description="iPod">
    <price>399</price>
    <quantity>3</quantity>
  </item>
</purchase-order>
Schema:
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:po="http://openuri.org/purchase-order"
    targetNamespace="http://openuri.org/purchase-order"
    elementFormDefault="qualified">
  <xs:element name="purchase-order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="date" type="xs:dateTime"/>
        <xs:element name="item" type="po:item" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="item">
    <xs:sequence>
      <xs:element name="price" type="xs:float"/>
      <xs:element name="quantity" type="xs:int"/>
    </xs:sequence>
    <xs:attribute name="description" type="xs:string"/>
  </xs:complexType>
</xs:schema>
In order to find out the total price of this purchase order, we have to do several things:
- Parse the XML
- Take the contents of the price and quantity elements for each item
- Transform the contents into values of the right types (i.e., int, float)
- Multiply price by quantity and add it to the total
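The steps above can be sketched with nothing but the DOM parser that ships with the JDK. This is a minimal illustration of the manual, untyped approach, not code from the article's download; the class name DomPurchaseOrderTotal and the method total are hypothetical, and the sketch assumes every item has exactly one price and one quantity child.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: computes the order total by hand, without XMLBeans.
class DomPurchaseOrderTotal {
    static final String NS = "http://openuri.org/purchase-order";

    static double total(String xml) {
        try {
            // Step 1: parse the XML (namespace awareness must be switched on).
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            double total = 0;
            NodeList items = doc.getElementsByTagNameNS(NS, "item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Step 2: take the contents of price and quantity; they arrive as Strings.
                String priceText = item.getElementsByTagNameNS(NS, "price").item(0).getTextContent();
                String quantityText = item.getElementsByTagNameNS(NS, "quantity").item(0).getTextContent();
                // Step 3: convert the Strings to Java values; nothing checks them at compile time.
                float price = Float.parseFloat(priceText.trim());
                int quantity = Integer.parseInt(quantityText.trim());
                // Step 4: multiply and accumulate.
                total += price * quantity;
            }
            return total;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read purchase order", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<purchase-order xmlns=\"http://openuri.org/purchase-order\">"
            + "<item description=\"recliner\"><price>759</price><quantity>5</quantity></item>"
            + "<item description=\"pen\"><price>3.99</price><quantity>100</quantity></item>"
            + "</purchase-order>";
        System.out.println("Total price: " + total(xml));
    }
}
```

Every element name here is a string literal and every value passes through a String, so a misspelled name or a malformed number only shows up at runtime.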
Usually the content of the quantity element arrives in a Java program as a java.lang.String from DOM, SAX, or StAX. We know that it should be an int, so it needs to be parsed into a Java int. Even though the semantic space of xsd:int and Java int is the same, the lexical space is a little bit different. For example, +5 is a valid xsd:int, but on Java versions before 7 Integer.parseInt("+5") throws a NumberFormatException. Likewise, an xsd:int value may be surrounded by white space, while Integer.parseInt(" 5 ") throws a NumberFormatException on every Java version.
We have to make sure that the XML values get translated into the right Java values. Java int isn’t the only type that has this problem; so do others, such as short, byte, float, double, and Integer. For date/time types it’s even worse, because the semantic space is different as well.
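To make the lexical-space mismatch concrete, here is a small sketch of what a hand-written xsd:int parser has to do before it can call Integer.parseInt. The class XsdIntParser and its methods are hypothetical names used only for illustration; the sketch assumes the xsd:int rules of an optional sign followed by ASCII digits, with surrounding XML white space collapsed.

```java
import java.util.regex.Pattern;

// Hypothetical helper: accepts the xsd:int lexical space, returns a Java int.
class XsdIntParser {
    // After white space collapsing: an optional sign followed by ASCII digits.
    private static final Pattern XSD_INT = Pattern.compile("[+-]?[0-9]+");

    static int parseXsdInt(String lexical) {
        String collapsed = trimXmlWhitespace(lexical);
        if (!XSD_INT.matcher(collapsed).matches()) {
            throw new NumberFormatException("Not an xsd:int lexical value: \"" + lexical + "\"");
        }
        // Drop a leading '+' so the call also works on Java versions before 7.
        String digits = collapsed.charAt(0) == '+' ? collapsed.substring(1) : collapsed;
        // Integer.parseInt still rejects values outside the 32-bit range.
        return Integer.parseInt(digits);
    }

    // Strips leading and trailing XML white space (space, tab, CR, LF).
    static String trimXmlWhitespace(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isXmlSpace(s.charAt(start))) start++;
        while (end > start && isXmlSpace(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    public static void main(String[] args) {
        System.out.println(parseXsdInt("+5"));
        System.out.println(parseXsdInt("  100\n"));
    }
}
```

Each of the other simple types needs a similar wrapper of its own, which is the kind of repetitive code a data-binding tool is meant to take off the programmer's hands.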
XMLBeans can help users with this problem.
The Beans in XMLBeans
In order to get the price and quantity content, just compile the schema with the XMLBeans compiler:
scomp schemafile.xsd [options.xsdconfig]
The compiler takes the schema files and constructs a schema type system that contains the resolved information about all elements and types. For each type in the schema type system, a Java bean interface is then generated.

Note that a PurchaseOrderDocument interface has been generated to represent the type of a document that contains a purchase-order element. Nested inside it is a PurchaseOrder interface, which represents the type of the purchase-order element. These two types are not explicitly named in the XML schema, but Java requires every type to have a name. The Java types and the schema types line up, making everything symmetric, for both the types and the instances. In addition, schema files should be given the same rank as Java sources.
Accessing the Document Data
After compiling the schema files we can write this code:
import java.io.File;
// Plus the generated PurchaseOrderDocument and Item beans.

// Factory.parse can throw XmlException and IOException.
PurchaseOrderDocument pod =
    PurchaseOrderDocument.Factory.parse(new File("po.xml"));
PurchaseOrderDocument.PurchaseOrder po = pod.getPurchaseOrder();
float total = 0;
Item[] items = po.getItemArray();
for (int i = 0; i < items.length; i++)
{
    // The getters return Java primitives, not Strings.
    float price = items[i].getPrice();
    int quantity = items[i].getQuantity();
    total += price * quantity;
}
System.out.println("Total price: " + total);
You can see that everything in this code is strongly typed, with all the advantages of the Java language applied to your XML schema type system. A whole class of errors will be caught at compile time by the Java compiler.
As you can see in the instance document, there is a piece of information that we don’t see in our Java program: the comment. Most users are interested only in elements, attributes, and their content, but a few are interested in the entire XML infoset. If they need comments, processing instructions, white space, or the exact order of siblings, they have to use an API that is one level lower, the XmlCursor API. Using XmlCursor gives up the advantages of the strongly typed system, but it gives access to everything in the XML document, and you can always find out the type of the XML you are positioned on and switch back to the strongly typed world.
Creating New Documents
With the generated beans, brand new documents can be created from scratch. A program that produces the previous instance would look like this:
import java.util.GregorianCalendar;
import org.apache.xmlbeans.XmlOptions;
// Plus the generated PurchaseOrderDocument and Item beans.

PurchaseOrderDocument pod = PurchaseOrderDocument.Factory.newInstance();
PurchaseOrderDocument.PurchaseOrder po = pod.addNewPurchaseOrder();
// Uses the current date and time rather than the date in the sample instance.
po.setDate(new GregorianCalendar());

Item item1 = po.addNewItem();
item1.setDescription("recliner");
item1.setPrice(759);
item1.setQuantity(5);

Item item2 = po.addNewItem();
item2.setDescription("pen");
item2.setQuantity(100);
item2.setPrice(3.99f);

Item item3 = po.addNewItem();
item3.setDescription("iPod");
item3.setQuantity(3);
item3.setPrice(399);

// save can throw IOException.
pod.save(System.out, new XmlOptions().setSavePrettyPrint());
Schema Type System
If schema type information is needed, use the SchemaTypeSystem API; XMLBeans supports 100% of XML schema. This API is the equivalent of the Java reflection API for XML schema types. A SchemaTypeSystem is a finite set of component definitions.
There are two ways of getting a schema type system:
- From a compiled bean:
SchemaTypeSystem typeSystem = PurchaseOrderDocument.type.getTypeSystem();
- From an array of XmlObject instances that represent schema documents:
// options is an XmlOptions instance
SchemaTypeSystem typeSystem = XmlBeans.compileXsd(
    new XmlObject[] { XmlObject.Factory.parse(new File("po.xsd")) },
    XmlBeans.getBuiltinTypeSystem(),
    options);
Once we have a schema type system, all the information about the schema is available:
- Global type definitions
- Global element definitions
- Global attribute definitions
- Named model group definitions
- Attribute group definitions
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

List<SchemaType> allSeenTypes = new ArrayList<>();
allSeenTypes.addAll(Arrays.asList(typeSystem.documentTypes()));
allSeenTypes.addAll(Arrays.asList(typeSystem.attributeTypes()));
allSeenTypes.addAll(Arrays.asList(typeSystem.globalTypes()));
// The list grows while it is being walked, so nested anonymous types are visited too.
for (int i = 0; i < allSeenTypes.size(); i++)
{
    SchemaType sType = allSeenTypes.get(i);
    System.out.println("Visiting " + sType.toString());
    allSeenTypes.addAll(Arrays.asList(sType.getAnonymousTypes()));
}
Streaming
XMLStreamReader is the interface for reading XML documents defined by JSR 173. It plays the same role as the SAX interfaces; the main difference is that the user pulls information out of an XMLStreamReader, whereas in SAX the information is pushed to the user program. In XMLBeans v2 the XMLStreamReader interface was extended so that the simple content values of elements and attributes can be obtained directly as typed Java values. The extended interface contains methods like int getIntValue() and float getFloatValue(). The contract of these methods is the following:
- The stream should be positioned on a start element, text, space, or CDATA
- Simple content is expected, i.e., only text, CDATA, space, entity references, and comments; if a start element is encountered, an exception will be thrown
- If multiple text, CDATA, space, or entity reference events are encountered, their values will be concatenated, and comments will be ignored
- The right white space collapsing style will be applied and the result parsed into the right value
- No validation or matching against the schema is done; the simple value schema type is implied only by the called method and sometimes, as with the date types, by the lexical value
Attribute values have corresponding methods:
int getAttributeIntValue(int attributeIndex)
int getAttributeIntValue(String uri, String local)
There are three Java types to represent date values: GDate, XmlCalendar, and Date. For string values there are two kinds of methods: one returns the value as it was in the XML document, and the other returns the value after the white space collapsing style has been applied.
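For comparison, the sketch below computes the order total with only the standard JSR 173 XMLStreamReader that ships with the JDK, not the XMLBeans v2 extension. It illustrates the pull model and the String-to-value conversions that typed accessors such as getIntValue() are meant to replace. The class name StaxPurchaseOrderTotal is hypothetical. The standard getElementText() call already follows part of the contract listed above: it concatenates text and CDATA, skips comments, and throws if it meets a start element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls events from a plain StAX reader and converts by hand.
class StaxPurchaseOrderTotal {
    static final String NS = "http://openuri.org/purchase-order";

    static double total(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            double total = 0;
            float price = 0;
            int quantity = 0;
            // The program asks for the next event; nothing is pushed to it as in SAX.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && NS.equals(reader.getNamespaceURI())) {
                    String name = reader.getLocalName();
                    if ("item".equals(name)) {
                        price = 0;
                        quantity = 0;
                    } else if ("price".equals(name)) {
                        // getElementText returns a String; the conversion is manual.
                        price = Float.parseFloat(reader.getElementText().trim());
                    } else if ("quantity".equals(name)) {
                        quantity = Integer.parseInt(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && NS.equals(reader.getNamespaceURI())
                        && "item".equals(reader.getLocalName())) {
                    total += price * quantity;
                }
            }
            reader.close();
            return total;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read purchase order", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<purchase-order xmlns=\"http://openuri.org/purchase-order\">"
            + "<item description=\"recliner\"><price>759</price><quantity>5</quantity></item>"
            + "<item description=\"pen\"><price>3.99</price><quantity>100</quantity></item>"
            + "</purchase-order>";
        System.out.println("Total price: " + total(xml));
    }
}
```

With typed accessors, the two parse calls and the trimming would collapse into a single typed read per element.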
Conclusion
XMLBeans should be used in 90% of the cases where XML is accessed, created, or manipulated in Java because of its:
- Full-fidelity representation of, and access to, the full XML infoset
- 100% XML schema support
- Full-fledged XML schema type system API



