Focusing on open APIs for enterprise applications

Open Web Magazine

Enterprise Application Integration with a Native XML Database, Java, and Cocoon - Powerful flexibility with a simple API

A client recently asked EDS to design and support an EAI implementation based on XML messaging. The implementation of this solution created a need for an internal application that would allow multiple developers and analysts to create and manage a variety of XML documents. The solution needed to be inexpensive, flexible, extensible, and yet easy to manage.

Since the client's environment and business needs are constantly changing, the company didn't want to spend excessive time on support and maintenance of internal applications. The solution also needed to leverage the effort EDS put into creating and maintaining XML documentation around these data services.

To create the solution, EDS used two open-source technologies - Sleepycat's Berkeley DB XML (bdbxml) and Apache Cocoon. Cocoon provides the XML-based application framework, while bdbxml is an application-specific data manager providing XML persistence and retrieval. Together they form a flexible, powerful framework for quickly building and deploying customized XML-based applications.

The Software
Apache Cocoon, an XML publishing framework, employs a dynamic pipeline-processing concept that generates, manipulates, and ultimately renders the XML content into the presentation format of the user's choosing. Generic templates exist for rendering XML to standard formats such as HTML and PDF. The framework is easily configurable and extensible for custom XML-based applications. A significant portion of this configuration and customization is accomplished through XML documents that detail the pipeline steps necessary for a custom application.

The Cocoon pipeline consists of three types of components - generators, transformers, and serializers. A generator, the starting point for any given pipeline, produces the XML that is processed by the pipeline. Transformers can manipulate the XML as needed or perform operations based upon the XML content. Multiple transformers can be employed as needed. Finally, a serializer renders the final XML content (the result of the last transform) into a presentation format. The XML is passed through the pipeline as SAX events.
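To make the generator/transformer/serializer chain concrete, here is a minimal sketch built only from standard JDK SAX and JAXP classes (assuming JDK 8 or later). It is not Cocoon's component API: `RenameFilter` and `MiniPipeline` are hypothetical classes that play the roles of a transformer and a pipeline, and an identity JAXP `Transformer` stands in for the serializer. As in Cocoon, the stages communicate only through SAX events.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Transformer stage (hypothetical example): renames <item> elements to <entry>
// and forwards every other SAX event unchanged.
class RenameFilter extends XMLFilterImpl {
    RenameFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("item".equals(localName)) {
            super.startElement(uri, "entry", "entry", atts);
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("item".equals(localName)) {
            super.endElement(uri, "entry", "entry");
        } else {
            super.endElement(uri, localName, qName);
        }
    }
}

class MiniPipeline {
    static String run(String xml) {
        try {
            // Generator: a namespace-aware SAX parser turns the input text into SAX events.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader generator = spf.newSAXParser().getXMLReader();

            // Transformer: the filter wraps the generator; more filters could be chained here.
            XMLReader transformer = new RenameFilter(generator);

            // Serializer: an identity JAXP Transformer renders the final events as text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new SAXSource(transformer, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }
}
```

Calling `MiniPipeline.run("<list><item>a</item></list>")` should return the same markup with `entry` in place of `item`. In Cocoon, the equivalent chain is declared in the sitemap rather than wired up in Java.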

From a content application perspective, EDS used Cocoon to deliver dynamic content to analysts who were researching or testing our XML services. Conceptually, this process is common, and several products would have sufficed.

The Cocoon project provided the additional benefits of being:

  • Flexible and extensible
  • Open source
  • Free
With an XML-driven application server like Cocoon, we also needed an XML storage medium. Cocoon has built-in support for Apache's Xindice XML database. However, Xindice is a pure-Java XML database with a Java API, and we also wanted support for non-Java APIs. Additionally, we wanted an application-specific data manager, which allows flexible integration at various architectural levels.

These criteria led us to Sleepycat's bdbxml.

Introducing Berkeley DB XML
Sleepycat's Berkeley DB XML product (bdbxml) is built on the company's Berkeley DB product. It is designed to store and retrieve XML documents and provides XPath querying capabilities against the set of stored documents. The multiplatform product provides APIs in a variety of languages, and it can manage both Berkeley DB and bdbxml data under the same "container." For the EDS solution, our team focused on the bdbxml Java API and XML capabilities. The product uses the open source Pathan libraries for XPath functionality and the Xerces-C++ libraries for XML parsing. These libraries provide native-platform speed and are well integrated into the product.

The bdbxml product is also open source, under a dual license. Customers can use it at no charge if the embedding application is open source or is used only at a single physical site; customers that redistribute bdbxml within a non-open source product must purchase a commercial license. The API is simple, consisting of seven base classes, and a great deal can be accomplished without understanding much more than this simple XML interface. When more sophisticated techniques are needed, the full API provides objects that manage indexing, transactions, concurrency, and distributed transactions.

For EDS, the product met the necessary requirements by providing:

  • Flexibility by storing the native XML documents and providing XPath query capabilities against them
  • Round-tripping capability to a simple repository
  • Simple document-oriented metadata that can be stored without affecting the integrity of the XML document
  • Multiplatform and multilanguage APIs
  • A lightweight implementation that is highly efficient for XML document storage and retrieval
  • An application-specific nature providing flexibility in future architectures and applications
The Solution's Data Model
The solution uses bdbxml as the XML source for the application framework. EDS's custom interface to bdbxml is an XML document that sends instructions to a component, which encapsulates all interaction with the database. This architecture is extremely flexible within the Cocoon framework and allows substantial reuse of the component in various pipelines. The downside is the overhead of communicating with the component through an XML document; however, bdbxml's superior performance more than makes up for this shortcoming.

The application's traffic will not be high volume, and the solution provides a separate "database" for each user. This approach would not be considered with a traditional database, but the application-specific nature of bdbxml makes it feasible, and it simplifies the solution enormously. With one database per user, indexing and performance concerns are eliminated, because (based on the requirements) a single user's container will never grow that large.

Transactionality and ACID concerns are removed because users will be limited to updates against their own database. A Cocoon application can store and retrieve these documents without having to worry about the complexities around synchronizing access to a common resource.

Getting Started
To create a similar solution, you need JDK 1.4 (I used Java 1.4.0_02) as well as Cocoon.

The first steps are downloading and installing Cocoon and bdbxml. You may also want Ant installed if you plan to use the Ant scripts provided.

To download and install Cocoon, visit the Apache Cocoon Web site and follow the instructions. Choose version 2.1 or later. The download and installation process is straightforward. You may select either the binary or the source distribution of Cocoon; if you choose the source, you will need Ant installed to build the binaries.

To download and install bdbxml, visit the Sleepycat Web site and select the Berkeley DB XML product. Installing bdbxml is simple, although you will need to compile the source. If you run into problems, go to

It would also be helpful to obtain the Getting Started Guide from the Sleepycat Web site. Once the products are installed and compiled, you're ready to begin.

A Brief Look at the Java API
Before working with the custom Transformer code, take a quick walk through the bdbxml API. Javadoc is included with the installation, and it may be helpful to browse this too.

The XML Java API resides in the com.sleepycat.dbxml package. The XmlContainer object encapsulates the concept of a database in bdbxml. It provides methods for opening, closing, loading, and saving (via the dump method) the database, as well as for creating indexes. It also provides methods for extracting documents by XPath (queryWithXPath) and by document ID (getDocument). Documents are added to the container via the putDocument method and removed using the deleteDocument method. No update method is provided; updates are accomplished by first deleting and then re-adding the document. The XmlDocument object encapsulates an XML document and its related metadata.

This class provides methods for retrieving and storing the content of the document as a string and as a byte array. It also provides methods for getting and setting metadata describing the document. This nifty feature allows users to store metadata about the document without affecting the integrity of the XML document.
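The container operations described above, including an update performed as a delete followed by a re-add and metadata stored beside the content, can be illustrated with a small in-memory model (JDK 8 or later). `SketchContainer` and `SketchDocument` are hypothetical stand-ins, not the com.sleepycat.dbxml classes. Their method names only echo the ones mentioned in the text, and the way IDs are assigned here is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stored XML document plus its metadata (not the bdbxml XmlDocument).
class SketchDocument {
    private String content;
    private final Map<String, String> metadata = new HashMap<>();

    void setContent(String xml) { content = xml; }
    String getContent() { return content; }

    // Metadata lives beside the content, so the XML itself is never modified.
    void setMetaData(String name, String value) { metadata.put(name, value); }
    String getMetaData(String name) { return metadata.get(name); }
}

// Hypothetical in-memory container; each stored document gets a fresh integer ID.
class SketchContainer {
    private final Map<Integer, SketchDocument> documents = new HashMap<>();
    private int nextId = 1;

    int putDocument(SketchDocument doc) {
        int id = nextId++;
        documents.put(id, doc);
        return id;
    }

    SketchDocument getDocument(int id) { return documents.get(id); }

    void deleteDocument(int id) { documents.remove(id); }

    // There is no in-place update: delete the old entry, change the content, and re-add it.
    // In this sketch the re-added document receives a new ID, which is returned to the caller.
    int replaceDocument(int id, String newXml) {
        SketchDocument doc = documents.get(id);
        if (doc == null) {
            throw new IllegalArgumentException("no document with id " + id);
        }
        deleteDocument(id);
        doc.setContent(newXml); // metadata set earlier travels with the document object
        return putDocument(doc);
    }
}
```

Because the metadata map is separate from the content string, replacing the content leaves values such as a document name untouched.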

The XPathExpression class represents a parsed XPath expression. These objects are obtained from the XmlContainer.parseXPathExpression method and passed to the XmlContainer.queryWithXPath method to query the database for documents matching the expression.

The XmlResults and XmlValue objects provide a simple interface for examining the results of an XPath query against the database. XmlContainer.queryWithXPath returns an XmlResults object, which works much like an iterator: a size method returns the number of results, and a next method retrieves the "next" result as an XmlValue object.

The XmlValue object encapsulates a node in an XML document. XML document nodes can have one of three value types - boolean, string, or number. XmlQueryContext is a configuration class used to control the behavior of the XmlContainer when executing XPath queries. The query mechanism supports both eager and lazy querying, as well as the ability to return either full documents or document fragments.
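The query flow (parse an XPath expression once, evaluate it against the stored documents, then walk the results with size and next) can be approximated with the JDK's javax.xml.xpath package (JDK 8 or later). This is an eager, whole-document sketch: `SketchQuery` and `SketchResults` are hypothetical names, and the sketch does not model bdbxml's indexes, lazy evaluation, or fragment results.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical iterator-style result set with size() and next(), echoing the description above.
class SketchResults {
    private final List<String> hits;
    private int position = 0;

    SketchResults(List<String> hits) { this.hits = hits; }

    int size() { return hits.size(); }

    // Returns the next matching document, or null once the results are exhausted.
    String next() { return position < hits.size() ? hits.get(position++) : null; }
}

class SketchQuery {
    // Compiles the XPath once, then evaluates it against each stored document (eager evaluation).
    static SketchResults query(List<String> storedDocuments, String xpathText) {
        try {
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile(xpathText);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            List<String> hits = new ArrayList<>();
            for (String xml : storedDocuments) {
                Document doc = builder.parse(new InputSource(new StringReader(xml)));
                // A non-empty node-set converts to true, so this keeps documents with at least one match.
                if ((Boolean) expr.evaluate(doc, XPathConstants.BOOLEAN)) {
                    hits.add(xml);
                }
            }
            return new SketchResults(hits);
        } catch (Exception e) {
            throw new IllegalArgumentException("query failed: " + xpathText, e);
        }
    }
}
```

Compiling the expression once and reusing it for every document mirrors the idea of passing a pre-parsed XPathExpression to the query call.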

The XmlIndexDeclaration class is used to define an index declaration. Indexes can be created by methods on the XmlContainer object. The type of index to create is situation specific.

XML Interface to XML Database
The XML interface to bdbxml is a component that will execute XML instructions against the database and then produce an XML result.

First, review the schema that will govern the interaction with bdbxml.

The schema consists of several elements. The operation element is the root node, containing the following attributes:

  • database: The full path to the database.
  • type: The operation type. Valid values are create, delete, update, and inquire.
  • docid: The document ID for delete, update, or inquire operations.
  • xpath: The XPath expression for inquire operations.
  • wrapped: "true" if you want the result wrapped in the Sleepycat Cocoon schema; "false" if you want only the XML document that results from the operation (valid for inquire only).
The wrapped="false" setting can be used to apply XSLT transformations directly to the resulting document (rather than to the Sleepycat schema elements). The operation element can contain metadata elements and, potentially, one result element. The metadata elements tell the transformer which metadata items to extract from bdbxml for each document retrieved.

The result element is a child of the operation node. It contains a result attribute that will contain "success" or "failure" depending on the result of the operation. It can also contain one or more document elements. The document element contains multiple occurrences of the metadata element and potentially text, as well.

The text would be the XML document retrieved from the database. The metadata element has a name and value attribute. One metadata element will be returned for each metadata element requested via the operation element.
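A transformer has to read these attributes and metadata requests before it can act on them. The sketch below parses a request with the JDK's SAX parser (JDK 8 or later) and validates it. It assumes lowercase attribute names (database, type, docid, xpath, wrapped) and treats a missing wrapped attribute as true. The usage test relies on the placeholder namespace URI `urn:example:bdbxml`, since the article's actual namespace URI is not reproduced here. `OperationRequest` is a hypothetical class, not the article's transformer code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical holder for the instructions carried by an operation element.
class OperationRequest {
    static final List<String> VALID_TYPES = Arrays.asList("create", "delete", "update", "inquire");

    String database;
    String type;
    String docid;
    String xpath;
    boolean wrapped = true; // assumption: a missing wrapped attribute means "true"
    final List<String> metadataNames = new ArrayList<>();

    static OperationRequest parse(String xml) {
        OperationRequest req = new OperationRequest();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("operation".equals(localName)) {
                    req.database = atts.getValue("database");
                    req.type = atts.getValue("type");
                    req.docid = atts.getValue("docid");
                    req.xpath = atts.getValue("xpath");
                    req.wrapped = !"false".equalsIgnoreCase(atts.getValue("wrapped"));
                } else if ("metadata".equals(localName)) {
                    // Each metadata element names one item to return for every document.
                    req.metadataNames.add(atts.getValue("name"));
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // match on local names, whatever the prefix
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed request", e);
        }
        req.validate();
        return req;
    }

    private void validate() {
        if (!VALID_TYPES.contains(type)) {
            throw new IllegalArgumentException("invalid operation type: " + type);
        }
        if ((type.equals("delete") || type.equals("update")) && docid == null) {
            throw new IllegalArgumentException(type + " requires a docid");
        }
        if (type.equals("inquire") && docid == null && xpath == null) {
            throw new IllegalArgumentException("inquire requires a docid or an xpath");
        }
    }
}
```

Rejecting a bad request up front keeps the database-facing code simple, because it only ever sees a well-formed, validated instruction set.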

Listing 1 is an example request document (Listings 1-4 and full source code are available at

This document instructs the transformer to inquire against the database and retrieve the document with the ID of 1.

The request document also asks that the following metadata items be retrieved: Name, Description, and Last Modified.

<bdbxml:operation xmlns:sleepycat=
type="inquire" docid="1">
<bdbxml:metadata name="name"/>
<bdbxml:metadata name="
<bdbxml:metadata name=

The SleepycatDBXMLTransformer
Take a look at the transformer that will process these instructions (see Listing 2).

The transformer parses the XML instructions document using the SAX methods - startElement and endElement. Once it has obtained the instructions, it executes the request. The results are then encoded into an XML response document (again, via SAX events), which is then passed back into the Cocoon pipeline for further processing.

The setup method obtains a reference to the SleepycatDBXMLManager stored in the session. The startElement and endElement methods use the SAX interface to capture the XML instructions for the Sleepycat operation to execute (essentially the XML document described in the previous section).

When the startElement method detects the "bdbxml:operation" element, it examines the element's attributes to ensure correctness and then calls the startTextRecording method. This convenience method (implemented in AbstractSAXTransformer) records the text content of the element until the endTextRecording method is called.

After the endElement method detects the end of the "bdbxml:operation" element, it calls the endTextRecording method to obtain the captured text. In this case, the captured text is the XML document that should be inserted into bdbxml (if performing an insert or update operation).
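The record-then-act pattern can be sketched with a plain SAX filter (JDK 8 or later). This is not Cocoon's AbstractSAXTransformer. `TextRecordingFilter` is a hypothetical class that starts buffering character data when the target element opens, stops when it closes, and forwards every event downstream, much as the transformer passes events through to the next pipeline stage. Because it captures only character data, the sketch assumes the embedded XML document arrives as escaped text or a CDATA section rather than as child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: records the character data of one element while forwarding all events.
class TextRecordingFilter extends XMLFilterImpl {
    private final String target;      // local name of the element to record
    private StringBuilder recording;  // non-null only while recording
    private String recordedText;

    TextRecordingFilter(XMLReader parent, String target) {
        super(parent);
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (target.equals(localName)) {
            recording = new StringBuilder(); // analogous to starting text recording
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (recording != null) {
            recording.append(ch, start, length);
        }
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (target.equals(localName) && recording != null) {
            recordedText = recording.toString(); // analogous to ending text recording
            recording = null;
        }
        super.endElement(uri, localName, qName);
    }

    String getRecordedText() { return recordedText; }

    // Convenience helper: parse xml and return the text recorded inside the target element.
    static String record(String xml, String target) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            TextRecordingFilter filter =
                    new TextRecordingFilter(factory.newSAXParser().getXMLReader(), target);
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.getRecordedText();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse request", e);
        }
    }
}
```

SAX parsers may deliver text in several `characters` calls, which is why the filter appends to a buffer rather than keeping only the last chunk.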

The performDbOperation method is then called by the endElement method. This method is responsible for executing the appropriate action based upon the XML instructions.

Lines 173-191 implement the delete functionality.

Using the XmlContainer and the ID passed in the XML, call the deleteById method on the container. This method deletes the document referenced by the ID from the container. The delete API call takes three parameters. The first is a DbTxn object, used if your work involves transactions; ours didn't, so we passed null. The second parameter is the ID of the document to delete, and the third is an integer flag. The flag can be used to control fine-grained behavior, but we didn't need it, so we passed zero.

Lines 192-216 implement the create functionality. The create functionality employs a few simple bdbxml API calls.

First, a new XmlDocument instance is created, and its content is set via the setContent method, passing in the XML document text. The document is stored in the container via a call to putDocument, which takes three parameters (similar to the delete method described previously). The first parameter is a DbTxn object; since you are not doing the work within a transaction, you pass null.

The second parameter is the XmlDocument that was just created, and the third parameter is an integer flag. From a bdbxml perspective, those three API calls are all that is necessary to insert a document into the database (for persisting the XmlContainer, see the discussion that follows).

Lines 262-291 implement the update functionality. The XmlDocument object to update is retrieved using its document ID from the database. The content is updated via the setContent method, and the updateDocument method is called with the XmlDocument object.

Lines 475-625 are responsible for creating the SAX events necessary to communicate the results of the operation.

Several lines of code are needed to generate the SAX events, but the code is simple and straightforward.

It's important to note that all events are passed through by the startElement and endElement methods, which call super.startElement and super.endElement. The resulting document from this transformer therefore contains all of the input plus the added result element. This design provides flexibility within the Cocoon environment because the full XML operation is available to manipulate in stylesheets as necessary. It adds some overhead, but in this situation the flexibility was worth it.
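Emitting the result as SAX events, rather than building a string by hand, might look like the following sketch (JDK 8 or later). It uses the JDK's `SAXTransformerFactory` to serialize the events so the output can be inspected; in Cocoon the events would go to the next pipeline component instead. `ResultEmitter` is hypothetical. The element and attribute names follow the schema described earlier (a result element with a result attribute, containing document and metadata elements), and `urn:example:bdbxml` is a placeholder namespace URI.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical emitter for a result element: result -> document -> metadata(name, value)*.
class ResultEmitter {
    static final String NS = "urn:example:bdbxml"; // placeholder namespace URI
    private static final AttributesImpl NO_ATTS = new AttributesImpl();

    static String emit(String outcome, Map<String, String> metadata) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler h = factory.newTransformerHandler(); // identity: events in, text out
            h.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            h.setResult(new StreamResult(out));

            h.startDocument();
            h.startPrefixMapping("bdbxml", NS);
            AttributesImpl resultAtts = new AttributesImpl();
            resultAtts.addAttribute("", "result", "result", "CDATA", outcome); // "success" or "failure"
            h.startElement(NS, "result", "bdbxml:result", resultAtts);
            h.startElement(NS, "document", "bdbxml:document", NO_ATTS);
            for (Map.Entry<String, String> item : metadata.entrySet()) {
                writeMetadata(h, item.getKey(), item.getValue());
            }
            h.endElement(NS, "document", "bdbxml:document");
            h.endElement(NS, "result", "bdbxml:result");
            h.endPrefixMapping("bdbxml");
            h.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not emit result", e);
        }
    }

    private static void writeMetadata(TransformerHandler h, String name, String value)
            throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "name", "name", "CDATA", name);
        atts.addAttribute("", "value", "value", "CDATA", value);
        h.startElement(NS, "metadata", "bdbxml:metadata", atts);
        h.endElement(NS, "metadata", "bdbxml:metadata");
    }
}
```

Producing events rather than text also means special characters in metadata values are escaped by the serializer, not by hand.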

The class is also not thread-safe. Implementing the Recyclable interface tells the Apache Avalon framework (which Cocoon builds on) that the component can be pooled. Avalon then creates an instance pool, permitting reuse of the component, and hands each instance to only one pipeline thread at a time, so multiple threads never enter the same instance concurrently.
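Pooling as a way to keep a non-thread-safe component safe can be sketched generically (JDK 8 or later). This is not the Avalon API: `RecyclableSketch`, `SketchPool`, and `StatefulTransformer` are hypothetical names, and a bounded blocking queue plays the role of the instance pool. Each caller takes an instance exclusively, and `recycle()` clears per-request state before the instance is returned for reuse.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical marker for components that can clear their per-request state.
interface RecyclableSketch {
    void recycle();
}

// Minimal fixed-size pool: an instance is held by at most one thread at a time.
class SketchPool<T extends RecyclableSketch> {
    private final BlockingQueue<T> idle;

    SketchPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Blocks until an instance is free, so a non-thread-safe component is never shared.
    T lookup() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a component", e);
        }
    }

    // Resets the instance before making it available to the next caller.
    void release(T component) {
        component.recycle();
        idle.add(component);
    }

    int available() { return idle.size(); }
}

// Example non-thread-safe component with per-request state.
class StatefulTransformer implements RecyclableSketch {
    final StringBuilder recorded = new StringBuilder();

    @Override
    public void recycle() { recorded.setLength(0); }
}
```

The design trade-off is the same one the text describes: instances are reused instead of recreated per request, and exclusivity replaces internal locking.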

Persisting the database is handled by the helper class SleepycatDBXMLManager. The setup method obtains a reference to the manager from the session or, if none is found, builds a new manager instance. The manager object is stored in the session and exists for the life of the session. Look more closely at the manager object (see Listing 3).

The SleepycatDBXMLManager object is basically a Hashtable that implements the HttpSessionBindingListener interface. (Servlet 2.3 added the HttpSessionListener interface; we used HttpSessionBindingListener because Cocoon was running under the servlet 2.2 implementation.) This interface allows the object to be notified when it is unbound from its session, which happens when the session is destroyed through either invalidation or timeout. In either case, the SleepycatDBXMLManager object receives notification via the valueUnbound method. This method retrieves each XmlContainer stored in the manager object and persists it via a call to the dump method. The dump method is similar to the other API calls we have seen. The first parameter is the database name, which doubles as the filename.

The second parameter is an integer flag, which you default to zero, as it is not needed.
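The session-scoped persistence idea can be sketched without servlet libraries (JDK 8 or later). In a servlet container, the manager would implement HttpSessionBindingListener and do this work in valueUnbound; here that callback is replaced by a plain `sessionEnded` method so the example runs on a bare JDK. `DumpableContainer` and `SessionContainerManager` are hypothetical. The container writes its documents to one file named after the container, loosely mirroring how the dump call uses the database name as the file name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical container that writes all of its documents to a single file.
class DumpableContainer {
    final String name; // doubles as the file name
    final List<String> documents = new ArrayList<>();

    DumpableContainer(String name) { this.name = name; }

    void dump(Path dir) {
        try {
            Files.write(dir.resolve(name), documents, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Session-scoped manager: a map of containers that persists every container when the session ends.
class SessionContainerManager extends HashMap<String, DumpableContainer> {
    private static final long serialVersionUID = 1L;

    // In a servlet container this work would be triggered from valueUnbound(...).
    void sessionEnded(Path dir) {
        for (DumpableContainer container : values()) {
            container.dump(dir);
        }
    }
}
```

Tying persistence to the end of the session means a user's work is written out once, whether the session is invalidated explicitly or times out.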

In this situation, load and dump are not strictly necessary; you could let bdbxml handle persistence through the normal open and close method calls. Using load and dump, however, provides a simple way to persist the database into a single file, which makes backups easy. That's it from a transformer perspective. With just a few lines of code, you have implemented a simple, reusable XML interface to bdbxml.

The new transformer can now be put to use.

Note: If you have problems with Cocoon finding the bdbxml libraries, make sure that the bdbxml bin and lib directories are in your java.library.path.
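When the native libraries are not found, a quick check of java.library.path can help narrow things down. The helper below is a hypothetical diagnostic (JDK 8 or later), not part of bdbxml or Cocoon. It only reports whether a given directory appears among the path entries, comparing normalized paths, and it makes no assumption about the names of the bdbxml library files.

```java
import java.io.File;
import java.nio.file.Paths;

// Hypothetical diagnostic helper for java.library.path problems.
class LibraryPathCheck {
    // Returns true if dir appears as an entry of the given java.library.path-style string.
    static boolean contains(String libraryPath, String dir) {
        if (libraryPath == null) {
            return false;
        }
        for (String entry : libraryPath.split(File.pathSeparator)) {
            if (!entry.isEmpty()
                    && Paths.get(entry).normalize().equals(Paths.get(dir).normalize())) {
                return true;
            }
        }
        return false;
    }

    // Checks the running JVM's actual java.library.path setting.
    static boolean onCurrentLibraryPath(String dir) {
        return contains(System.getProperty("java.library.path"), dir);
    }
}
```

For example, logging the result of `onCurrentLibraryPath` for your bdbxml bin and lib directories at startup shows quickly whether the JVM that runs Cocoon can see them.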

Using the New Transformer Within Cocoon
Look at the Cocoon sitemap configuration to build the necessary pipeline (see Listing 4).

Five steps exist in the pipeline. First, the Cocoon-provided RequestGenerator converts the HTTP POST from TestTransformer.html into SAX events.

Second, these XML events are then converted into the bdbxml:operation XML document using the default XSLT Transformer and the convert.xsl stylesheet. Refer to the sample code to see this simple stylesheet.
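The conversion step can be sketched with the JDK's built-in XSLT processor (JDK 8 or later). Both the input and the stylesheet are simplified assumptions. The input is a flat params element rather than the RequestGenerator's actual output format, the stylesheet is not the article's convert.xsl, and `urn:example:bdbxml` is a placeholder namespace URI. `ConvertSketch` is a hypothetical class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ConvertSketch {
    // Hypothetical stylesheet: maps <param name="..."> values onto operation attributes
    // using attribute value templates ({...}).
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:bdbxml=\"urn:example:bdbxml\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/params\">"
        + "<bdbxml:operation database=\"{param[@name='database']}\""
        + " type=\"{param[@name='type']}\" docid=\"{param[@name='docid']}\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to a params document and returns the operation element as text.
    static String convert(String paramsXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(paramsXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("conversion failed", e);
        }
    }
}
```

Keeping this mapping in a stylesheet, as the article does, means the HTML form can change without touching the Java transformer.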

Third, the SleepycatDBXMLTransformer receives the operation instructions, executes them, and produces the XML reply, which is then serialized using the Cocoon-provided XML serializer. In a real application, this XML reply would be transformed into an HTML page (or the like) that the user could use to view or edit the results.

You can compile the transformer by issuing the necessary javac commands. Alternatively, a simple Ant script is provided for this purpose. A number of JARs from the Cocoon libraries are needed for the compile to succeed, and the script properties must be modified to reflect your Cocoon and bdbxml installations.

ant compile
ant deploy

The first Ant target compiles the source and jars it. The second Ant target moves the JARs, HTML, and XSL files into the WEB-INF directory (Cocoon Web-application root).

After successfully compiling and deploying the JAR file as well as the sitemap and the TestTransformer.html file, start the Cocoon server with:

<C servlet

Point the browser at http://localhost:8888/cocoon/TestTransformer. The TestTransformer pipeline is a simple interface for testing the SleepycatDBXMLTransformer.

Using this component within the Cocoon pipeline provides tremendous flexibility in putting together quick applications that need to be backed by a robust XML database.

The XML interface fits smoothly within the XML processing pipeline concept and lets you make extensive use of Cocoon's transformation capabilities. bdbxml is a key ingredient in the solution because it integrates simply into the application framework and provides superior performance. Its application-specific nature, coupled with its multiple-language APIs, positions you well for future architectural directions.

Businesses typically persist business data to an RDBMS for a variety of reasons: RDBMS tools are widely available, most enterprises already have an RDBMS environment in place, and reporting and data-extraction mechanisms are easy to obtain.

For these reasons and others, XML databases have made more headway in other areas, such as caching, document-centric management (where RDBMS capabilities are not a good match), and embedded toolsets (providing localized XML storage and retrieval). Sleepycat's Berkeley DB XML addresses these areas well, providing an application-specific data storage solution built on a robust and proven architecture. It is well suited to serving as an XML storage and querying mechanism and provides powerful flexibility with a simple API.

We hope your awareness of application-specific XML databases has been raised and that this exposure will lead you to consider an XML database as a potential solution to your next XML storage problem.

An embedded XML database can immediately increase the options available to architects and developers. It also increases the flexibility and extensibility of a solution, and that leads to advantages in bringing new functionality to market sooner.

Why Not a Full Content Management Solution?
We didn't want the overhead of a full content management solution. We wanted a lot of flexibility for developers without the "document management" overhead. We didn't want extensive versioning procedures; the users will manage the documents themselves. Nor did we want process or metadata overhead. There was no need to create extensive document-describing repositories for searching or to enforce a consistent, micromanaged process for document editing. Additionally, many content management solutions employ proprietary technologies that involve learning custom workflow languages.

Cocoon and DB XML provide a simple approach that fits our needs. Cocoon provides great flexibility in the mechanisms employed to manipulate XML. A variety of technologies and tools can be brought to bear in the pipeline process, which makes XML content manipulation both powerful and flexible. DB XML provides flexibility and extensibility by allowing metadata to be attached to documents and by providing an XML/XPath solution. Additionally, we can use XML documents to describe XML documents if we want to. The API is simple and provides round-tripping on the XML documents (which means they are guaranteed to look just as they did when they were put into the database).

Last (and perhaps most important), our solution had to be free. Since this tool is for internal use, we didn't want to spend much money on building, supporting, or maintaining it. If the tool is down for a period of time, we have alternatives and workarounds available. The open source community provides one of the best mechanisms for support of this nature.

What about the XML:DB API and the Cocoon Pseudo-Protocol?
The XML:DB API is a community effort to develop a standard programmatic interface to XML datastores. Cocoon supports the XML:DB API through a pseudo-protocol; however, it currently supports only inquiry functionality (including XPath queries).

Our needs include an update interface. While an XMLDBTransformer is provided by Cocoon, it requires XUpdate capabilities and DB XML does not currently support XUpdate. In our case, XUpdate is largely overkill since we're concerned with documents, not document fragments.

Additionally, the small, consistent XML interface provided the simplicity we desired, allowing us to reuse our sitemap components and pipelines. Implementing separate inquiry and update interfaces seemed unnecessary, and our interface allows us to be more dynamic and flexible in how we create and execute instructions against the XML repository.

Finally, the XML:DB API provides no support for metadata. This lack of support may be intentional, as many suggest that you should use XML documents to describe XML documents. While this argument has some merit, this lack of support is inconvenient at best. DB XML provides metadata support, and we had every intention of using it.

The benefits of the XML interface and the flexibility of DB XML outweighed the detriment of being tied to one vendor's implementation (besides, we spent more time discussing the solution than actually implementing our simple transformer).

The XML:DB API is solid and will continue to see development and support within the XML community. If Sleepycat provided an XML:DB API implementation or XUpdate support, we would definitely revisit this decision, since Cocoon and other products are embracing and extending their support of the XML:DB API.

More Stories By Dan Hatfield

Dan Hatfield is a senior consultant within EDS’ EAI practice. He uses XML and Java technologies extensively in this role. He often leverages open source solutions to improve project infrastructure and team productivity.
