Open Web Magazine

Open Source Database Special Feature: An Introduction to Berkeley DB XML

Basic concepts, the shell commands, and beyond

Eager vs Lazy Evaluation
You may have noticed that after each query, BDB XML prints out the number of entries and the query evaluation method:

436 objects returned for eager expression

Eager is the default query evaluation method. When a query is evaluated eagerly, BDB XML computes every result during query execution and stores them all in a data structure, so the complete result set is available as soon as the query returns. Lazy evaluation works differently: the database does not compute and store the results up front. It keeps enough information to know how to find them, but it does no further work until the results are actually requested.

In both cases the results come back as a result set, and to get at them we iterate through the set with its next operation. This is what the "print" command in the dbxml shell does internally: it iterates through the entire set and retrieves every element. The practical difference is timing. Immediately after an eager query executes, its result set is already complete; immediately after a lazy query executes, the result set holds none or only some of the results, and the rest are produced as the iteration proceeds.

It may sound as though lazy query evaluation is never useful, but this is not the case. If you do not need all of the objects a query returns, lazy evaluation makes more sense, because it avoids the cost of retrieving results you never look at. You can see this with the following query:

dbxml> setLazy on
Lazy evaluation on

dbxml> query 'collection("xbench.dbxml")/dictionary/e[contains(. , "the hockey")]/hwg/hw'

Query - Starting query execution
Lazy expression 'collection("xbench.dbxml")/dictionary/e[contains(. , "the hockey")]/hwg/hw' completed

Note that the execution time for this query is negligible (the database prints no execution-time information). That is because the actual results have not been retrieved yet; BDB XML retrieves them only when the "print" command is run. We know that this query returns 436 objects. Instead of retrieving all of them, let's get only the first eight, using the "print n 8" command.
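
The benefit shows up when only a prefix of the results is wanted, as with "print n 8". The following Python sketch, again a model rather than BDB XML's implementation, uses the standard library's `itertools.islice` to take the first eight matches from a lazy generator and counts how many candidate entries were examined along the way. The 10,000-entry data set and the `is_match` predicate are hypothetical.

```python
from itertools import islice

examined = 0  # number of candidate entries the query has looked at


def is_match(n: int) -> bool:
    """Hypothetical predicate: every third entry matches."""
    global examined
    examined += 1
    return n % 3 == 0


# A lazy "result set" over 10,000 hypothetical entries.
lazy_results = (n for n in range(10_000) if is_match(n))

# Analogous to "print n 8": pull only the first eight results.
first_eight = list(islice(lazy_results, 8))
print(first_eight)  # [0, 3, 6, 9, 12, 15, 18, 21]
print(examined)     # 22: only entries 0..21 were examined
```

Only 22 of the 10,000 entries are touched; an eager evaluation of the same predicate would have examined all of them before returning anything.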

XML Schema Validation
One of the new and useful features of BDB XML is its ability to validate XML documents. First we need to create a container with XML Schema validation enabled. Listing 8 shows the XML sample (a 10MB XML sample with an XML Schema; see the first entry in the References section) that I am going to put into this container.

This document is assigned an XML Schema. The part that shows this assignment is:

<dictionary xmlns:xsi=

This schema is located at

XML Schemas and XML documents can be located on the same machine or on different machines. In this example, the XML Schema and the XML data are located on two different machines.
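
A schema is assigned through attributes in the XML Schema instance namespace (`http://www.w3.org/2001/XMLSchema-instance`), typically `xsi:noNamespaceSchemaLocation` or `xsi:schemaLocation` on the root element, and the location may be a URL on another machine. As a rough illustration (not something BDB XML requires you to write), this Python sketch uses the standard library's ElementTree to report which schema, if any, a document's root element points to. The sample documents, the `schema_location` helper, and the example.com URL are hypothetical placeholders, not the ones from Listing 8.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location(xml_text: str) -> Optional[str]:
    """Return the schema location declared on the root element, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as "{namespace}localname".
    for attr in ("noNamespaceSchemaLocation", "schemaLocation"):
        value = root.get(f"{{{XSI}}}{attr}")
        if value is not None:
            return value
    return None


# Hypothetical documents; the URL is a placeholder for a schema on another machine.
with_schema = (
    f'<dictionary xmlns:xsi="{XSI}" '
    'xsi:noNamespaceSchemaLocation="http://example.com/dictionary.xsd"/>'
)
without_schema = "<dictionary/>"

print(schema_location(with_schema))     # http://example.com/dictionary.xsd
print(schema_location(without_schema))  # None
```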

dbxml> createContainer validate_xbench.dbxml d validate
Creating document storage container, with validation

dbxml> openContainer validate_xbench.dbxml

dbxml> putDocument dict_10_valid C:\dictionary10_schema.xml f
Document added, name = dict_10_valid

A natural question is whether it is possible to add an XML document to this container without validating it. Validation in Berkeley DB XML is very fast, which is a big time-saver; I have found that validating a document in BDB XML takes much less time than in some commercial XML editors. Still, validating every document can be costly when the documents are huge, and some XML documents are not assigned to any schema at all. A validating container checks only the documents that refer to a schema, so a document with no schema reference is stored without validation. Listing 9 shows such an XML sample (the 10MB XML sample in the References section), which I am going to put into the same container.
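
A validating container that checks only the documents referring to a schema can be modeled in a few lines. This is a hypothetical Python sketch, not BDB XML code: `ValidatingContainer`, `put_document`, and the `validate` callback are invented names, and the callback is a placeholder for a real XML Schema validator (which Python's standard library does not provide).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def has_schema(root: ET.Element) -> bool:
    """True if the root element declares a schema via xsi attributes."""
    return any(
        root.get(f"{{{XSI}}}{a}") is not None
        for a in ("noNamespaceSchemaLocation", "schemaLocation")
    )


class ValidatingContainer:
    """Hypothetical model of a container created with validation enabled."""

    def __init__(self, validate: Callable[[ET.Element], bool]) -> None:
        self.validate = validate
        self.docs: Dict[str, ET.Element] = {}
        self.validated: Dict[str, bool] = {}

    def put_document(self, name: str, xml_text: str) -> None:
        root = ET.fromstring(xml_text)  # well-formedness is always required
        checked = has_schema(root)      # validate only if a schema is referenced
        if checked and not self.validate(root):
            raise ValueError(f"{name}: document is not valid")
        self.docs[name] = root
        self.validated[name] = checked


# Placeholder validator: a real one would check the document against its schema.
container = ValidatingContainer(validate=lambda root: root.tag == "dictionary")
container.put_document(
    "dict_10_valid",
    f'<dictionary xmlns:xsi="{XSI}" xsi:noNamespaceSchemaLocation="dict.xsd"/>',
)
container.put_document("dict_10", "<dictionary/>")
print(container.validated)  # {'dict_10_valid': True, 'dict_10': False}
```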

This container now holds two documents, dict_10_valid and dict_10; the first was validated, but the second was not. Sometimes it is desirable to restrict a query to a specific document in the collection. We can do this with the "doc" function.

dbxml> query 'doc("validate_xbench.dbxml/dict_10")//hwg'
733 objects returned for eager expression

Using doc("validate_xbench.dbxml/dict_10") restricts the query to the dict_10 document only.
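
The difference between querying the whole container with collection() and a single document with doc() is one of scope. This Python sketch models it with a dictionary of parsed documents and ElementTree's limited XPath support; the tiny document contents and the `query_collection` and `query_doc` helpers are hypothetical, and ElementTree stands in for BDB XML's XQuery engine.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical container contents, keyed by document name.
container: Dict[str, ET.Element] = {
    "dict_10_valid": ET.fromstring("<dictionary><e><hwg/></e><e><hwg/></e></dictionary>"),
    "dict_10": ET.fromstring("<dictionary><e><hwg/></e></dictionary>"),
}


def query_collection(path: str) -> List[ET.Element]:
    """Like collection("..."): evaluate the path against every document."""
    return [node for root in container.values() for node in root.iterfind(path)]


def query_doc(name: str, path: str) -> List[ET.Element]:
    """Like doc("container/name"): evaluate the path against one document."""
    return list(container[name].iterfind(path))


print(len(query_collection(".//hwg")))      # 3
print(len(query_doc("dict_10", ".//hwg")))  # 1
```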

Indexing XML documents is essential for good query performance; in fact, indexing XML data is arguably the most important task for the user. BDB XML has only limited automatic indexing features, so indexing is best done manually by the programmer. In this section I will introduce you to the basics of XML indexing in BDB XML. Here is the format of an index:

[unique]-{path type}-{node type}-{key type}-{syntax type}

Apart from the optional unique prefix, an index in BDB XML is composed of four parts:

  • Path Types
  • Node Types
  • Key Types
  • Syntax Types
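
To see how the pieces fit together, here is a small Python sketch that splits an index specification string such as `unique-node-element-equality-string` into its optional uniqueness flag and its four parts. The allowed values per part follow BDB XML's index types, but the syntax-type list is deliberately partial, and `parse_index` and `IndexSpec` are hypothetical names, not part of any BDB XML API.

```python
from dataclasses import dataclass

PATH_TYPES = {"node", "edge"}
NODE_TYPES = {"element", "attribute", "metadata"}
KEY_TYPES = {"equality", "presence", "substring"}
# Partial list; BDB XML supports many more XML Schema types.
SYNTAX_TYPES = {"none", "string", "decimal", "double", "boolean", "date", "dateTime"}


@dataclass(frozen=True)
class IndexSpec:
    unique: bool
    path_type: str
    node_type: str
    key_type: str
    syntax_type: str


def parse_index(spec: str) -> IndexSpec:
    """Parse '[unique]-{path}-{node}-{key}-{syntax}' into its parts."""
    parts = spec.split("-")
    unique = parts[0] == "unique"
    if unique:
        parts = parts[1:]
    if len(parts) != 4:
        raise ValueError(f"expected four parts after optional 'unique': {spec!r}")
    path, node, key, syntax = parts
    for value, allowed, label in (
        (path, PATH_TYPES, "path type"),
        (node, NODE_TYPES, "node type"),
        (key, KEY_TYPES, "key type"),
        (syntax, SYNTAX_TYPES, "syntax type"),
    ):
        if value not in allowed:
            raise ValueError(f"unknown {label}: {value!r}")
    return IndexSpec(unique, path, node, key, syntax)


print(parse_index("unique-node-element-equality-string"))
print(parse_index("edge-attribute-presence-none"))
```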
Uniqueness indicates that each indexed value must be unique within the container; BDB XML rejects a document that would introduce a duplicate value. For example, in an employee data set, both the employee number and the social security number would be unique.
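
A unique equality index behaves like a map from each indexed value to the document that holds it, which also hints at why equality indexes speed up lookups: finding a value becomes a dictionary lookup rather than a scan. In this hypothetical Python sketch, the `EqualityIndex` class and the employee data are invented for illustration and are not BDB XML internals; the index rejects a second document that reuses an employee number.

```python
from typing import Dict, List


class EqualityIndex:
    """Hypothetical equality index mapping a value to the documents containing it."""

    def __init__(self, unique: bool = False) -> None:
        self.unique = unique
        self.entries: Dict[str, List[str]] = {}

    def add(self, value: str, doc_name: str) -> None:
        if self.unique and value in self.entries:
            raise ValueError(f"duplicate value {value!r} in unique index")
        self.entries.setdefault(value, []).append(doc_name)

    def lookup(self, value: str) -> List[str]:
        # A dictionary lookup instead of scanning every document.
        return self.entries.get(value, [])


emp_numbers = EqualityIndex(unique=True)
emp_numbers.add("E1001", "alice.xml")
emp_numbers.add("E1002", "bob.xml")

try:
    emp_numbers.add("E1001", "carol.xml")  # reuses Alice's employee number
except ValueError as err:
    print(err)  # duplicate value 'E1001' in unique index

print(emp_numbers.lookup("E1002"))  # ['bob.xml']
```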

More Stories By Selim Mimaroglu

Selim Mimaroglu is a PhD candidate in computer science at the University of Massachusetts in Boston. He holds an MS in computer science from that school and has a BS in electrical engineering.