Focusing on open APIs for enterprise applications

Open Web Magazine

Open Source Database Special Feature: An Introduction to Berkeley DB XML

Basic concepts, the shell commands, and beyond

Path Types
There are two available options for this attribute: Node and Edge. Edge is the path type preferred by the query processor, so it should be your first choice. An Edge index records each node together with its parent, that is, the link between the two. For example, if you index the hw element in the 10MB XML sample (in the References section), the following subpath is indexed for each hw entry:

hwg/hw

It doesn't make much sense to use a Node-type path index here because there are many hw elements in the data. If there were only one hw element, a Node-type index would make sense instead of an Edge index.
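The difference between the two path types can be sketched with a toy model. The snippet below is only an illustration, not how BDB XML stores its indexes: the sample document, the `path_index_keys` helper, and the key formats are all assumptions made for this example. It walks a small dictionary-like document and collects the keys a Node index (element name only) and an Edge index (parent/child pair) would be built on.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical miniature of the dictionary sample; the real file is far larger.
SAMPLE = """
<dictionary>
  <e><hwg><hw>the</hw></hwg></e>
  <e><hwg><hw>hockey</hw></hwg><q><qd>1895</qd></q></e>
</dictionary>
"""

def path_index_keys(root):
    """Count toy Node keys (name only) and Edge keys (parent/name)."""
    node_keys = defaultdict(int)
    edge_keys = defaultdict(int)
    node_keys[root.tag] += 1          # the root has no parent, so no edge key
    for parent in root.iter():        # every element, including the root
        for child in parent:          # direct children only
            node_keys[child.tag] += 1
            edge_keys[f"{parent.tag}/{child.tag}"] += 1
    return dict(node_keys), dict(edge_keys)

node_keys, edge_keys = path_index_keys(ET.fromstring(SAMPLE))
print("node keys:", node_keys)
print("edge keys:", edge_keys)
```

In this toy document the Edge key for hw is hwg/hw, matching the subpath shown above, while the Node key is simply hw.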

Node Types
BDB XML supports three node types for indexing: element, attribute, and metadata.

Key Types
There are three different key types in BDB XML: equality, substring, and presence. You should use the equality key type (on hw) for queries such as:

/dictionary/e[hwg/hw="the"]

Use the substring key type (on e and all its descendants) for queries such as:

/dictionary/e[contains(., "hockey")]/hwg/hw

You should use the presence key type (on qd) for queries such as:

//q[qd]
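To make the three query shapes concrete, here is a minimal sketch that evaluates their equivalents in plain Python over a tiny, made-up document. The element names follow the queries above, but the sample data (including the def and qt elements) is hypothetical, and the code scans the whole tree by brute force. It shows what each query matches, not how BDB XML uses an index to answer it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; def and qt are invented filler elements.
SAMPLE = """<dictionary>
  <e><hwg><hw>the</hw></hwg><def>definite article</def></e>
  <e><hwg><hw>puck</hw></hwg><def>disk used in hockey</def>
     <q><qd>1895</qd><qt>a quotation</qt></q></e>
  <e><hwg><hw>rink</hw></hwg><q><qt>undated quotation</qt></q></e>
</dictionary>"""

root = ET.fromstring(SAMPLE)

# Equality: /dictionary/e[hwg/hw="the"]
equal = [e for e in root.findall("e")
         if any(hw.text == "the" for hw in e.findall("hwg/hw"))]

# Substring: /dictionary/e[contains(., "hockey")]/hwg/hw
# "." is the concatenated text of e and all its descendants.
substr = [hw.text
          for e in root.findall("e")
          if "hockey" in "".join(e.itertext())
          for hw in e.findall("hwg/hw")]

# Presence: //q[qd] (any q that has a qd child, whatever its value)
present = [q for q in root.iter("q") if q.find("qd") is not None]

print(len(equal), substr, len(present))
```

Each of these scans touches every entry, which is the work an equality, substring, or presence index is meant to avoid.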

Syntax Types
The most commonly used types are: string, decimal, float, date, and dateTime. For a complete list of syntax types, check the BDB XML documentation. The example in Listing 10 demonstrates the importance of appropriate indices.

Adding only one index didn't help much. Having an "edge-element-presence-none" index on qd took almost as long as having no indexes (~44 seconds). Verbose mode tells us that no indexes were used in answering this query. We indexed only part of what we needed: there is an index on qd, but the query processor needs more than that. There needs to be an appropriate index on q to answer this query efficiently. Let's create an index on q too. We should get better query performance this time (see Listing 11).

The BDB XML query processor now uses the existing indexes efficiently to answer this query. Note the hexadecimal numbers. Our query time improved from 47 seconds to 1.7 seconds, which is remarkable.
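An index name such as "edge-element-presence-none" is simply the four attributes described earlier (path type, node type, key type, syntax type) joined with hyphens. The helper below is a hypothetical convenience for composing such strings, not part of BDB XML. It checks the first three attributes against the values listed in this article and passes the syntax type through unchecked, since only the most common syntax types are listed here.

```python
# Values taken from the Path Types, Node Types, and Key Types sections above.
PATH_TYPES = {"node", "edge"}
NODE_TYPES = {"element", "attribute", "metadata"}
KEY_TYPES = {"equality", "substring", "presence"}

def index_spec(path_type, node_type, key_type, syntax="none"):
    """Compose an index description string from its four attributes."""
    if path_type not in PATH_TYPES:
        raise ValueError(f"unknown path type: {path_type}")
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type}")
    if key_type not in KEY_TYPES:
        raise ValueError(f"unknown key type: {key_type}")
    return "-".join([path_type, node_type, key_type, syntax])

# The presence index on qd (and on q) discussed above:
print(index_spec("edge", "element", "presence"))
# An equality index on hw would carry a syntax type, for example string:
print(index_spec("edge", "element", "equality", "string"))
```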

Acknowledgements
Thanks to Glenn Hoffman and Saaid Baraty for providing useful comments.

References

More Stories By Selim Mimaroglu

Selim Mimaroglu is a PhD candidate in computer science at the University of Massachusetts in Boston. He holds an MS in computer science from that school and has a BS in electrical engineering.