The pros and cons of business-app implementation via open-source software (Part 1)

Is open-source or Microsoft-licensed software the right choice for better, faster, cheaper and safer business-application implementation?

(LinuxWorld) — This is the first installment of a series comparing the implementation results for real business applications. We'll examine business-application implementation using Unix tools and ideas and how this plan of attack compares to what happens when the same apps are implemented using Microsoft-licensed software.

Each application will be the subject of two articles. The first will present the theoretical — or "book-learning" — view of the issue and invite readers with real-world experience in the technologies to contact me in confidence to correct my errors, give their estimates of the time needed for the work and discuss what goes wrong when you try to go from theory to practice.

The follow-up article will then try to summarize community experience with the technologies in order to draw out conclusions everyone can use and to answer the basic question for this series: is open-source software better than Microsoft software?

The focus here is on the technology, but readers should be aware that the most important factors in real architecture decisions usually have little or nothing to do with technology or cost. The goal in making these decisions is to end up with a product that works for its intended users; getting the best product at the lowest cost is only part of that.

As discussed at length in my Unix Guide to Defenestration, any Unix technology, no matter how insanely great or cheap, can be made to fail if the managers who get control of it after implementation wanted something else and consciously or unconsciously set out to prove they were right. A manager who only understands how to manage a proprietary system and insists on applying those ideas to Unix can also doom the project to failure.

Nichievo Inc.

All three of the planned examples have a context set by the same project in the same imaginary company: Nichievo Inc. The setting is designed to illustrate a political opportunity for Unix and open source in an otherwise closed shop. Be aware, however, of the risks involved: if the people who take over from you at system delivery don't want to make the system work, it won't.

The overall job involves setting up a secure digital exchange for what Nichievo calls an acceptance order. Nichievo insures receivables; an acceptance order is the company's commitment to pay an insured receivable in case of default by the debtor. In its unsigned form, such an order is a quote. Signed, it is a contractual commitment. Our job is to collect quotes as they are issued, make them available for review/signature by senior managers and make the signed orders available for customer download.

Background on Nichievo and the overall systems project under discussion can be found in this extended sidebar.

As envisaged, our solution will require XML-publishing capabilities, so this first article will look at the Windows versus Linux option in terms of the core hardware and licensed software needed for XML publishing. Specifically, we'll look at the choice between Apache/Cocoon and Microsoft's proprietary tools.

The third article in the series will look at the development issue. If we picked Cocoon and open source in round one, we'd already be largely committed to using Java Beans and Java for forms management and validation, but there are quite a few other things we need to do. For those, should we use Perl, PHP or try to extend our use of Java and the Cocoon framework?

On the other hand, if we decided to do this job with Microsoft's tools, we could use a third-party IDE for some tasks but would be using BASIC or some variant of it for others.

The fifth article will look at the database layer (Ed. Note: The even-numbered installments of this series will be reserved for answering and showcasing reader feedback to prior installments). If we made the Microsoft choice in round one, SQL-Server is a given here. The open-source choice, in contrast, has options: should we use leading products like PostgreSQL and mySQL, explore interesting new ideas like eXist or choose a commercial product like Sybase?

That's the plan, but this series need not be limited to these toolkits or the Nichievo case. If you have additional or alternative suggestions for toolkits, please let me know.

The alternatives

The job seems to call for a cross between a centralized XML-document publishing solution and a customer portal. Either way, we need provision for strong authentication, lots of logging and applications code to handle the online addition of digital signatures, as well as some back-end database functions to simplify acceptance processing for standing orders.

Under this view of the process:

  1. Everyone involved — customers, partners and authorized professional staff — would connect to the server using the desktop-browser tools they already have. By default, they would first connect to Apache on port 80 using the normal hypertext transfer protocol (HTTP) and then switch to the Tomcat implementation on port 8080 and HTTPS after initial authentication.
  2. Because SSL works with almost every browser — including Konqueror, Opera, Netscape, Mozilla and IE4 or later — this approach imposes no new systems requirements on clients, and all costs are contained within Nichievo.
  3. Internal staff would use the server to receive and review customer change requests and to upload customer documents such as unsigned acceptance orders.
  4. Senior partners or their delegates would review the unsigned orders online, and the system would then affix the firm's digital signature to approved orders (a signing sketch follows this list).
  5. Customers, using the same SSL (secure socket layer) tools, would post change requests or pick up digitally signed orders from the Web server.
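
To make step 4 a little more concrete, here is a minimal sketch of what affixing the firm's digital signature might look like in Java, using the standard java.security API. The keystore path, passwords, key alias and signature algorithm are illustrative assumptions rather than part of the Nichievo design; the essential point is that the server signs the order bytes exactly as transmitted and stores the resulting signature alongside the document.

    import java.io.FileInputStream;
    import java.security.KeyStore;
    import java.security.PrivateKey;
    import java.security.Signature;

    // Sketch only: sign the bytes of an approved acceptance order.
    // Keystore location, passwords and alias are hypothetical.
    public class OrderSigner {
        public static byte[] signOrder(byte[] orderXml) throws Exception {
            KeyStore ks = KeyStore.getInstance("JKS");
            try (FileInputStream in = new FileInputStream("/secure/nichievo.keystore")) {
                ks.load(in, "keystore-password".toCharArray());
            }
            PrivateKey key = (PrivateKey) ks.getKey("firm-signing-key",
                                                    "key-password".toCharArray());

            Signature sig = Signature.getInstance("SHA1withRSA");
            sig.initSign(key);
            sig.update(orderXml);   // the order document exactly as sent to the customer
            return sig.sign();      // stored with the order as the firm's signature
        }
    }

Verification on the customer side follows the same pattern, using Signature.initVerify with the firm's public certificate.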

Reality check
Particularly when putting an application design into a service proposal, it's important to think about who is paying your bills. In this case, the client, Nichievo, has a dismal technology record and a CIO who is not on side with the managers bringing us in.

In this situation, your clients have to let the CIO review the proposal, and it's not that unusual for him to respond by having some of his people cook up a few screens that prove beyond any doubt that he can do a better job of implementing your ideas than you can. If senior management buys into that, it leaves your sponsors looking like idiots and you without an invoicable project — or friends you can go back to in that company.

Having been burnt a few times, I now brief the sponsors whenever that kind of thing looks likely. I put something significant in the proposal that hits a few hot buttons among client executives but is as hard as possible for the CIO to promise or do.

In this case, we don't need XML to do this project; we need it to win this project.

Technically, ordinary HTML with either Perl or PHP would work just fine. However, management would really like to automate much of the customer interaction and, of course, the international outsourcing services firm hired 18 months ago has failed to deliver on its promises to do this.

The CIO isn't the only threat to worry about. That giant international outsourcing services firm doesn't want its billings to fall and could respond by going to the firm's managing director with a story that blames the CIO for their failures. If only he had let them use Domino, everything would long since have been working beautifully.

Technically, Domino would work for the basic job but would be very hard to push to full customer-portal operation. Putting together a working Domino demo wouldn't be that hard; in fact, it would be much easier than doing it with Cocoon. The scenario reverses, however, as you add functional complexity. By the time you've developed a full messaging portal, my guess is that Domino would demand many times the programming effort that Cocoon needs to get to that same level.

So how serious a threat is this? Well, if you pack enough serious suits with pretty business cards into a room, it's often rather easy to convince senior management to feel heroic and dedicated about backstabbing their friend the CIO — a tough decision, of course, but taken in the interests of the company, you understand.

To head this kind of thing off, I've been praising the ebXML standardization effort spearheaded by OASIS (the Organization for the Advancement of Structured Information Standards) as a downstream means of standardizing the business-messaging they need to enable the customer message-exchange they want.

I think acting on this direction now would be premature, but I picked XML for this application as a building block toward eventually doing that, knowing both that it would be very difficult to do with Lotus and that it undercuts the CIO's credibility if he attempts a putsch.

It would be possible to do this using servers already in place in Nichievo's 34 operating offices but the centralized approach is preferable because:

  1. User activity — including authenticated connections, failures to authenticate, document changes, rollbacks or transactions — can be unambiguously logged and time stamped in one place.
  2. We can establish a single point of responsibility for successful operations.
  3. Many management tasks, such as ensuring that transmitted documents do not contain viruses or other negative materials, are simply easier to organize and control if done in one place than if done in 34 places.
  4. Centralization greatly simplifies (not to say uniquely enables) firm-wide control of acceptance sign-offs by essentially eliminating the PKI-key-management problem.
  5. Centralization reduces redundancy, but failure-recovery capabilities such as rollback and serialization are relatively easy to implement and use well-known, well-understood mechanisms.

We could implement the centralized alternative in one of two ways:

  1. By building what we need around the XML-publishing tools provided by the Apache Cocoon project with a database like PostgreSQL.

    The steps required using the open source toolset (Apache and Tomcat with mod_perl and Cocoon on Linux, BSD, or Solaris) are extensively discussed on the Apache/Cocoon site. Here's what the main page says about it:

    Apache Cocoon is an XML publishing framework that raises the usage of XML and XSLT technologies for server applications to a new level. Designed for performance and scalability around pipelined SAX processing, Cocoon offers a flexible environment based on a separation of concerns between content, logic, and style. To top this all off, Cocoon's centralized configuration system and sophisticated caching help you to create, deploy, and maintain rock-solid XML server applications.

    Cocoon interacts with most data sources, including filesystems, RDBMS, LDAP, native XML databases, and network-based data sources. It adapts content delivery to the capabilities of different devices like HTML, WML, PDF, SVG, and RTF, to name just a few. You can run Cocoon as a Servlet as well as through a powerful, command line interface. The deliberate design of its abstract environment gives you the freedom to extend its functionality to meet your special needs in a modular fashion.

  2. By using a combination of Microsoft tools including BizTalk, SQL-Server, SAX2 and BASIC in a Microsoft .NET framework, with or without an integrated XML development environment like Altova's XMLSpy IDE.

    Although Microsoft's site doesn't seem to offer a clear statement of direction on this kind of work, they do provide a long and apparently detailed discussion showing how various products can be integrated to achieve something vaguely similar to Cocoon.

Note: Microsoft issued a press release on October 8, 2002, announcing:

"The 'Jupiter' Vision Aims to Unify and Extend Current E-Business Server Technologies And Include Standardized Business Process Management Capabilities, Deeper Support For XML Web Services, and Richer Developer and Information Worker Experiences."

This consolidation is to be achieved over the next 18 months and may, or may not, ultimately provide a Cocoon-like wrapper for Microsoft's XML publishing environment.

The applications

From a design perspective, we see the central system as a Web-based "order switch" that collects requests from customers, recommended orders from juniors and order approvals from senior partners, and then passes the approved orders back to customers.

Diagram 1 below shows typical high-level use cases for this.

To make this work we need:

  1. A user interface or client.
  2. Processing applications.
  3. Document storage, retrieval, backup, and recovery.
  4. Logging tools to keep track of who does what and when.

Since we have no control over the client device, reliance on a Web browser as the user interface is a given. This decision essentially determines that we'll use a Web server as our means of communicating with the user client and logging accesses.

Document volume is quite low: we expect a maximum of only about 120,000 acceptance orders per month. On the other hand, we have to keep every document ever filed on the server online in order to support the firm's customer relationship management effort and to provide data for part of its risk-assessment methodology. With this in mind, the numbers build relatively quickly. In three years, we can expect to need online access to something like 3.2 million approved and 400,000 unapproved orders.

It would be possible to store and index these as signed and unsigned documents, but it would be better to store the data for them in simple tables and have the application construct the documents on request. That step complicates processing but enormously facilitates activity logging, reporting, backup, system recovery and statistical uses of the data.

In this case, use of a database would also reduce disk space requirements considerably. A typical order document stored as a Microsoft Word 10 binary takes 19,658 bytes exclusive of the standard contract terms referenced in it, but storing that information in an SQL table takes only about 320 bytes for the addressee (which is normally stored only once per customer) and 190 bytes per covered receivable for about a 95 percent overall disk space saving (after indexing and overhead). It is not the dollars that are important here; disk is cheap. What's important is the reduction in backup and recovery time. Recovering 60GB from tape takes hours; recovering 60MB takes minutes.
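
To give a feel for what storing the data in simple tables might look like, the sketch below creates a hypothetical two-table layout via JDBC: one small row per addressee and one per covered receivable. The connection URL, table names, column names and widths are invented for illustration; the byte figures above describe typical row sizes, not this exact schema.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Hypothetical schema sketch; names and widths are illustrative only.
    public class OrderSchema {
        public static void main(String[] args) throws Exception {
            Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/nichievo", "app", "secret");
            Statement st = con.createStatement();

            // Roughly the small, once-per-customer addressee row described above.
            st.executeUpdate(
                "CREATE TABLE addressee (" +
                " addressee_id INTEGER PRIMARY KEY," +
                " name         VARCHAR(120)," +
                " address      VARCHAR(180))");

            // Roughly the per-covered-receivable row described above.
            st.executeUpdate(
                "CREATE TABLE receivable (" +
                " order_id     INTEGER," +
                " addressee_id INTEGER REFERENCES addressee," +
                " debtor       VARCHAR(80)," +
                " amount       NUMERIC(12,2)," +
                " due_date     DATE)");

            st.close();
            con.close();
        }
    }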

Using a database to construct documents on the fly eliminates concern over varying input file formats, as data-entry can be handled via a browser form. On the other hand, it creates two additional problems:

  1. The need to produce documents in usable form at the output end.

    This is addressed through use of XML. This will enable us to produce almost anything the customer requires, from PDFs to flat files suitable for use with spreadsheet tools to formats we currently don't know about.

  2. The need to maintain a legal record of the signed order as transmitted to the customer.

    We cannot rely on our ability to reproduce documents sent to customers on an as-needed basis because it would be possible to argue in court that our system could have changed in the interim. This, in turn, could introduce doubt about the authenticity of the copy we generate.

    To deal with that, we need to store the actual document sent to the customer together with authentication and delivery information. This does not, however, destroy the usefulness of the database approach. The best solution is probably to do both: use the database and store the final documents as sent. That's because, in almost all cases, the need to take time to recover the document files will not impede resumption of production operations and so does not significantly affect recovery time.

Therefore, as shown below, the design will be based on using a database to store the information going into each document, using an "XML-enabled" application layer to construct documents as needed, and using a Web server as an interface to the user's browser.
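
As a rough illustration of that application layer, and independent of whether Cocoon or the Microsoft stack is chosen, the sketch below uses plain Java with JDBC and JAXP to pull the rows for one order, assemble them into an XML document and run the result through an XSLT stylesheet chosen for the customer's preferred output format. The query, element names and stylesheet path are assumptions made for illustration.

    import java.io.StringWriter;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    // Sketch: assemble an acceptance order as XML from database rows, then
    // render it with a per-format XSLT stylesheet. All names are hypothetical.
    public class OrderBuilder {
        public static String buildOrder(Connection con, int orderId, String stylesheet)
                throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element order = doc.createElement("acceptance-order");
            order.setAttribute("id", String.valueOf(orderId));
            doc.appendChild(order);

            PreparedStatement ps = con.prepareStatement(
                "SELECT debtor, amount FROM receivable WHERE order_id = ?");
            ps.setInt(1, orderId);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                Element r = doc.createElement("receivable");
                r.setAttribute("debtor", rs.getString("debtor"));
                r.setAttribute("amount", rs.getString("amount"));
                order.appendChild(r);
            }
            rs.close();
            ps.close();

            // The stylesheet determines the delivery format (HTML, flat file, FO for PDF, ...).
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(stylesheet));
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        }
    }

Under Cocoon, much of this would typically be expressed declaratively in the pipeline rather than hand-coded, but the division of labor is the same: the database holds the facts, XML carries the structure and a stylesheet produces the delivery format.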

The processing applications needed can be thought of as modules within an overall framework. Diagram 3, below, shows a typical screen flow for one such module.

Actual definition of these screens is best done using an active prototyping approach in which you start with your best, and usually rather naive, idea of how it should work and then do two things in parallel:

  1. Get users to work with it, implement suggested changes that seem to drive toward a working system and go back to get more input. Keep stepping through that — and expect to throw away at least one set of screen layouts as a false start — until they stop suggesting significant improvements.
  2. Write, with lots of user input, the user manual. When users sign off on it, treat it as the formal requirements definition.

Once your prototype achieves stability, you can implement formal testing and review by users not previously associated with the project and use their comments to refine the thing to the point that they think your prototype "works."

Once the system works, phase two will deal with deployment issues including:

  1. Installation and recovery testing along with any code cleanup and retesting needed.
  2. Making decisions about system duplication and data or process mirroring.
  3. Formalization of change, backup, recovery and operational procedures.
  4. Debugging and interoperability testing to ensure successful operation with the technologies already in place in the firm.
  5. Hand-over planning and operational training for affected staff.

The costs

From a capital cost perspective, the new system is to fit into the existing network and support framework. Consequently, initial infrastructure costs are limited to the server and any licensed software needed.

Server sizing is something of a non-issue. We know that the database will be quite small, probably still under 20GB three years from now, and we know that typical usage volume will also be quite low because, on a typical day, the company insures about 7,500 customers of whom around 900 will record some change — usually a receipt or a new receivable on a rolling account.

The weakest link
In this situation, verify that the network can deliver. You may have a 10Mbps connection with low utilization, but that doesn't mean you can add a substantial new load. Particularly on PC-type networks, all kinds of things — firewalls, poorly configured or underpowered routers, "invisible" SMB network use — can foul things up.

If the network is slow, your users won't care about your excuses or your demonstrations of how fast the server is. They'll see poor response and turn off. Be sure to test your connection, repeatedly and at different times of day, before agreeing to its adequacy.

If their in-house network won't support your access needs and the local network guru doesn't take action, try to take your test system somewhere else... and make the network effect obvious when the guru's boss has to migrate the box in-house.
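
One low-tech way to run that kind of connection test is to script repeated timed fetches from a desk on the client's own network and keep the numbers for later. A minimal probe might look like the sketch below; the URL and iteration count are placeholders.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal network probe: fetch a page repeatedly and report elapsed times.
    // Run it from a user's desk at different times of day and keep the output.
    public class NetProbe {
        public static void main(String[] args) throws Exception {
            URL url = new URL(args.length > 0 ? args[0] : "http://server.example.com/");
            for (int i = 0; i < 20; i++) {
                long start = System.currentTimeMillis();
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                InputStream in = con.getInputStream();
                byte[] buf = new byte[8192];
                long bytes = 0;
                int n;
                while ((n = in.read(buf)) > 0) {
                    bytes += n;
                }
                in.close();
                System.out.println(bytes + " bytes in "
                        + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }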

It is likely that most users will initially see use of this server as an additional burden imposed by management and respond by fitting their interactions with it into already busy schedules. In practice, that means we can predict usage surges just before or after lunch and just around go-home times in each time zone. Unfortunately, West coasters tend to leave impositions until after lunch while East coasters do them just before going home, leaving us facing the likelihood that the two biggest surges, those from the Pacific and Eastern time zones, will overlap.

Once people see value in this service, usage will balance out. The quickest way to destroy any chance of that happening is to underconfigure the hardware at the beginning. Users who have to wait for your server the first time they connect to it will have their resentment of the new imposition reinforced, and you'll never recover their trust.

On the other hand, the cost difference between "about right" and "grossly overpowered" is only a few dollars in this context, so I've intentionally specified an insanely overpowered machine below: a dual-processor Dell Xeon running at 2.4-GHz.

Server Capital Cost
(Data from Dell.com, September 24, 2002)

  Machine type: Dell 4600 for both configurations, including a Tripp Lite 3000VA UPS, CD, floppy and network cards, with no services. The Microsoft configuration has no monitor, mouse or keyboard; the open-source configuration includes a 16" monitor, mouse and keyboard.
  CPU: 2 x 2.4GHz P4/Xeon (both)
  RAM: 4 x 1GB DDR SDRAM (both)
  Internal disk: 4 x 36GB (both)
  Document storage: Dell PowerVault 220; 6 x 73GB, Ultra160 (both)
  SDLT tape: external 220GB with controller; the Microsoft configuration adds a CA ARCserve license
  Operating system: Windows 2000 Data Server with 25 client licenses (Microsoft); Caldera OpenLinux (open source)
  Total cost: $31,087 (Microsoft); $26,047 (open source)

I also considered a Sun 480 for this role: despite its Fibre Channel disk the cost wouldn't be much different, and the higher reliability of Solaris on SPARC plus its upgradeability to four CPUs would give it some advantages.

For this article, I want to compare Linux to Microsoft solutions on the same hardware, but a real-world decision would be more influenced by the workload. The twin Xeons are faster than the two UltraSPARCs, but the Sun machine offers a hardware cryptographic accelerator for $2,700 that is capable of doing around 4,300 SSL handshakes per second. If system usage were going to be high relative to the hardware, that accelerator would make a big difference. But that isn't true here, and the Xeons' shorter completion times on single tasks make them the better choice as far as I am concerned.

On the software side, I've not worked with the Microsoft stuff and am not all that sure what we need or don't need. The list here is deduced from the how-to article on the Microsoft Web site referenced earlier.

Software Licensing Cost

  Database layer (licensed per processor):
    Microsoft: SQL Server 2000, including SQLXML 3.0, SQL Server 2000 XML View Mapper 1.0, and ADO with adPersistXML. $4,999 x 2 = $9,998; may need enterprise editions ($19,999 per CPU).
    Open source: PostgreSQL or mySQL. $0.
  Database integration (licensed per processor):
    Microsoft: BizTalk Server 2002, standard edition, including the Simple API for XML (SAX2). $6,999 x 2 = $13,998.
    Open source: Cocoon. $0.
  Programming language:
    Microsoft: XMLSpy IDE (assumes a VB license?). $999.
    Open source: mod_perl or mod_php. $0.
  Proxy/cache Web server:
    Microsoft: Internet Security and Acceleration Server (ISA). $1,499 x 2 = $2,998.
    Open source: $0.
  Other required licenses:
    Microsoft: unknown.
    Open source: none.
  Total list: $25,739 (Microsoft); $0 (open source).

As an operational matter, the importance of this data to the company means that I'd recommend redundancy — setting up two servers, in different cities, with different administration, and different Internet backbone connectivity — at somewhat more than twice the cost. As shown below, you could do that with the Linux solution for about the cost of one Windows 2000 system.

Total Capital Cost

  Server hardware: $31,087 (Microsoft); $26,056 (open source); 16% savings with open source
  Software tools: $25,739 (Microsoft); $0 (open source); 100% savings
  Total: $56,826 (Microsoft); $26,046 (open source); 54% savings

A note on purchase timing: For in-house development on Windows 2000 you would probably load the licenses once, on the production machine. That means you'd buy everything before writing a line of code. In the Linux world there aren't any license portability issues, so you'd develop and test on an existing machine, postponing capital expenses until you had a working system and thus reducing overall project risks.

Get your feedback in!
This should be an area of intense comment from people who have actually used this stuff. Remember, article two needs your experience and opinions. If you have used Cocoon or the BizTalk/SQL-Server combo in a real application, please contact me.

We do not yet have manpower estimates for either the development or the operational phases of this work. On the development side, the requirements are currently only loosely understood, while operational issues have yet to be discussed at any length.

Nevertheless, experience tells us that the first prototype can be developed under Cocoon in about a week and that the process is likely to go through three to five iterations before a full user manual (which is the requirements specification) can be written for user signoff.

Key issues

Clearly, infrastructure costs for the open-source solution are less than half of those for the proprietary solution. In itself, that fact doesn't make the Cocoon solution better. The cost difference — perhaps $100,000 for a two-way redundant system — looks like a lot of money at the personal level but barely registers on Nichievo's bottom line.

Failure would hurt both us and our sponsors, but it won't break the firm. Success, on the other hand, affects the balance of management power in the firm and could lead to radical change starting with the cancellation of the current development contract and the ousting of the CIO. That, in turn, would create opportunities for us in particular and the open-source movement in general; replace 1,500 or so Windows desktops with Unix smart-displays, and we'll have a massive positive impact on the firm's bottom line.

The potential rewards of change are therefore clear, and no one's under the illusion that we're here to sell Microsoft products. But we still need to ask the question as fairly and "straight-up" as we can:

What are the relative risks associated with each decision? Use Cocoon, or use Microsoft's tools?

Implementation risk

If you choose the open-source route for this, there's no doubt it will be going into a hostile environment... but the Windows decision isn't all that great either. Yes, it makes you compliant with the CIO's preferred direction, but it still leaves you in a conflict with the international outsourcing and consulting firm that's been beavering away there for the last eighteen or so months.

Different agendas, different methods
Having the client own and manage the development environment is great if your primary interest is selling time. After all, waiting for the other guy to act (or just for a PC to grind something out) is far more profitable than working because it reduces your average selling cost.

One client I know fell for this twice, not only demanding control of the development servers but also once buying the development house's used 486s and once getting another consultant's retired P2s as Oracle development workstations. These worked as Oracle seats, but Windows NT and Oracle on P2 gear made for lots of long — and fully billable — waiting, while mutual finger-pointing and related delays added more billable days to the project's overall duration.

Either way, you'll have people working against you — fewer and more muted with Windows than with Linux, but no picnic either way.

This is, of course, the biggest risk there is. But if you've made your clients aware of the danger and they're willing to take the risk, then it's your job to minimize it without undermining their judgment by agonizing over it.

Resource control is the most effective risk-reduction strategy possible here. If you want to succeed, own the hardware and control network access to it, even if that means putting a bunch of their PCs in a room with the server and a small hub. Later, put two phases into your deployment plan:

  1. one in which training and familiarization is done using your server (and possibly your network).
  2. a production roll-out for which the existing staff are responsible.

To facilitate this, I often include an offer in the proposal to develop on production scale hardware that we own until hand-over. At that time, the client can decide to buy it at the pre-agreed price or replace it with hardware of his own. In most cases, this looks like a great risk-reduction strategy to the client... and it is, because they don't spend a hardware nickel until the software works, and they don't face a systems transition either. However, its real purpose is to trap the opposition between rocks and hard places:

  1. If they insist on their server brand, they're responsible for the transition, but expectations are already set by the performance of the development machine.
  2. If they take over your machine and administer it to death, there's little room for finger-pointing when you come back and set it right again.

Notice, however, that this strategy requires you to buy the machine and any needed licenses up-front. This is a powerful argument for Linux because:

  1. You can do initial work on almost anything, transitioning to production-scale gear just before volume-user testing.
  2. The 54-percent cost difference may not mean much to Nichievo, but $40K not spent on licenses is real money to you.

Stability

Both sets of tools are under active development and both are subject to change. As a rule, however, Microsoft's changes affect everything from the operating system (which may require new hardware to run) to the client interface layer. Apache's changes tend to be independent of the operating system.

There are operating-system patches to consider in both cases, but Linux patches don't generally require application reconfiguration or testing. Windows service packs, in contrast, often change everything from licensing terms to API internals.

From a stability perspective, therefore, both choices mean that we will be adapting to technical change as it occurs, but the Cocoon option limits that to the application and is therefore strongly preferable.

Recoverability

The absence of licensing issues, together with the separation of application, database, server and OS on Linux, mean that we could recover the application to any Linux machine capable of handling the load. That isn't true on Windows 2000 server; a failure pretty much has to be recovered on the machine that failed. Otherwise, we're really looking at a new install — something that's usually much harder and more time-consuming to do.

Given how critical this application is, recoverability is a killer issue and a strong vote for Linux with Cocoon.

Security

Security is the other killer issue. There have been security issues with Apache, Tomcat, PostgreSQL and Linux, but not many. Those that appeared were quickly remedied. The Microsoft toolset, on the other hand, has dozens of outstanding security issues, including XML-based attacks on SQL-Server and Windows 2000 Server. Remediation is usually slow in coming.

This, to me, is a decisive issue: Linux and Cocoon it is.

More Stories By Paul Murphy

Paul Murphy wrote and published 'The Unix Guide to Defenestration'. Murphy is a 20-year veteran of the IT consulting industry.
