This is the
Database Developer column from the June, 1998 issue of Web Techniques.
© Web
Techniques, reprinted by permission.
There are times when we are able to ignore a headline-grabbing phenomenon. But for me, February and March 1998 was a period when El Niño and the "Year 2000 problem" were impossible to ignore. There were also three acronyms that kept popping up in conversations about database server technology; OLTP, OLCP, and OLAP. These acronyms will be the "phenomenon" of my column this month. (See table 1 for a guide to acronymns discussed in this article.)
My most recent assault by El Niño occurred on the night before my flight to Los Angeles to attend a conference on Microsoft SQL Server 7.0. A rain squall dropped so much water so quickly that a creek near my house rose about four feet in 15 minutes. The problem was that the creek is three feet deep. When the flood subsided, we had a "block party" at midnight to clear the street of debris, mud, and rocks. To put a different spin on a famous Robert Frost line, I came away with the impression that "good shovels make good neighbors." The conference schedule was intense, with daily events scheduled from 7 a.m. to 10 p.m. Nonetheless, it was a welcome change of scenery and less stressful than watching the creek spill into the street.
In a manner similar to El Niño storms, some aspect of the year 2000 (Y2K) problem seems to appear every few days. In recent weeks, I've heard several examples that demonstrate that some of our largest corporations face major challenges in fixing their software to pass Year 2K-compliance tests. The typical description of the problem in the media or trade press refers to legacy mainframe applications that use a 2-digit year, such as dates in the MM/DD/YY format. Y2K is really a class of problems. It includes PC BIOS problems, and programming languages and DBMS products that lack a DATE type. If you haven't started testing for Y2K-compliance, you should begin immediately. Y2K is not simply a legacy-system issue. New systems and new applications are not immune. I'll definitely revisit the Y2K issue in future columns (note: See the October and November 1998 issues of Web Techniques.)
Many Web database developers today are working in the onlien transaction processing (OLTP) arena, doing such things as order processing, banking, stocks, and e-commerce applications. The Web is the most recent, but not the only venue, for transactional systems. Book an airline ticket, withdraw money from an ATM, or charge dinner on a credit card and you are generating online transactions that update a database.
OLTP has been a major application of database technology for several decades. OLTP was actually a precursor to DBMS technology. One of the earliest examples of a transaction processing application is American Airlines' SABRE system. The chronology of the SABRE system development shows a marked contrast with today's rapid development of Web sites. Preliminary studies for the SABRE system required four years, starting in 1954. Tests for a single city began in 1962, and all American cities were added to the system by 1964. The ten-year development cycle is an indicator of the quality of today's software development tools, with which we build complex applications in a fraction of that time. Before we become too smug, however, we should consider that the first SABRE system processed 2,100 inputs per minute and 40,000 passenger names a day using two computers that had 64K of memory. That's 64 kilobytes, not 64 megabytes! The disks that stored the passenger data had an average access time of 115 milliseconds, instead of today's norm of 10 milliseconds.
Reduced cost, combined with hardware performance improvements, are major reasons for SQL databases being well-suited for building today's OLTP applications. Another reason is that SQL does well with the numeric and character data used by OLTP applications. Competition has also been a major factor. The SQL DBMS marketplace is a buyer's market because of intense competition that dates back to the 1980s. Over 40 vendors actively participate in the Transaction Processing Council (TPC), which creates the specifications for and publicly reports the results of several standard benchmark tests.
TPC's current SQL transaction processing benchmark is TPC-C, which uses nine tables. It models an order-entry application, with transactions for new orders, payment, delivery, order status, and stock level tables. TPC-C generates metrics for the new order transaction rate per minute and price/performance. The Transaction Processing Council reports TPC-C results for a variety of hardware and DBMS combinations.
The transaction processing throughput of SQL systems has grown dramatically in recent years. As of March 1998, the best performance on TPC-C was 57,073 transactions per minute. In 1997, Microsoft demonstrated scalable server technology by doing a 24-hour transaction processing demonstration. Since Jim Gray joined the company, Microsoft has pursued a transaction processing strategy based on clusters of commodity-priced servers. Microsoft built a 45-node cluster that used more than 900 disks. In a 24-hour period, it processed over a billion transactions. That's more than today's largest operational OLTP systems, which currently process about 250 million transactions per day.
The Transaction Processing Council is working on a new Web database benchmark known as TPC-W. The TPC-W benchmark is designed to model a business that sells over the Internet, or any network that uses Web-based transactions. The benchmark will take into account issues involving the database, HTTP, Web server performance, and users who are browsing and ordering.
Although decision support systems (DSS), executive information systems (EIS), and business intelligence systems have been around for a long time, it was the browser that convinced me we'd see an explosion in multimedia databases, onlien analytical processing (OLAP), and data warehousing systems. The browser is an "everyman" user interface, so simple that it doesn't require a highly technical user to operate OLAP applications. SQL vendors apparently reached a similar conclusion. They don't forecast much growth in the OLTP market, but they do expect significant growth in the number of OLAP and onlien complex processing (OLCP) users.
E.F. Codd, the originator of the relational data model, wrote a white paper in 1994 that introduced the term "online analytical processing" into the lexicon of database users. OLAP is the current term for systems that were formerly called decision support systems (DSS) and executive information systems (EIS). Database managers are not a "one-size-fits-all" proposition. The reason for using different nomenclature is to distinguish between classes of database processing, where OLAP systems have characteristics that are different from operational systems, such as OLTP.
One reason to differentiate between OLTP, and systems that support data warehousing and OLAP, is to assist in the proper selection of hardware and software. Operational systems and transactional systems are markedly different from data warehouses and OLAP. Some of the differences include schema design, workloads, database sizes, and query characteristics. A system optimized for OLTP will support a high volume of users running short-lived queries. A system optimized for OLAP, or decision support, will typically have few users with long-running queries. OLAP involves aggregation and multidimensional data, which can result in very large databases.
OLAP products fall into different categories. Those that store and retrieve data in relational databases are called ROLAP products. Those that persist data using a multidimensional data store are known as MOLAP products. Hybrid servers, known as HOLAP servers, can access data stored in relational or multidimensional data formats.
Why is OLAP the focus of so much attention? Consider the Web and e-commerce. IDC estimates that by 2002 Web commerce will grow to $327 billion a year. If e-commerce becomes that big, there will probably be a lot of organizations that want to know more about their customers, buying patterns, product performance, and so on. Of course, analyzing how your Web commerce systems are doing is only a small piece of the OLAP picture. Independent of any Web commerce considerations, many organizations have been moving to data marts and data warehouses to store historical data used for decision support queries.
The typical scenario (figure 1) for a data warehouse is that an organization will maintain separate servers for operational databases and decision support databases. Transactions go against the operational store, and data transformation software extracts data from the operational store to feed a data mart or data warehouse. The data transformation process includes scrubbing the data to remove anomalies and pre-calculating aggregations to reduce query times. An OLAP server accesses the data mart or data warehouse to build multidimensional data sets known as cubes. OLAP applications execute queries against the cubes to answer questions such as:
- What is our most profitable product line in each sales region?
- For the past 24 months, what percentage of orders for each product came from our Web site?
- What is the distribution of patient falls by region, hospital, and nursing unit for the past 12 months?
- What dinner menu items generate the most profit?
- For the past three years, which of our products have the highest rate of return due to manufacturing defects?
When SQL vendors recognized the demand for data warehouses and OLAP, they made acquisitions or strategic alliances to integrate OLAP engines with their SQL servers. IBM uses Arbor Essbase with it DB2-based OLAP server. Informix acquired MetaCube and Sybase acquired ExpressWay. Oracle acquired IRI's Express product and Microsoft acquired OLAP technology from Panorama Software Systems to integrate into SQL Server 7.0.
Benchmarks
Because OLTP and decision support are fundamentally different missions, the Transaction Processing Council developed a separate decision support benchmark. TPC-D, like TPC-C. reports performance for a combination of hardware and DBMS. TPC-D uses 17 SQL-92, decision support queries that go against 6 tables (customer, part, supplier, line item, order, and a part supplier bridge table). The queries must be used without modification. When running TPC-D, the vendors must mimic a decision support environment with ad hoc queries. They must use dynamic SQL, with no host variables or static SQL. TPC-D reports a throughput metric, price/performance, and a power metric based on performing a single query at a time. Another consortium of vendors, the OLAP Council, also developed a benchmark for OLAP systems.
Programming
The OLAP Council specified an application programming interface known as MD-API, but that API has not received the same level of OLAP vendor support as OLE DB for OLAP. Microsoft developed OLE DB for OLAP by adding components and interfaces to OLE DB. In its version 1.x releases, OLE DB worked with data providers that delivered data in a tabular format. OLE DB for OLAP works with multidimensional data providers, and supports multidimensional data sets that result from cube queries. It adds schema objects that include cube, dimension, hierarchy, properties, level, and member component objects. Support for OLE DB is strong in the OLAP community and it is likely to become the de facto standard for OLAP client programming.
The term OLTP is classic computerspeak, and the OLAP acronym owes it origin to a research report, but I'm not sure who coined the term online complex processing (OLCP). The typical use of OLCP refers to database processing involving dynamic arrays, large objects, multimedia, or other complex data types. OLCP is the type of application for which an object-relational DBMS is a good solution. To better understand OLCP and the problems that an object-relational DBMS solves, it is often easier to consider an example query. Michael Stonebreaker, Chief Technology Officer of Informix, spoke recently at Database and Client-Server World and used a good example to illustrate Informix's ORDBMS technology. The State of California Department of Water Resources (DWR) has a huge database of 35-mm slides depicting waterways, irrigation canals, and aqueducts. The slides table has an identifier, date, caption, and image data in Kodak Photo-CD format. DWR wanted to automatically classify slides by content and execute queries such as "Find a sunset picture taken within 20 miles of Sacramento." To query slides by content, developers created user-defined functions such as the contains and sunset functions. The contains function accepts a document and keyword as arguments, and checks whether the keyword appears in the document. The sunset function examines the image for orange at the top. Installing those functions in the database permitted DWR to run this query:
select id from C.picture, Places A, Places B where sunset(image) and contains(A.name, tag) and A.location && B.location and B.name = 'Sacramento';
Informix offers sophisticated object-relational technology that permits you to write custom components, known as DataBlades, to install new types and behavior (methods). IBM DB2 Universal Database, Oracle 8, Sybase Adaptive Server, and UniSQL are also major players in the field of OLCP.
Final Notes
Web developers doing database work are increasingly finding a requirement to develop for new systems that use complex data types or answer questions not answered by traditional SQL databases. Even as traditional transaction processing is setting new records for performance, Web developers are being asked to build new systems that support OLCP and OLAP. Luckily, database vendors are developing the tools, APIs, and server technologies to meet those needs.
| Acronym | Meaning |
|---|---|
| DSS | Decision Support System |
| EIS | Executive Information System |
| HOLAP | Hybrid OLAP |
| MOLAP | Multidimensional OLAP |
| OLAP | Online Analytical Processing |
| OLCP | Online Complex Processing |
| OLTP | Online Transaction Processing |
| ORDBMS | Object-Relational DBMS |
| TPC | Transaction Processing Council |
| Y2K | Year 2000 |
© Web Techniques, reprinted by permission.
Ken North develops software, consults, and teach seminars. He wrote Windows Multi-DBMS Programming (John Wiley and Sons) and Database Magic with Ken North (Prentice Hall PTR). For more info, browse to http://www.knorth.com.
Informix
http://www.informix.com
Microsoft
http://www.microsoft.com/sql
Oracle
http://www.oracle.com
Sybase
http://www.sybase.com
Transaction Processing Council
http://www.tpc.org
Revised: October 2, 1998