Nicholas L. Carroll [HREF1], Web Engineering Group [HREF3], School of Electrical and Information Engineering [HREF4], The University of Sydney [HREF5]. Email: ncarroll@ee.usyd.edu.au
Rafael A. Calvo [HREF2], Web Engineering Group [HREF3], School of Electrical and Information Engineering [HREF4], The University of Sydney [HREF5]. Email: rafa@ee.usyd.edu.au
In this paper we study the integration of Web services in an object-relational database system. This allows a database to import data from live data feeds or remote heterogeneous database systems on demand. We first describe how we have added a SOAP interface to the PostgreSQL database system, implemented as a user-defined function that allows developers to make service calls from within the database. Secondly, we show that SQL can also be used to query data provided by Web services, which introduces a novel way of accessing and using Web services in a database-driven web application. Finally, we compare the performance of two architectures for consuming Web services with a benchmarking procedure using the Google Web service API.
Web services have become an important research topic in recent years, mostly because it promises interoperability between loosely-coupled applications across distributed heterogeneous computing environments. The provision of data and application logic can be exposed by a system through Web services. Some of the most common Web services found on the Internet simply provide data such as XMethod's stock quote service, Google and Amazon's Web service APIs, and eBay's price watcher. These websites and many other web applications like them are driven by database systems, and the Web services that they provide can be viewed as an interface to transparently access these database systems.
Web services can be distinguished as either static or dynamic Web services. The Web services that adopt the data-provisioning scenario such as Google's and Amazon's can be classified as static Web services. We focus on static Web services in our integration of a SOAP engine within a database system through user-defined functions. User-defined functions were used in [1] to access heterogeneous data sources through OLE DB. In this paper we bind user-defined functions to a SOAP engine to enable the database system to not only access heterogeneous data sources, but to also access data sources from distributed systems through Web services. We take this perspective of using Web services within a database system one step further by exploring the use of SQL as a viable language that can satisfy the need for a Web services query language.
The simple concept of using SQL to query Web services -particularly static ones- can be used to improve the architectural design of database-driven applications, by providing new ways of querying data in distributed heterogeneous database systems. The improved usability of using SQL is realised when a Web service such as Google's search service returns two-dimensional sets of data which can be translated to tabulated data of rows and columns, a structure synonymous to database tables or views.
The paper is subdivided into five parts: section 2 provides an overview of the Web services framework, discussing in more detail the difference between static and dynamic web services, and the two main architectural components: the requestor and the provider. We use this background to depict a working environment for using Web services in a database system. A refinement of this environment is presented in part 3 as we scope the use-case scenarios and characterisations for requesting Web services from within a database system. In part 4 we describe our implementation using PostgreSQL and gSOAP and evaluate the integration of these open-source technologies. Section 5 concludes.
Web services can be viewed as units of application logic providing data and services to other applications. They can be invoked through a well defined functional interface described in a machine-processable format specified by the Web Service Definition Language (WSDL) [2], and can be easily discovered and accessed with the Universal Description, Discovery and Integration standard (UDDI) [3]. Applications interact with Web services through SOAP messages, which are typically conveyed using ubiquitous web protocols and data formats such as HTTP and XML. The widespread adoption of these standards enables interoperability between loosely-coupled applications across distributed heterogeneous computing environments. This has allowed Web services to be pervasive in nature, and have found their way into web applications and middle-tier technologies.
The primary issues that Web services tackle are the issues of data and application integration, and that of transforming technical functions into business-oriented computing tasks. These two aspects of Web services allow businesses to communicate on a process or application level with their collaborators, while providing the freedom to adapt to new situations or work with different information providers on demand.
Web services configured towards data and application integration can be classified as static Web services, and those dealing with the composition of static Web services to construct workflows for business processes are considered to be dynamic Web services. Dynamic Web services rely on a Web services composition language such as the Business Process Execution Language for Web Services (BPEL4WS) [4] for utilising existing Web services to be composed into a new Web service using well-defined process modelling constructs [5, 6]. Architecturally, dynamic Web services are often positioned at a higher level (closer to the user entity) than static Web services to allow for the flexibility of switching between Web services for business processes that may change. Static Web services are positioned at a lower level as they are generally situated closer to the data-source or application logic that they provide an interface to. It is this class of Web services that we are interested in for the purposes of using SQL as a query language for Web services.
The two main roles in Web services are service providers and service requestors. The service provider is the platform that hosts access to the service, and can be viewed technically as the server entity. The service requestor is the application that invokes or initiates an interaction with a service. The interactions between the service requestor and the service provider are encapsulated in Simple Object Access Protocol (SOAP) [7, 8] request/response messages. The Web service requestor role can be interpreted as either a browser driven by a person, or a program without a user interface such as another Web service. The provider and requestor roles are explored further in sections 2.1 and 2.2 respectively.
Databases are increasingly complex systems and their functionalities can be accessed by other systems using Web services. The most common practice is to enable them as a Web service provider. In this configuration, the Web service layer describes functions that a client can call. These functions that have been described in the WSDL file typically bind to an underlying program or middleware application that handles the incoming request, executes a database query, and returns the output of the query [9]. It is then up to the Web service layer to wrap the output of the application in a SOAP message, and return the message back to the requesting client.
Figure 1. Architecture for a Database Web Service Provider
A typical Java implementation of a Database Web Service Provider is portrayed in Figure 1. At the forefront of the service architecture is the SOAP engine for handling incoming SOAP request messages. Apache Axis can be used to demarshall SOAP messages, and execute corresponding methods in Java classes in the middleware layer that access a database system using a JDBC driver. After executing an SQL statement through JDBC, the data returned in a JDBC ResultSet object can be passed back to AXIS from the middleware application, to where the data is marshalled into a SOAP response message and returned back to the Web service requestor.
Database systems can also be enabled to request Web services directly, which eliminates the need for an external layer to provide this functionality. A database system can be extended with user-defined functions as shown in Figure 2 that implement the capability to call a Web service, and make the requested information available within the database. This means information retrieved from Web services can be coupled with information from other Web services, along with information stored locally in the database using SQL.
The ability to request Web services from within a database allows the database to access dynamic data from heterogeneous and distributed sources. Data produced on demand, such as financial information, weather forecasts, and scientific data can be pulled down, processed and stored within the database. The way Web services can be integrated into the database is transparent to the application developer. From the developer's perspective, the information provided by a Web service function is analogous to accessing data stored locally in a database table. Therefore information provided by a Web service can be manipulated or joined with local tables using standard SQL commands.
Figure 2. Architecture for a Database Web Service Requestor
Conversely, the notion of pushing data to a Web service for processing is also applicable under the database Web service requestor model. By decentralising the procedural logic, intensive computations can take place on a dedicated server provided by a Web service, therefore distributing the computational load amongst many computers. The resulting computations can then be retrieved and inserted into the calling database.
The conventional approach of requesting Web services is from within the application middle-tier. The business and the presentation logic are two separate parts of what is often called the middle tier of a web application. As opposed to the presentation layer, the business logic implements those components that are not visible to the user. This separation facilitates the division of work between programmers and graphic-designers. In most web applications, persistent data used by the business logic is stored in databases. Web services are mostly integrated in this layer, by implementing code that interacts with a Web service provider using SOAP.
The advantage of consuming Web services within the middle-tier is the convenience of accessing Web service APIs provided by the programming language used to implement the business logic (ie. Java, ASP or PHP). The drawback of this architecture is that if the information returned from a Web service call is connected with the data stored in the database in some way, then there will be an inefficiency caused by the overhead of communications between the database and Web service via the middle-tier layer. In part 4, we describe an architecture that does away with this overhead, leaving the database to pull data down from Web services directly.
Our focus for the remainder of this paper is on the architecture of a client application that requests Web services from a service provider. More specifically, we present a new architecture for requesting Web services from within a database system.
Some software architectures are more appropriate for some type of problems than other, so comparisons cannot be abstracted from the circumstances the systems are used in. In order to compare two architectural strategies we first need to discuss the user scenarios and characterizations we considered. The use-case scenarios describe requirements in which a system with these architectures might be used:
Web application frameworks usually adopt a three-tiered architecture that consists of a presentation layer, business-logic layer, and a database layer. Introducing Web services into the database layer would extend this layer to a data-access layer, providing a singular conduit to data from the business-logic layer. Combining external data sources from Web services with the database provides a more usable and maintainable framework for accessing data from the business-logic layer. In the comparison we also use the following characterisations, which are also described in more detail immediately below:
Most database systems provide support for user-defined functions, where developers can add their own implementations to the function catalog of a database system. The advantage of this approach is that the integration of Web services is not dependent on extending SQL to support Web services. Secondly, access to Web services is totally transparent to the application developer, as the invocation of a Web service is perceived to be a call to an SQL compliant database function instead.
By introducing Web services into a database system, the developer is able to use SQL to query Web services. This brings us to the second characterisation where developers can use SQL semantics while still gaining the benefit of an underlying infrastructure that allows the communication between the database and Web services. The ability to use a query language to consume Web services can improve the usability of Web services, especially in the use-case scenarios mentioned above.
The third and fourth characterisations convey the limitations of using SQL as a high-level interface to Web services. We have made the assumption that the Web services that a developer would consume from within the database system are data-provisioning services. This limits the use of SQL to query static Web services that provide access to data through a SOAP interface.
SQL is not an appropriate query language for dynamic Web services, which are complex Web services that deal with the composition of other Web services to formulate workflows for business processes [5]. Dynamic Web services rely on meta-data for their composition with other Web services. A more suitable query language for dynamic Web services is XQuery [10], which is XML based and can process meta-data [11, 12]. The benefits of using XQuery over SQL for querying static Web services are negligible. SQL would be the preferred approach, even as an intermediary query language until XQuery matures, and a skill-set for XQuery becomes as commonplace amongst developers as SQL.
We chose two open source technologies for the implementation and evaluation of the system: gSOAP and PosgreSQL. We integrated gSOAP, a C/C++ SOAP implementation into PostgreSQL. PostgreSQL is an open source Object-Relational Database Management System (ORDBMS) that offers many modern features often found in enterprise database systems. The most useful feature for this investigation into the exploit of Web service capabilities in a database system is the support for extending PostgreSQL through user-defined functions. These are functions that can be written in C or a procedural language to provide additional functionality to the system if required.
To integrate a Web services client into PostgreSQL, a C/C++ SOAP implementation was required. gSOAP is a generator tool for coding SOAP/XML Web services in C and C++. The tool offers an XML to C/C++ language binding, which can be used to convert a WSDL schema for a Web service into C and C++ declarations that are stored in a standard header file. This makes the remote method proxy interface transparent to a user. The "gSOAP compiler executable, soapcpp, processes the C/C++ declarations and generates C and/or C++ source code stubs for integration in a client" [13].
The way PostgreSQL incorporates user-defined functions is through dynamic loading of shared objects. An object code file that implements Web service requestor functionality can be compiled as a dynamic shared object and incorporated into the PostgreSQL server. Functions in the dynamic shared object are mapped to user-defined functions in PostgreSQL. These functions are loaded by the server and executed on demand. Low memory usage is a requirement when integrating user-defined functions into the PostgreSQL server. gSOAP is suited for this purpose, as the executables have a "small memory footprint that is typically less than 150K" [13].
The "Hello World" of Web services is the stock quote Web service that provides a stock price for a given stock code. This is the simplest scenario for providing data on demand within a database system. It is also the quickest test for verifying that Web services can be integrated into the PostgreSQL database. By completing this test we were confident that more complex data can be handled with database Web services, and this is demonstrated in the second experiment.
Implementation: the user-defined function takes a ticker symbol and using the PostgreSQL API, converts the VARCHAR data type to a character array in C, and then passes it on to the Web service proxy interface provided by gSOAP. The proxy interface maps the C character array to a string, and marshals the string value into a SOAP message that is delivered to the stock quote Web service.
The service provider in return processes the request, and marshals the stock quote into a SOAP response that is delivered back to the PostgreSQL database system. The response is demarshalled by the gSOAP proxy interface and the stock quote is mapped to a C character array, which inturn is mapped to a FLOAT value using a macro provided by the PostgreSQL API.
Results: The user-defined function is executed in the same way as any typical function belonging to a database system using the SQL SELECT command, and is as simple as the command shown in Figure 3, where "YHOO" is the ticker symbol for Yahoo Inc.
test=# SELECT get_quote('YHOO') as yahoo;
yahoo
---------
47.52
This experiment was designed to test whether complex SQL queries can be applied to data introduced into the database system using Web services. This test is meant to demonstrate that SQL can be used to query Web services, and that the two interact quite effortlessly. It further suggests that Web services can be integrated into the query processor of a database system. This would provide the possibility of querying any external data source, be it a database or a component of computational logic, and have this information imported into the local database system, where it can be presented in tables in relation to locally stored data.
The main difference between this experiment and experiment one is that here we will be working with more complex data-types which can be presented as multiple columns and rows in a database table. Whereas in experiment one, we were returning the equivalent of one cell of data in a database table, which really didn't provide any support for more complex queries. By returning data from Web services in database tables, we are able to utilise JOIN and sub-query SELECT commands which are normally reserved for more complex queries.
Implementation: A database table is simply a two dimensional matrix, and can be easily represented in another two-dimensional structure such as an array-of-arrays. Array-of-arrays is the best representation of tabulated data in SOAP, and is necessary for marshalling a database table into a SOAP message, and vice versa at the receiving end. The composition of an array-of-arrays structure consists of an outer array that contains inner arrays as its elements. These inner arrays map to rows of a database table, and their elements map to attributes or columns of a database table.
PostgreSQL has a special API for set-returning functions that provides support for returning multiple rows of data from a C-language function. This provided the mechanism for importing data contained in C implemented array-of-arrays structures into the database system as tabulated data. gSOAP handled the mapping of an array-of-arrays SOAP data-type to an equivalent array-of-arrays structure in C, which is essentially a pointer-to-a-pointer. The PostgreSQL API takes the C array-of-arrays structure and converts it to a set of rows. The set of rows generated makes the data compatible to be queried with SQL commands. This works because the data provided by Web services have been introduced into the database system as database tables, and is therefore in a format that can be readily queried using SQL.
We implemented a set-returning function in C that interacted with a Google Web service. The Google Web service provides a function that takes a search query, and returns a set of search results containing meta-data for each website URL that matches the search query. This is the typical functionality that the Google search engine is renowned for, through its web-based user interface. The C function was then compiled with gSOAP to generate the SOAP proxy classes for communications with the Google Web service. The SOAP proxy classes are responsible for the conversion of C data types to XML SOAP data types and vice versa. After compiling the C function as a dynamically loadable shared object, the shared object was registered as a function within the Database function catalog. A user-defined data type also had to be registered with the catalog system, so that the system knows what attributes or columns the user-defined C function is able to return. This is the equivalent of defining a database table that exists in a remote system. So in effect this set-returning function acts like a proxy table to the remote database table.
Results: The set-returning C function can be used as shown in Figure 4.
test=# SELECT url,title FROM google_search('computers') LIMIT 5;
url | title
---------------------------+----------------------------
http://www.apple.com/ | Apple
http://www.dell.com/ | Dell...
http://www.compaq.com/ | Compaq Product Information
http://www.gateway.com/ | Gateway
http://dmoz.org/Computers/ | Open Directory - Computers
(5 rows)
The query in Figure 4 requires results for the search query "computers", and to display a maximum of five results and only the url and title attributes for each row returned. The Web service query is synonymous to querying a local database table. In fact the data provided by the Web service can easily be joined with data from a local database table, or with another third-party Web service provider.
Figure 4 conveys the ease with which to import near real-time data into a database system using Web services. The Round Trip Time (RTT) between sending the SOAP request and receiving the corresponding response is governed by bandwidth, server response time, and the time for the client to initiate the request and to process the response. Therefore it is possible to import near real-time data into a database system using Web services if RTT can be reduced to a negligible time delay. In the following section we test for a performance gain by consuming Web services directly within the database system, as oppose to the more conventional architecture of consuming Web services within the middle-tier layer.
Here we examine the performance of two architectures for consuming Web services within the client application. Firstly, we consider the conventional approach of consuming Web services from within the application middle-tier. We then analyse and compare the performance of the conventional architecture with the new architecture that incorporates database Web services.
The business and the presentation logic are two separate parts of what is often called the middle tier of a web application. As opposed to the presentation layer, the business logic implements those components that are not visible to the user. This separation facilitates the division of work between programmers and graphic designers. In most web applications, persistent data used by the business logic is stored in databases. Web services are mostly "consumed" (integrated) in this layer.
The advantage of consuming Web services within the middle-tier is the convenience of accessing Web service APIs provided by the programming language used to implement the business logic (ie Java, ASP, or PHP). The drawback of this architecture is that if the information returned from a Web service call is connected with the data stored in the database in some way, then there will be an inefficiency caused by the overhead of communications between the database and Web service via the middle-tier layer. It was therefore necessary to determine if there was in fact a performance gain for consuming Web services directly from within the database system.
Implementation: For this performance evaluation we have simplified the situations in which Web services would be used in a real application. We have a set of fixed queries stored in the same architectural element as it would be in each of the above architectures: the database or an external script representing the business logic. All queries are stored in a database table and accessed for the two Web service consumers. The goal of the two systems is to store the top 10 results of the queries into another table.
The middle-tier consumer was written in Perl using the SOAP::Lite Perl module. Calling a Web service using SOAP::Lite is done by creating a SOAP::Lite object, specifying the WSDL file for the Google Web service, and then calling one of the end points or methods that the Web service provides. In this experiment we used the doGoogleSearch method, which takes a search query and returns a set of URLs and website meta-data.
In order to make the benchmark more realistic, the database consumer was called from a Perl script as it would normally be done by a real application. This second architecture hides all the Web service details from the Perl script that only needs to invoke a procedural language function in PL/pgSQL that runs the test case within the database.
Both scripts had to establish a connection to the PostgreSQL database. The first script had to connect to the database, retrieve all the search topics, query the Google search Web service, and then connect to the database once again to insert the results obtained from Google. The second script only had to connect to the database once.
Based on the number of database connections the Perl script had to make to execute an SQL command, it was expected that the test case would perform better as a PL/pgSQL function in the database, than as logic in the middle-tier Perl script. As shown in Figure 5 our results confirm this assumption.
Results: The time take to retrieve the data used in the queries, produce the SQL command, get the results and store them is a process that takes very small amounts of time at an individual basis, but becomes a scalability issue for applications with large amounts of transactions per second. We measured and recorded the duration of this whole process in seconds by producing a set of queries with information stored in the database, and the test was repeated for varying sets of search terms. The results for this experiment are presented in Figure 5.
For the smaller sets of queries that were processed, the database Web service architecture was comparably faster than the middle-tier architecture. It was 0.06 seconds and 11.53 seconds quicker processing one query and twenty queries respectively. The time taken to process the larger sets of queries is where the database Web service architecture outperformed the middle-tier architecture considerably. The task of processing ninety and one hundred queries took 34.86 seconds and 32.28 seconds less when performed within the database.
There were a couple of artefacts in the collection of results. Most notably the database Web service architecture performed slower for thirty and seventy queries by 3.82 seconds and 8.72 seconds respectively. These artefacts could be the result of simultaneous queries sent by other Web service clients to the server providing the Google Web service. Since we do not have any performance measures for the Google Web service provider, we can only speculate that performance degradation at the server end during these test cases could have contributed to these artefacts.
Figure 5. Performance results.
The chart conveys a trend that consuming Web services within the database is more efficient than consuming Web services in the middle-tier, especially when intense database interactions are involved.
We studied a new architecture for integrating distributed heterogenous database systems, where a table or view from one database system can be accessed remotely from another database system through Web services. We described the implementation of user-defined functions in PostreSQL that provides a SOAP interface to Web services. In doing so, we demonstrated that SQL could be used to query Web service data as if the data was stored locally in a database table. This provides a novel, yet powerful way for consuming Web services.
The implementation of the middle-tier architecture, where the Web service calls are implemented at the application layer, were found to be slower than the architecture where the service calls are directly integrated into the database. The improvement is due to a better coupling of the information used for the query and the query itself in the performance test case. The results are very promising and might be used to redefine the optimum architectures for the integration of Web services in web applications.
These evaluations were performed on the same architectural elements that would appear in a complete web application. We are in the process of integrating the database Web service to a web application framework, and study how they can be used to generate an abstraction layer that not only enhances reusability of the framework and its applications, but is also more efficient architecturally. The results shown here are very promising in such respect.
There are two specific areas for future work. Firstly, further performance analysis is required for database Web services, as well as an investigation into the reliability of Web services in N-tier web applications. The second would be to embed Web services directly into the query processor of a database system. This would allow binding to a Web service provider using an SQL statement to be automated, instead of having to implement a user-defined function.
[1] B. Reinwald, H. Pirahesh, G. Krishnamoorthy, G. Lapis, B. Tran, and S. Vora. "Heterogeneous Query Processing Through SQL Table Functions". In 15th International Conference on Data Engineering, pages 366--373, Sydney, March 1999.
[2] World Wide Web Consortium. Web Services Description Language (WSDL) 1.1 W3C Note 15 March 2001. [HREF6]
[3] Discovery Organisation for the Advancement of Structured Information Standards. Universal Description, Discovery and Integration (UDDI). [HREF7]
[4] Business Process Execution Language for Web Services, version 1.1, May 2003. [HREF8]
[5] F. Curbera, R. Khalef, N. Mukhi, S. Tai, and S. Weerawarana. "The next step in Web services". In Communications of the ACM, pages 29--34, Vol 46 Issue 10, October 2003.
[6] Q. Chen, M. Hsu, and V. Mehta. "How public conversation management integrated with local business process management". IEEE International Conference on E-Commerce, 2003. CEC 2003, pages 199 - 206.
[7] Ethan Cerami. Web Services Essentials. O'Reilly, Sebastapol, CA, 2002.
[8] World Wide Web Consortium: Simple Object Access Protocol (SOAP) 1.1. May 2000. [HREF9]
[9] Geetanjali Arora & Sai Kishore with NIIT. XML Web Services - Professional Projects. Premier Press, 2002.
[10] World Wide Web Consortium. XQuery 1.0: An XML Query Language. W3C Working Draft, 12 November 2003. [HREF10]
[11] Wolfgang Hoschek. "A Database for Dynamic Distributed Content and its Application for Service and Resource Discovery". In International IEEE Symposium on Parallel and Distributed Computing (ISPDC 2002), Iasi, Romania, July 2002.
[12] Wolfgang Hoschek. "A Unified Peer-to-Peer Database Framework for Scalable Service and Resource Discovery". Lecture Notes in Computer Science - Proceedings of the Third International Workshop on Grid Computing, pages 126-144, 2002.
[13] Robert A. van Engelen and Kyle A. Gallivan. "The gSOAP Toolkit for Web Services and Peer-To-Peer Computing Networks". In proceedings of IEEE CCGrid Conference 2002.