Interaction Protocols for Software Agents on the World Wide Web

Texas Advanced Technology Program - 1995


Research Objectives

Specific Goals

Relevance

As the World Wide Web [1] grows in scale and complexity, it will be increasingly difficult for end users to track information relevant to their interests. As the novelty of the Web fades, users will also increasingly demand efficient access to information. This proposal lays out the design and development of a knowledge-based interaction protocol between a user agent, an intermediary system with responsibility for tracking Web developments and relating to its user those which are relevant to the user's interests, and service agents, programs providing community-wide services on the Web [5].

The ease of construction and potential Internet-wide impact of autonomous software agents on the World Wide Web have spawned a great deal of discussion and occasional controversy. Based upon our experience in the Repository Based Software Engineering (RBSE) project with the design and operation of the RBSE Spider [4], such tools can create substantial value for users of the Web. Most current service agents involve Web spiders, which traverse the interlinked documents making up the Web, constructing an index of the information thus discovered. The difficulty with relying solely upon service agents to assess the relevance of an artifact to a user's interest profile is that none of the service agents provides any persistence of state concerning what the user has already been presented with. Each user must periodically poll any given service agent with a query, and then filter that which is new from that which has already been seen.
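The filtering burden described above can be illustrated with a minimal sketch of the state a user agent might keep on the user's behalf. The class name SeenStore, the JSON state file, and the use of URLs as result identifiers are assumptions made for illustration; they are not drawn from Sulla or any existing service agent.

```python
import json
from pathlib import Path


class SeenStore:
    """Remembers which result URLs a user has already been shown.

    Hypothetical helper: the on-disk JSON format is an assumption.
    """

    def __init__(self, path):
        self.path = Path(path)
        # Reload previously seen URLs so state persists across polls.
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text()))
        else:
            self.seen = set()

    def filter_new(self, results):
        """Return only results not yet presented, recording them as seen."""
        new = []
        for url in results:
            if url not in self.seen:
                self.seen.add(url)
                new.append(url)
        # Persist the updated state for the next polling cycle.
        self.path.write_text(json.dumps(sorted(self.seen)))
        return new
```

Each poll of a service agent would pass its raw result list through filter_new, so that only previously unseen items reach the user.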

A user agent's architecture should reflect the concerns of the individual, rather than the concerns of the community. We are currently constructing (with support from Texas Instruments) Sulla, a user agent that supports long-lived, goal-oriented Web activity. Our current approach to agent interaction entails Sulla mimicking the behavior of a human interacting with each service agent. This approach suffers from the ambiguities of natural language and the limitations of interaction through simplistic query interfaces.

Creating a knowledge representation scheme for agent interaction will allow for more sophisticated interactions between a user agent and the service agents to which it appeals for information. A preliminary approach to a representation scheme for such a protocol, based upon a thesaurus, is currently under development for MORE, the RBSE project's Web-based repository system [6]. The work proposed here will extend this work with richer representations, and will be validated using Sulla, the RBSE Spider and MORE.

Related Work

InfoHarness [11] is an open, extensible system designed to provide access to large amounts of heterogeneous information through encapsulation of these information resources in meta-data objects. The system architecture comprises an HTTP gateway, one or more InfoHarness servers, one or more InfoHarness collections, a meta-data generator which populates the collections, and a set of access tools (e.g., WAIS, relational databases, etc.). Users interact with the system through the gateway, which transforms requests into a form acceptable to the servers, which then act upon the request by returning portions of the meta-data, or by routing appropriate requests on to the access tools (which are responsible for manipulation of actual data).

Harvest [2, 3] supports ``gathering, indexing, caching, replicating, and accessing Internet information'' [2]. It was designed for scalability and customization through the separation of gatherers, responsible for the acquisition of information, and brokers, responsible for collection, index generation and dissemination of that information. Gatherers run at provider sites, and transmit information thus acquired back to one or more brokers using a ``summary object interchange format'' [2]. This allows for a significant reduction in network overhead when the transmitted information is heavily summarized or when there are many documents involved. Brokers interact with one or more gatherers for initial acquisition and with other brokers where useful to further filter information already collected by those brokers.

PAINT (Personalized Adaptive Internet Navigation Tool) [8] supports hierarchical hotlists in conjunction with Mosaic. This distinguishes it in that it is intended to support a single individual user, rather than a community of users. PAINT supports the creation of hierarchical clusters of Web resources as name spaces. The principal design goal was to simplify the comprehension of hotlist elements. Based upon the number of hotlist manipulation schemes springing up to support Mosaic, this is a significant problem for serious users of the Web.

The Lycos system [7] employs a GNU DBM file to store the information discovered during its exploration. The information stored for a given document includes: the title, headings, the 100 most weighty words, the first 20 lines of the document and the size of the document, both in bytes and in words. The rationale behind these choices is the creation of a scheme that is finite in scope - the information concerning a document is not dependent upon the size of that document. Lycos caches the first 20 lines of the document for display as part of the results of a user search of the index, providing a limited context for the user without the need to access the matched documents.
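A bounded per-document record of the kind described above might be sketched as follows. This is an illustration only: the names DocumentSummary and summarize are hypothetical, the title and headings are assumed to have been extracted already, and raw term frequency stands in for whatever word weighting Lycos actually applies.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class DocumentSummary:
    title: str
    headings: list
    top_words: list    # at most max_words entries
    first_lines: list  # at most max_lines entries
    size_bytes: int
    size_words: int


def summarize(title, headings, text, max_words=100, max_lines=20):
    """Build a record whose size does not grow with the document's size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Assumption: plain frequency approximates the "most weighty" words.
    top = [word for word, _ in Counter(words).most_common(max_words)]
    return DocumentSummary(
        title=title,
        headings=list(headings),
        top_words=top,
        first_lines=text.splitlines()[:max_lines],
        size_bytes=len(text.encode("utf-8")),
        size_words=len(words),
    )
```

Because the word list and the cached lines are both capped, the record for a very long document is no larger than the record for a moderately sized one, aside from the headings.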

WebCrawler [9] full-text indexes the documents encountered, operating with multiple retrieval agents in a server-breadth-first approach. The rationale behind the notion of a breadth-first search with respect to servers rather than documents is that most servers currently have many related documents in a single subject area, rather than multiple subject areas. Skipping from server to server ensures broader coverage in results at the cost of requiring users to explore particular servers that seem to be relevant to see if they truly contain what is sought. Of course, subsequent passes by WebCrawler can reduce this coverage gap by eventually indexing the full set of documents on a given server.
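The server-breadth-first ordering can be sketched as below. This is not WebCrawler's implementation: the function name is hypothetical, and the sketch only orders a frontier of URLs that is already known, whereas a real crawler discovers new links as it proceeds.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def server_breadth_first(urls):
    """Yield URLs so that each pass takes one pending document per server."""
    queues = OrderedDict()
    for url in urls:
        # Group pending documents by the server that hosts them.
        host = urlsplit(url).netloc
        queues.setdefault(host, deque()).append(url)
    while queues:
        # One pass: visit every server once before revisiting any of them.
        for host in list(queues):
            yield queues[host].popleft()
            if not queues[host]:
                del queues[host]
```

Every server is touched in the first pass, and later passes fill in the remaining documents on servers that hold more than one.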

Relationship with Our Other Projects

The RBSE Spider [4] retains both the structure of the Web as a graph representation in a relational database and a full text index of the HTML documents encountered. Searches can thus be specified either as SQL queries against the database, resulting in information concerning the nature of the Web itself, or against the full text index, resulting in information concerning the contents of documents that make up the Web. The full text index is currently supported through a slightly modified WAIS server.
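A minimal sketch of the first kind of search, a structural query over the Web graph, is given below. The two-table schema and the function names are assumptions made for illustration, and SQLite stands in for the relational database; the Spider's actual schema is not described here.

```python
import sqlite3


def build_graph(edges):
    """Load (source URL, target URL) pairs into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per document, one row per hyperlink.
    conn.executescript("""
        CREATE TABLE document (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE link (src INTEGER REFERENCES document(id),
                           dst INTEGER REFERENCES document(id));
    """)
    for src, dst in edges:
        for url in (src, dst):
            conn.execute(
                "INSERT OR IGNORE INTO document (url) VALUES (?)", (url,))
        conn.execute(
            "INSERT INTO link (src, dst) "
            "SELECT s.id, d.id FROM document s, document d "
            "WHERE s.url = ? AND d.url = ?", (src, dst))
    return conn


def inbound_counts(conn):
    """Return (url, number of inbound links), most linked-to first."""
    return conn.execute("""
        SELECT d.url, COUNT(l.src) AS inbound
        FROM document d LEFT JOIN link l ON l.dst = d.id
        GROUP BY d.id
        ORDER BY inbound DESC, d.url
    """).fetchall()
```

A query such as inbound_counts answers a question about the structure of the Web itself, independent of document contents, which is the role the full text index plays.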

The Multimedia Oriented Repository Environment (MORE) [6] operates in conjunction with a stock HTTP server to provide access to a relational database of meta-data. MORE provides separate hierarchies of meta-classes and collections and support for controlled access to proprietary collections through the definition of user groups. With the single exception of the system front page, the entire user interface is accomplished as dynamically generated HTML.

MORE is a meta-data based repository - the information stored in its underlying database is not the artifacts themselves, but rather information concerning the artifact, which is stored using other mechanisms (the file system, another database, or another software package such as a configuration management tool or CASE environment). The two distinct representation mechanisms allow a mix of homogeneous (through the class definition hierarchy) and heterogeneous information (through the collection hierarchy).

Sulla is a prototype user agent with the ability to acquire and retain an interest profile of its user and act upon one or more goals based upon that profile; the ability to act autonomously, pursuing the goals posed to it by its user, irrespective of whether the user is connected to the system where the agent is based; the ability to apprise its user of progress towards outstanding goals, and present preliminary results; and the ability to act ethically, exemplified by the guidelines proposed in [5], in particular, moderation in the acquisition of information during the satisfaction of a goal.

Sulla acts as a proxy, with the user employing an unmodified Web client from an arbitrary host to interact with the agent, which resides on a particular host (typically the user's desktop system). We are currently extending Sulla's current scope (HTML documents) with the ability to access a variety of information sources, both via direct access to those sources (e.g., HTML documents, FTP files, WAIS databases, articles posted to newsgroups, etc.) and those referenced by service agents.

Research Personnel

Dr. David Eichmann (Principal Investigator)

Eichmann is an Assistant Professor of Software Engineering at UHCL and Director of Research and Development for the Repository Based Software Engineering program, a NASA-funded multi-year project in reuse and reengineering of large software systems. He is also principal investigator on the Sulla project, an industry-funded project developing a user agent for the World Wide Web. The combined projects involve three faculty, three full-time research staff, and ten graduate research assistants.

Eichmann will be responsible for management of the project as a whole, the general architectural design of the agent protocols, and principal authoring of the publications concerning the project and commercialization of the prototypes.

Michael Weisskopf (Senior Research Associate)

Weisskopf was recently promoted from Programmer Analyst to Senior Research Associate on the Repository Based Software Engineering program. His responsibilities there include coordination of graduate research assistants under his direction, independent interaction with RBSE collaborators (including NASA/JSC, Rockwell Space Operations, Unisys and Loral) and experimental deployment of World Wide Web environments developed by the RBSE Research and Development team.

Weisskopf will be responsible for on-going coordination of the graduate students on the project, interaction with the principal investigator, and participation in the design and evaluation of the prototypes as they are developed.

Graduate Student Research Assistants

Three graduate students will be recruited from the Software Engineering and/or Computer Science Master's programs at UHCL. Each student will be assigned a prototype, will participate in the design of the interaction protocols, and will be responsible for implementing and testing those protocols within the assigned prototype.

Methodology

Knowledge Representation

There are few practical information retrieval systems that operate with anything other than a simple textual representation of a user query. Recent work in the area of knowledge engineering has led to significant, reusable environments for the construction of ontologies - models of the world that comprehend the relationships and terminologies used by humans in their reasoning and discourse.

Representing such an ontology, and relating one system's ontology to another system's ontology, is the critical factor for moving beyond ambiguous natural language as a means for users or agents to share information and goals. The ARPA Knowledge Sharing project has generated a prototype for just such a representation in KIF, the Knowledge Interchange Format. We intend to assess KIF for its strengths and implement a limited version that is focused directly on our particular problem domain.

Protocol Definition

Given a knowledge representation scheme, user-agent and agent-agent interaction requires the ability to construct a fragment of an ontological context through which a comparison can be made between the sought-after goals and the information present within a given agent's knowledge base. We will define an encoding of this representation fragment as the query portion of a Uniform Resource Locator (URL). This encoding will allow the layering of the agent protocol on top of the existing HyperText Transfer Protocol (HTTP), avoiding the need to define and implement a system/server to system/server communications mechanism.
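One way such an encoding could look is sketched below. The encoding itself is yet to be defined by the proposed work, so everything here is an assumption: the parameter name fact, the parenthesized triple form, the example agent URL, and the restriction that terms contain no spaces or parentheses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def encode_query(base_url, facts):
    """Build a URL whose query portion carries one 'fact' per triple.

    Assumes each fact is a 3-tuple of terms without spaces or parentheses.
    """
    params = [("fact", "(%s %s %s)" % triple) for triple in facts]
    return base_url + "?" + urlencode(params)


def decode_query(url):
    """Recover the triples from a URL built by encode_query."""
    values = parse_qs(urlsplit(url).query).get("fact", [])
    return [tuple(value.strip("()").split(" ")) for value in values]
```

Because the fragment travels in the query portion of an ordinary URL, a receiving agent can be reached with a plain HTTP GET request and needs no separate transport.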

Prototyping

Once we have the knowledge representation and transferral schemes completed, each of our three prototypes (Sulla, Spider and MORE) will be extended to support the new representations to improve the nature of their interactions. These enhancements can easily be carried out in parallel once a shared implementation of the knowledge representation library has been completed. Furthermore, testing interactions based upon the enhancements will require only pair-wise integration. This approach provides scalability, as each of the prototypes is representative of a distinct class of agent or provider system.

Schedule

We have structured the proposal as a two-year effort, but plan distinct annual milestones for evaluation and feedback (durations given are from initiation of project). Work on design will involve the team as a whole. Work on prototypes will proceed in parallel, with pair-wise integration testing occurring as the appropriate prototypes become available:

Technology Transfer

As part of our agreement with Texas Instruments on the Sulla project, both Texas Instruments and UHCL have the right to pursue further development and commercialization of Sulla following the completion of that project. We intend to provide Texas Instruments with the agent protocols developed here as part of that agreement. We are already in negotiation with a group seeking to create an `Internet village' franchise concept for licensing of the full suite of prototypes as well as funding to customize our work to their application. We anticipate that the work proposed here will become part of that arrangement, as well. Finally, we intend to provide an experimental service on the Web (comparable to the one we already offer with the Spider), allowing the Internet community as a whole to evaluate our approach and provide feedback.

Institutional Commitment and Sources of Additional Support

The Research Institute for Computing and Information Systems (RICIS) at the University of Houston - Clear Lake conducts research in a broad swath of computing-related topics. The principal investigator is an active member of RICIS and hence has access to the full resources of the institute. The RBSE project operates a large server in support of the research and development group, including multiple Web servers, multiple instances of MORE, and the Spider. The work proposed here extends those projects into domains that, while beyond the scope of our current NASA funding, offer great potential to attract additional support. The Texas Instruments grant for the Sulla user agent, for instance, derived from interest expressed by their Corporate R&D group in our work on the Spider. We plan on leveraging the proposed work into project proposals to NSF, ARPA, and select industrial collaborators. NSF recently released a workshop report detailing a research agenda for the Web within which our work fits well.

Impact on Infrastructure of Science and Engineering

The demand for migration of information onto the Web has far outstripped the capacity to hand-craft such resources. An entirely new information-based industry is rising from the research networks of ARPA and NSF. The work proposed here both educates graduate students in the design and construction of Web information systems and provides technology that can be spun off for commercialization. The RBSE research group over the last few years has trained dozens of graduate students who upon graduation have taken positions in the local aerospace community as well as in Houston's growing commercial software industry. The work proposed here will continue that trend.

Bibliography

  1. Berners-Lee, T., R. Cailliau, A. Luotonen, H. F. Nielsen and A. Secret, ``The World-Wide Web,'' Communications of the ACM, v. 37, n. 8, August 1994, p. 76-82.
  2. Bowman, C. Mic, Peter B. Danzig, Darren R. Hardy, Udi Manber and Michael F. Schwartz, ``The Harvest Information Discovery and Access System,'' Proceedings of the Second International Conference on the World Wide Web, Chicago, IL, October 19-21, 1994.
  3. Bowman, C. Mic, Peter B. Danzig, Darren R. Hardy, Udi Manber and Michael F. Schwartz, Harvest: A Scalable, Customizable Discovery and Access System, University of Colorado, Boulder, CO, August 26, 1994.
  4. Eichmann, D., ``The RBSE Spider - Balancing Effective Search Against Web Load,'' First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, pages 113-120.
  5. Eichmann, D., ``Ethical Web Agents,'' Proc. Second International World-Wide Web Conference: Mosaic and the Web, Chicago, IL, October 17-20, 1994, pages 3-13.
  6. Eichmann, D., T. McGregor and D. Danley, ``Integrating Structured Databases Into the Web: The MORE System,'' Computer Networks and ISDN Systems, v. 24, n. 2, 1994.
  7. Mauldin, M. L. and J. R. R. Leavitt, ``Web Agent Related Research at the Center for Machine Translation,'' Proceedings of the ACM Special Interest Group on Networked Information Discovery and Retrieval (SIGNIDR-94), August 1994.
  8. Oostendorp, K. A., W. F. Punch and R. W. Wiggins, ``A Tool for Individualizing the Web,'' Proceedings of the Second International Conference on the World Wide Web, Chicago, IL, October 19-21, 1994.
  9. Pinkerton, B., ``Finding What People Want: Experiences with the WebCrawler,'' Proceedings of the Second International Conference on the World Wide Web, Chicago, IL, October 19-21, 1994.
  10. Riecken, D., ``Intelligent Agents: Introduction to Special Issue,'' Communications of the ACM, v. 37, n. 7, July 1994, p. 18-21.
  11. Shklar, Leon, Satish Thatte, Howard Marcus and Amit Sheth, ``The ``Infoharness'' Information Integration Platform,'' Proceedings of the Second International Conference on the World Wide Web, Chicago, IL, October 19-21, 1994.

Last Modified: November 29, 1995