Ethical Web Agents(1)
David Eichmann
Repository Based Software Engineering Program
Research Institute for Computing and Information Science
University of Houston -- Clear Lake
2700 Bay Area Boulevard
Houston, TX 77058
eichmann@rbse.jsc.nasa.gov
As the Web continues to evolve, the programs that are employed in
interacting with it will also increase in sophistication. Web agents,
programs acting autonomously on some task, are
already present in the form of spiders. Agents offer substantial benefits
and hazards, and because of this, their development must involve not only
attention to technical details, but also the ethical concerns relating to
their resulting impact. These ethical concerns will differ for agents
employed in the creation of a service and agents acting on behalf of a
specific individual. An ethic is proposed that addresses both of these
perspectives. The proposal is predicated on the assumption that agents are
a reality on the Web, and that there are no reasonable means of preventing
their proliferation.
The ease of construction and potential Internet-wide impact of autonomous
software agents on the World Wide Web [1] has spawned a great deal of discussion
and occasional controversy. Our experience in the design and
operation of the RBSE Spider [5] indicates that such
tools can create substantial value for users of the Web. Unfortunately,
agents can also be pests, generating substantial loads on already
overloaded servers, and generally increasing Internet backbone traffic.
Much of the discussion to date has been directed towards this single
perspective, the impact of an agent (and indirectly, the operator of that
agent) on the Web. Attention more recently has turned to the impact that
an agent can have upon its operator - both positively, when needed
information resources are ferreted out, and negatively, when an agent's
actions result in mail-bombings and (in the near future) substantial
financial charges for its operator.
This paper addresses our work in building a spider that is both a good Web
citizen and a provider of a generally-useful resource, and how such
approaches can be scaled to the increasingly massive information
infrastructure of the Web. I contrast this approach with the
provider-focussed approach of systems such as ALIWEB [17] and the agents hinted at in such
venues as the recent television commercials by AT&T. I conclude with a
proposed architecture to support ethical behavior while still allowing
operator-defined search in the rapidly evolving Web.
This paper's motivation is the confluence of what were, until recently, two
distinct threads of research: development of intelligent software agents
and development of hypermedia systems (and more specifically, distributed
hypermedia systems). This confluence was first exhibited with the
emergence of Web spiders(2) in 1993, when
the Web grew large enough to be interesting to study as an
artifact in its own right. A brief review of recent work in software
agents and Web spiders will be useful in setting a context for later
sections of the paper.
An agent is a program which interacts with and assists an end user.
Research in this area has been driven primarily by the artificial
intelligence research community, since an unintelligent agent is not of
much use. The special issue of Communications of the ACM edited by
Riecken [30] provides a good overview
of the area. Agent research presents two useful perspectives for a
discussion of ethics.
The first concerns the nature of interaction between agents themselves.
Genesereth and Ketchpel [12] present an
overview of the Knowledge Interchange Format (KIF) and the Knowledge Query
and Manipulation Language (KQML), developed as part of the ARPA Knowledge
Sharing Effort. Guha and Lenat [13]
describe the Cyc project's evolution from an emphasis on addressing
brittleness in expert systems to an emphasis on assisting with information
management.
The second perspective concerns the nature of interaction between an agent
and an end user. Kautz et al. [16]
focus on what they refer to as "mundane tasks - setting up meetings,
sending out papers, locating information in multiple databases, tracking
the whereabouts of people, and so on." They distinguish between
userbots (mediators for a specific user) and taskbots (responsible for
carrying out a specific task). Maes [22] focuses on the reduction of
information overload through agents that are trained using machine learning
techniques to handle electronic mail, news filtering and meeting
scheduling.
AT&T's recent television commercial involving an animated dog posting a
note on a computer screen and responding to voice commands (including
praise from its presumed owner!) is the natural (albeit still fictional)
extension of this work. Norman [28]
offers a reflective perspective on how the field might evolve.
A Web spider is a program that autonomously explores the structure
of the Web and takes some action upon the artifacts thereby encountered.
This action might be as simple as counting the number of artifacts found,
or as complex as a full text indexing of the contents of the artifact.
Given the relative ease with which a spider can be constructed, it is
actually somewhat surprising that there are only twenty-odd spiders
documented to date [19]. A
representative sample of research work in spiders follows; refer to [7] for a more complete survey.
The RBSE Spider [5] retains both the
structure of the Web as a graph representation in a relational database
and a full text index of the HTML documents encountered. Searches can be
specified either as SQL queries against the relations, supporting
information such as is displayed in Figures 1 and 2, or against the full text index,
providing relevance-ranked results. McBryan's World Wide Web Worm (WWWW)
[23] retains a more limited information
base, comprised of the document's title and contained anchor information.
Search is limited to scanning individual records using pattern matching.
Fielding's MOMspider [10] was designed
as a maintenance tool for large webs of HTML documents. It reaches out
only to validate the existence of a document corresponding to the URLs
found in documents appearing in its maintenance list.
The fish search mechanism of De Bra and Post [3] is a completely distinct category,
based upon an executing instance of a modified Mosaic, and supporting a
spreading activation of URL retrievals similar in behavior to schooling
fish (hence the name). Fish search results are transient, available only
to the specific Mosaic user and only for the duration of that execution of
Mosaic.
There are distinct benefits to be had in Web spiders, the most obvious of
which is improvement in user satisfaction through effective search of the
Web -- its scale has completely outstripped an individual's ability to
assess and comprehend it. The problem has become similar to that
experienced by users of anonymous FTP. Archie arose as a response to the
increasing difficulty with locating specific software packages among
hundreds of FTP servers. Search-directed access has the potential to
reduce traffic by reducing revisitation and casual browsing to "see
what's there." Whether this potential will be realized is an open
question. Spiders also offer the opportunity to support archivists in the
construction of virtual neighborhoods of information. Currently archivists
are dependent upon the suggestions of users of the archives or providers
who volunteer a description of their information.
Poorly designed spiders can severely impact both overall network
performance and the performance of the servers that they access. Many of
the known spiders fail to control document retrieval rates, repetition of
requests or the retrieval of low-value artifacts (e.g., GIFs). Authors of
new spiders frequently fail to address the question of infinite regress,
whether in the form of cycles in the graph (obvious) or in the form of
dynamic documents (not so obvious). Many of the early attempts at spider
construction resulted in information sinks, where information flowed in
and little or no information flowed out (frequently, all that flowed out
was a simple metric, such as x servers located, or y
documents retrieved). Much of this was due to a mixture of naivete and the
cloak of anonymity. As providers shifted from pride over access counts to
load management, peer pressure and education limited the more egregious
cases.
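The failure modes listed above (repeated requests, low-value artifacts, and infinite regress) can be guarded against with very little machinery. The following is a minimal sketch rather than the design of any spider discussed here: the link table `LINKS`, the suffix list, and the function name `crawl` are all illustrative assumptions, and the traversal runs over an in-memory graph instead of the network.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": document URL -> outbound links.
LINKS = {
    "http://a.example/": ["http://a.example/about.html", "http://a.example/logo.gif"],
    "http://a.example/about.html": ["http://a.example/", "http://b.example/"],
    "http://b.example/": ["http://a.example/about.html"],
}

# Assumed list of artifact types not worth retrieving.
LOW_VALUE_SUFFIXES = (".gif", ".jpg", ".au")


def crawl(start, fetch_links, max_depth=3, max_pages=100):
    """Breadth-first traversal that never requests the same URL twice."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            # The depth bound limits regress through dynamically generated
            # documents, which a visited set alone cannot detect.
            continue
        for link in fetch_links(url):
            if link in seen:
                continue  # a cycle or a repeated reference
            if urlparse(link).path.lower().endswith(LOW_VALUE_SUFFIXES):
                continue  # skip low-value artifacts
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


order = crawl("http://a.example/", lambda u: LINKS.get(u, []))
print(order)
```

The visited set handles the obvious case of cycles in the graph; the depth and page bounds are a blunt but serviceable defence against the less obvious case of dynamic documents that generate fresh URLs indefinitely.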
Hypertext browsing is easily argued to be one of the principal strengths of
the Web and a principal reason for its success. The Web has demonstrated a practical
realization of much of the theoretical work in hypertext. At the same
time, there were key aspects of the Web's technological
infrastructure missing at its inception, the most critical of which was
support for indexing.(3) McLeese [25] argues that browsing is central to
effective hypertext -- making a distinction between navigation and
browsing -- but also observes that a variety of tools are required to most
effectively select where to go next within hypermedia. This isn't to say
that browsing is not central to hypertext in general and the Web in
particular.(4) For example, Campagnoni and
Ehrlich [2] found that users of a
hierarchical hypertext preferred browsing over indices for navigation.
It's instead my claim that it is time for designers and providers of Web
services and infrastructure to look to the literature for empirical
evidence of what works and what doesn't, and act accordingly.
The important question to ask here is not "what can be built in such
a manner as to limit resource consumption?" which is the refrain of
opponents of even the concept of Web agents, but rather "what
can be built in order to make usage of the Web more effective?" The
explosive growth of the Web(5) is making it
increasingly difficult to accomplish an effective search for information
in a specific area. Effective is defined here as finding everything that
is relevant to the search and not finding things that are not relevant --
the traditional sense of the word in library science. Multiple access
modes are becoming a necessity as the Web scales up.
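The two halves of this definition of effectiveness are commonly measured in information retrieval as recall (how much of the relevant material was found) and precision (how much of what was found is relevant). The sketch below shows the arithmetic on a made-up example; the function names and the sample sets are assumptions for illustration only.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant artifacts that the search found."""
    relevant = set(relevant)
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(relevant & set(retrieved)) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved artifacts that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 1.0  # nothing returned, so nothing irrelevant was returned
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical search: four relevant artifacts exist, three are returned,
# and two of those three are relevant.
relevant = {"a", "b", "c", "d"}
retrieved = {"a", "b", "x"}
r = recall(retrieved, relevant)     # 2 of 4 relevant artifacts found
p = precision(retrieved, relevant)  # 2 of 3 retrieved artifacts relevant
print(r, p)
```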
Navigation itself has been demonstrated to have distinct modes. Monk [26] distinguishes between directed
navigation, where the user traverses a known path to a known artifact, and
exploratory navigation, where the user is attempting discovery of
previously unknown artifacts. His personal browser foreshadows the
Mosaic hotlist facility as a facility supporting directed navigation.
Frisse and Cousins [11] distinguish
between local and global navigation, and claim that global
navigation requires an index space distinct from the document space. This
is a key distinction in relating traditional results in hypermedia
research, where the corpus resided on a single host, and the Web, where
the basic premise is massive decentralization.
The nature of the artifacts themselves can color user preferences for
access mechanisms. Dillon et al. [4]
and Van Dyke Parunak [32] comment upon
the likelihood of disorientation as nonlinearity increases. Wright and
Lickorish [34], comparing an
index-based navigation scheme with one more hierarchical in nature, found
readers preferred the latter for book-like text, but the former for more
modular information.
Just what is the structure of the Web? One of the rationales for the
separation in the RBSE Spider [5] of
the discovery and storage of Web structure from the indexing and retrieval
of the HTML documents thereby retrieved was the ability to generate
queries concerning the characteristics of the Web itself. The February
1994 run of the RBSE Spider produced a graph comprised of approximately
36,000 HTML documents, 62,000 distinct target artifacts and 182,000 total
edges. As can be seen in Figures 1 and
2, the modularity of the Web is
extremely high. While the number of hyperlinks contained within a given
document is fairly high (>10 links per document in our index), only a
third of those hyperlinks are to other documents (~ 3.5 inter-document
links per document in our index), and documents so referenced are referred
to in general by only a few other documents (59% of the documents have a
single inbound link and 96% have five or fewer).

Figure 1: External URL in documents

Figure 2: Inbound URL references to documents
There are, of course, notable exceptions in the data:
- pruned from the right side of Figure 1 are two documents, with 3469 and 1184
outbound hyperlinks, respectively;
- most points pruned from the top of Figure 1 are in the range of 400 or fewer
documents, but there are 12,398 documents with precisely 5 links (the
spider mapped an on-line thesaurus with a very regular structure); and
- pruned from the right side of Figure 2 are 4 documents with approximately
12,000 inbound links (we'll leave to the reader just which URLs these
might be... ).
Clearly, if this is a representative sample of the Web, having one-half of
the artifacts reachable from only a single other document in the Web implies
that much of this traffic is likely to be redundant -- users must wander
from node to node to discover information, and frequently revisit nodes in
doing so.
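Questions of this kind are straightforward to pose once the graph is held in relational form. The sketch below is not the RBSE Spider's schema, which is not given here; it assumes a hypothetical single table `link(source, target)` with a handful of invented rows, and computes the inbound-reference counts and the fraction of documents reachable from exactly one other document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per hyperlink edge.
conn.execute("CREATE TABLE link (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO link VALUES (?, ?)",
    [
        ("a.html", "b.html"),
        ("a.html", "c.html"),
        ("b.html", "c.html"),
        ("c.html", "d.html"),
    ],
)

# Number of distinct documents referring to each target (inbound links).
inbound = dict(
    conn.execute("SELECT target, COUNT(DISTINCT source) FROM link GROUP BY target")
)

# Fraction of referenced documents with exactly one inbound link.
single = sum(1 for n in inbound.values() if n == 1) / len(inbound)
print(inbound, single)
```

The same table supports the outbound view by grouping on `source` instead of `target`.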
The concept of an ethic for what is basically an information system might
at first seem strange. In fact, we have an excellent example of how a
virtual community evolves a set of ethics in a setting of potentially
complete chaos -- Usenet news. The readership has evolved over the years a
consensus as to how newsgroups come into existence, and what behavior is
appropriate within specific groups. The only true constraints are those
generated by peer pressure on the offending user and their system
administrator. This section begins with a review of the consensus that has
arisen amongst participating spider authors and resource providers,
describes a similar manifesto originating in the intelligent software
agent community, and concludes with a proposal for a web ethic.
Koster's guidelines [18], authored as a
means of addressing the increasing load on his server by spiders, were the
first wide-ranging attempt at Web ethics. They served as a basis for
discussion amongst spider authors and resource providers, and in
conjunction with his "list of robots" [19], began to create the first community
pressure on spider authors to act ethically. Briefly, the guidelines
entail:
- reconsider... (is another spider really needed?);
- identify the spider, sources of additional information, and
yourself;
- test locally;
- moderate your speed (within a single run) and frequency (between runs)
of access to any given host;
- retrieve only what you can handle - both in format and in scale;
- monitor your runs (there are "black holes" out there!);
and
- share your results.
The guidelines were not intended to suppress legitimate research or
resource discovery done as a part of the creation of an information
service, but rather were intended to stem the tide of "because it's
there" spider implementations. As mentioned above, the result has
been an emerging consensus on appropriate Web behavior that is still
operating reasonably well.
The difficulty with the guidelines is that they still don't provide a
means for an information provider to indicate to a running spider that
portions or even the entire file space of their server should be
off-limits. The robot exclusion standard [20] was defined to address just this. The
scheme entails the creation of a file on the server with a standard path
(/robots.txt) and contents detailing the nature of the desired
constraints. For example, the definition appearing in Figure 3 indicates that all spiders should avoid
this server, with the exception of alpha and beta, which are granted
access to /private, and gamma, which is granted complete access. Further
details are available in [20]. Note
that while a provider can specify as many constraints on spiders as wished
through an exclusion definition, it is still up to the spider itself to
check for the existence of the file and adhere to its constraints.

Figure 3: A Sample /robots.txt Definition
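Because adherence is left to the spider, the check has to be built into the spider itself. The following sketch shows one way to do so using the `urllib.robotparser` module from the Python standard library. The exclusion file is a hypothetical one written for this example, not the contents of Figure 3; the agent names `delta` and `omega`, the host name, and the helper `may_fetch` are likewise assumptions. The file is parsed from a string so that the example touches no network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file (not the definition shown in Figure 3).
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: delta
Disallow: /private

User-agent: gamma
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url):
    """Ask the parsed exclusion file whether this agent may retrieve url."""
    return parser.can_fetch(agent, url)


# gamma has an empty Disallow, so nothing is off-limits to it.
print(may_fetch("gamma", "http://www.example.org/private/notes.html"))
# delta is excluded from /private only.
print(may_fetch("delta", "http://www.example.org/private/notes.html"))
print(may_fetch("delta", "http://www.example.org/index.html"))
# Any other spider falls under the catch-all record and is excluded.
print(may_fetch("omega", "http://www.example.org/index.html"))
```

A spider operating against live servers would retrieve /robots.txt from each host before its first request to that host and consult the result before every retrieval.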
Etzioni et al. define a softbot as
"an agent that interacts with a software environment by issuing
commands and interpreting the environment's feedback. ... [a] softbot's
sensors are commands meant to provide the softbot with information about
the environment... Due to the dynamic nature and sheer size of real-world
software environments it is impossible to provide the softbot with a
complete and correct model of its environment... "[8]
envisioning a construct very similar to a Web agent. Softbots also effect
change on their environment, leading to the formulation of a collection of
softbotic laws (intentionally derivative of Asimov's laws of robotics) [9, 33]:
- Safety -- The softbot should not destructively alter the
world.
- Tidiness -- The softbot should leave the world as it found
it.
- Thrift -- The softbot should limit its consumption of scarce
resources.
- Vigilance -- The softbot should not allow client actions with
unanticipated results.
Any scheme of ethics for Web agents must address the motivation of the
users who employ these agents, of the individuals who author agents, and of
the providers of information resources that the agents access. Users are
seeking guidance and organization in a chaotic, dynamic information
framework. They are in a process of exploration when using the results of
agents, since other mechanisms (i.e., hotlists and personal pages) serve
only the need for directed navigation. Web agent authors respond to this
need with a service that, if we believe the feedback received, far
outweighs the problems created by their progeny. Even those authors who
act in isolation are in a mode of exploration - the concept of "power
user" carried to the extreme, where the entire world is at their
fingertips. Information providers are interested in dissemination of their
artifacts (why else publish them?). The issue is hence one of striking an
appropriate balance between interests and concerns - accessibility for the
individual balanced against accessibility for the community.
The guidelines/exclusion perspective and the softbotics perspective offer
similar approaches to attempting such a balance. However, neither operates
from assumptions that match completely with what the Web promises to
become.
- The Web is a distributed information resource, and because of this,
much of the research results available relating to more traditional
hypermedia is not directly applicable.
- The Web is highly dynamic, and will remain so for the foreseeable
future.
- Agents are too valuable a user resource given the chaotic nature of
the Web for them not to be employed, even if their usage is forced into
simulation of user behavior in order to avoid detection.
- Commercial offerings of spider-generated information services will
appear in the near future. (When was the last time that you took a
really close look at your log files?)
Factoring agent functionality into smaller categories offers a useful
means of examining the issues. As shown in Figure 4, these categories include user-agent
interaction, agent-agent interaction, and agent-server interaction. I
specifically exclude user - server interaction, as this is the normal mode
of activity on the Web, and doesn't involve agents. Koster's ALIWEB scheme
[17] entails agent - server interaction
in its accretion mode, when the server extracts current index information
from participating servers and forms its aggregate index, and user - agent
interaction when a user employs a Web client to search the index. Kahle's
Wide Area Information Server (WAIS) system [15] entails a degenerate form of agent -
server interaction when a server generates a local index for a collection
of artifacts, a meta form of agent - agent interaction when a server posts
a description of itself to a directory of servers, and two forms of user -
agent interaction - one at a meta level when interrogating the directory
of servers for information about servers and one at a normal level of
interrogating a server about its contents. Other approaches include that
of McKee [24], where a WAIS index is
generated for the documents available from a particular server (another
degenerate form of agent - server interaction), and the MORE system [6], which supports localized searching of
metadata about artifacts residing anywhere in the Web, where two forms of
user - agent interaction take place - one by the librarians posting
information to the repository concerning already known artifacts, and one
by the users interrogating the repository for referrals to specific
artifacts on specific servers.

Figure 4: Categories of Agent Functionality
The spiders discussed so far fall into the general category of task
agents - accomplishing a specific task and generating a result for
subsequent multiple uses. The single exception to this is the fish search
mechanism, which is readily identifiable as a user agent. An architecture
that is capable of scaling to millions of artifacts on tens of thousands
of servers accessed by millions of users requires aspects of both of these
categories. Because of this, I propose a separation of design concerns
into those which relate to service agents - those interacting with servers
on the Web in the formation of information bases (thereby making
themselves servers in their own right), and user agents - those
interacting with servers on the Web in direct support of a particular
individual.
A service agent should adhere to the following ethical guidelines:
- Identity - a service agent's activities should be readily
discernible and traceable back to its operator.
- Openness - the information generated by a service agent should
be generally accessible to a community whose size relates directly to the
scope of the agent's activities.
- Moderation - the pace and frequency of information acquisition
should be appropriate for the capacity of the server and the network
connections lying between the agent and that server.
- Respect - a service agent should respect the constraints placed
upon it by server operators.
- Authority - a service agent's services should be accurate and
up-to-date.
Clearly a balance must be struck between the concerns of openness,
moderation and respect, which limit a service agent's scope and
activities, and the concern of authority which broadens them. A service
agent that is not authoritative will not be employed, but one that is
renegade [29] is as damaging to the Web
as the impact avoided by the community it supports.
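Two of these guidelines, identity and moderation, translate directly into mechanism. The sketch below is one possible arrangement under stated assumptions: the class name `PolitenessScheduler`, the sixty-second interval, and the header values are all invented for illustration, and the clock is passed in explicitly so that the behavior can be examined without waiting. `User-Agent` and `From` are standard HTTP request headers.

```python
from urllib.parse import urlparse

# Identity: headers a service agent might send with every request.
# The agent name and operator address are placeholders.
IDENTITY_HEADERS = {
    "User-Agent": "ExampleSpider/0.1",
    "From": "operator@example.org",
}


class PolitenessScheduler:
    """Tracks, per host, the earliest time the next request may be sent."""

    def __init__(self, min_delay_seconds=60.0):
        self.min_delay = min_delay_seconds  # assumed figure, tune per server
        self._next_allowed = {}

    def wait_time(self, url, now):
        """Seconds the agent should still wait before contacting url's host."""
        host = urlparse(url).netloc
        return max(0.0, self._next_allowed.get(host, now) - now)

    def record_request(self, url, now):
        """Note that a request to url's host was issued at time now."""
        host = urlparse(url).netloc
        self._next_allowed[host] = now + self.min_delay


sched = PolitenessScheduler(min_delay_seconds=60.0)
sched.record_request("http://a.example/index.html", now=0.0)
wait_same_host = sched.wait_time("http://a.example/other.html", now=10.0)
wait_other_host = sched.wait_time("http://b.example/", now=10.0)
print(wait_same_host, wait_other_host)
```

Keying the delay on the host rather than on the URL lets the agent keep working across many servers while no single server sees more than one request per interval.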
A user agent should adhere to the following ethical guidelines:
- Identity - a user agent's activities should be readily
discernible and traceable back to its user.
- Moderation - the pace and frequency of information acquisition
should be appropriate for the capacity of the server and the network
connections lying between the agent and that server.
- Appropriateness - a user agent should pose the proper questions
to the proper servers, relying upon service agents for support regarding
global information and servers for support for local information.
- Vigilance - The user agent should not allow user requests to
generate unanticipated consequences.
The current state of the Web makes the goal of appropriateness difficult
to attain for a user agent. The service agents in existence at the current
time are for the most part experimental, and hence still struggling with
their own issues, particularly in the areas of moderation, respect and
authority. Vigilance is a concern that has not yet even begun to generate
the impact that it will. As commercial providers become an increasing
presence on the Web and evolve towards services with strong authority,
limiting the impact to a user of their agent's activity against
fee-for-service agents will become a critical concern. Pottmyer has
proposed some initial approaches to address this area [29].
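One simple form that vigilance might take is a spending limit that the user agent enforces on the user's behalf before any chargeable request is issued. The sketch below is an illustration only and is not drawn from Pottmyer's proposals; the class names, the use of whole cents, and the assumption that a fee-for-service agent quotes a charge in advance are all hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push spending past the user's limit."""


class SpendingGuard:
    """Refuses chargeable requests once a user-defined limit would be passed."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents
        self.spent_cents = 0

    def authorize(self, quoted_cents):
        """Record a quoted charge, or refuse it if it would exceed the limit."""
        if quoted_cents < 0:
            raise ValueError("a quoted charge cannot be negative")
        if self.spent_cents + quoted_cents > self.limit_cents:
            raise BudgetExceeded(
                f"charge of {quoted_cents} would exceed limit of {self.limit_cents}"
            )
        self.spent_cents += quoted_cents


guard = SpendingGuard(limit_cents=500)
guard.authorize(200)
guard.authorize(250)
try:
    guard.authorize(100)  # 450 + 100 would pass the 500 limit
    refused = False
except BudgetExceeded:
    refused = True
print(guard.spent_cents, refused)
```

The refused request leaves the running total unchanged, so the agent can report the refusal to its user and continue with cheaper requests.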
While it's easy to argue that there really are not any resolved
issues concerning Web ethics at this point, there are some key areas as
yet unaddressed that deserve mention:
- How do we construct virtual neighborhoods of information? If much of
the network traffic generated currently is by users in only partially
successful search for information, generating stronger bindings of
conceptually related artifacts should reduce that traffic.
- How do we support user comprehension of the existence of and access to
these virtual neighborhoods? The "flying" mechanism of Lai and
Manber [21] offers a novel means of
generating an overall sense of organization of a hypertext, similar to
flipping through the pages of a book. This type of visualization mechanism
might assist in the formulation of users' mental models of the Web.
- Pottmyer in referring to the "organic nature" of the
Internet [29] mentions cooperation for
the good of the organism. The Web has a great deal of user level
cooperation underway, and is predicated upon the assumption of server
level cooperation. Agent research should address this issue as well.
Text-based indices are not a panacea for the Web. Monk observes that
keyword-based indices exhibit similar problems to browsers with
sufficiently large hypertexts [27].
Future work by the RBSE research group will include:
- Use of the spider to construct indices of sub-webs based on number of
transitive hyperlinks from a starting point (result is an index of a
connected subgraph);
- Use of the spider to sweep the Web identifying artifacts matching a
domain profile and indexing only the matches (result is an index of a
(potentially) disconnected subgraph); and
- Temporal studies of the Web, allowing assessment of mutation rates and
fine-grained growth patterns.
The resulting knowledge will be used to modify the spider for more
intelligent access patterns, integration of exploration results into the
MORE storage scheme, and definition of a conceptual (as opposed to
keyword) search engine.
- [1]
- Berners-Lee, T., R. Cailliau, A.
Luotonen, H. F. Nielsen and A. Secret, "The World-Wide Web,"
Communications of the ACM, v. 37, n. 8, August 1994, p. 76-82.
- [2]
- Campagnoni, F. R. and K. Ehrlich,
"Information Retrieval Using a Hypertext-Based Help System," ACM
Transactions on Information Systems, v. 7, n. 3, 1989, p. 271-291.
- [3]
- De Bra, P. M. E. and R. D. J. Post,
"Information Retrieval in the World-Wide Web: Making Client-based
searching feasible," First International Conference on the World
Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 137-146.
- [4]
- Dillon, A., C. McKnight and J.
Richardson, "Navigation in Hypertext: A Critical Review of
Concept," Proc. of IFIP INTERACT `90: Human-Computer Interaction,
Detailed Design: Hypermedia, 1990, p. 587-592.
- [5]
- Eichmann, D. "The RBSE Spider --
Balancing Effective Search Against Web Load," First International
Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994,
p. 113-120.
- [6]
- Eichmann, D., T. McGregor and D.
Danley, "Integrating Structured Databases Into the Web: The MORE
System," Computer Networks and ISDN Systems, v. 24, n. 2, 1994.
- [7]
- Eichmann, D., "Advances in
Network Information Discovery and Retrieval," submitted to the
International Journal of Software Engineering and Knowledge
Engineering.
- [8]
- Etzioni, O., N. Lesh and R. Segal,
Building Softbots for UNIX (Preliminary Report), University of
Washington, Seattle, WA, November 1992.
- [9]
- Etzioni, O. and D. Weld, "A
Softbot-Based Interface to the Internet," Communications of the
ACM, v. 37, n. 7, July 1994, p. 72-76.
- [10]
- Fielding, R. T., "Maintaining
Distributed Hypertext Infostructures: Welcome to MOMspider's Web,"
First International Conference on the World Wide Web, Geneva,
Switzerland, May 25-27, 1994, p. 147-156.
- [11]
- Frisse, M. E. and S. B. Cousins,
"Information Retrieval from Hypertext: Update on the Dynamic Medical
Handbook Project," ACM Hypertext `89, Information Retrieval I,
1989, p. 199-212.
- [12]
- Genesereth, M. R. and S. P.
Ketchpel, "Software Agents," Communications of the ACM, v.
37, n. 7, July 1994, p. 48-53,147.
- [13]
- Guha, R. V. and D. B. Lenat,
"Enabling Agents to Work Together," Communications of the
ACM, v. 37, n. 7, July 1994, p. 126-142.
- [14]
- Hayes, B., "The World Wide
Web," American Scientist, v. 82, September-October, 1994, p.
416-420.
- [15]
- Kahle, B., Wide Area Information
Server Concepts, Thinking Machines Inc., November 1989.
- [16]
- Kautz, H. A., B. Selman and M. Coen,
"Bottom-Up Design of Software Agents," Communications of the
ACM, v. 37, n. 7, July 1994, p. 143-147.
- [17]
- Koster, M., "ALIWEB --
Archie-Like Indexing in the WEB," First International Conference
on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p.
91-100.
- [18]
- Koster, M., "Guide for
Robot Writers," Nexor Corp.,
http://web.nexor.co.uk/mak/doc/robots/guidelines.html.
- [19]
- Koster, M., "List of
Robots," Nexor Corp.,
http://web.nexor.co.uk/mak/doc/robots/active.html.
- [20]
- Koster, M., "A Standard for
Robot Exclusion," Nexor Corp.,
http://web.nexor.co.uk/mak/doc/robots/norobots.html.
- [21]
- Lai, P. and U. Manber, "Flying
Through Hypertext," ACM Hypertext `91, Presentation Issues, 1991,
p. 123-132.
- [22]
- Maes, P., "Agents that Reduce
Work and Information Overload," Communications of the ACM, v. 37,
n. 7, July 1994, p. 30-40,146.
- [23]
- McBryan, O. A., "GENVL and
WWWW: Tools for Taming the Web," First International Conference on
the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 79-90.
- [24]
- McKee, D., "Towards Better
Integration of Dynamic Search Technology and the World-Wide Web,"
First International Conference on the World Wide Web, Geneva,
Switzerland, May 25-27, 1994, p. 129-135.
- [25]
- McLeese, R., "Navigation and
Browsing in Hypertext," HYPERTEXT I: Theory into Practice, 1988,
p. 6-44.
- [26]
- Monk, A., "The Personal
Browser: A Tool for Directed Navigation in Hypertext Systems,"
Interacting with Computers, v. 1, n. 2, 1989, p. 190-196.
- [27]
- Monk, A. F., "Getting to Known
Locations in a Hypertext," HYPERTEXT II: State of the Art,
Navigation and Browsing, 1989, p. 20-27.
- [28]
- Norman, D. A., "How Might
People Interact with Agents," Communications of the ACM, v. 37, n.
7, July 1994, p. 68-71.
- [29]
- Pottmyer, J., "Renegade Intelligent
Agents," SIGNIDR V - Proc. Special Interest Group on Networked
Information Discovery and Retrieval, McLean, VA, August 4, 1994.
Presentation slides available as
http://wyww.wais.com/SIGNIDR/Proceedings/SA3/.
- [30]
- Riecken, D., "Intelligent
Agents: Introduction to Special Issue," Communications of the ACM,
v. 37, n. 7, July 1994, p. 18-21.
- [31]
- Schatz, B. R. and J. B. Hardin,
"NCSA Mosaic and the World Wide Web: Global Hypermedia Protocols for
the Internet," Science, v. 265, August 12, 1994, p. 895-901.
- [32]
- Van Dyke Parunak, H.,
"Hypermedia Topologies and User Navigation," ACM Hypertext
`89, Navigation in Context, 1989, p. 43-50.
- [33]
- Weld, D. and O. Etzioni, "The
First Law of Robotics (a call to arms)," Proc. of the 12th
National Conference on AI, Seattle, WA, July 31 - August 4, 1994.
- [34]
- Wright, P. and A. Lickorish,
"An Empirical Comparison of Two Navigation Systems for Two
Hypertexts," HYPERTEXT II: State of the Art, Navigation and
Browsing, 1989, p. 84-93.
David Eichmann is an assistant professor of software engineering at the
University of Houston - Clear Lake and director of research and
development of the Repository Based Software Engineering Program. Besides
normal academic duties, his responsibilities include management of a
research and development group working in the areas of reuse repositories
and reengineering. He joined the UHCL software engineering faculty in
August of 1993 after visiting for a year in his role with RBSE. He
previously held positions at West Virginia University, where he led the
Software Reuse Repository Lab (SoRReL) group, and at Seattle University.
Email: eichmann@rbse.jsc.nasa.gov
Footnotes
- (1)
- This work has
been supported by NASA Cooperative Agreements NCC-9-16 and NCC-9-30, RICIS research
activity RB02.
- (2)
- A number of terms are used
for programs which autonomously navigate Web structure: wanderers, robots,
worms, even fish. My use of spider is intended to include all of
these.
- (3)
- That is, most critical from the
perspective of this paper's focus. Other key aspects relating, for
example, to expressiveness - support for tables and equations in HTML,
etc. are not relevant to the discussion here.
- (4)
- The requirement for navigation is frequently cited as
being, in part, the cause of the demise of network database systems, and
non-navigational access as being, in part, the reason for the success of
relational database systems.
- (5)
- Just how explosive is
reflected in the fact that articles appearing in such generic venues as
American Scientist [14] and
Science [31] discuss the
phenomenon of the Web as much as they do the technology.