The Invisible Web: Uncovering Information Sources Search Engines Can't See - Softcover

Sherman, Chris; Price, Gary


Synopsis

Enormous expanses of the Internet are unreachable with standard web search engines. This book provides the key to finding these hidden resources by identifying how to uncover and use invisible web resources. Mapping the invisible Web, when and how to use it, assessing the validity of the information, and the future of Web searching are topics covered in detail. Only 16 percent of Net-based information can be located using a general search engine. The other 84 percent is what is referred to as the invisible Web—made up of information stored in databases. Unlike pages on the visible Web, information in databases is generally inaccessible to the software spiders and crawlers that compile search engine indexes. As Web technology improves, more and more information is being stored in databases that feed into dynamically generated Web pages. The tips provided in this resource will ensure that those databases are exposed and Net-based research will be conducted in the most thorough and effective manner.

The synopsis may refer to a different edition of this title.

About the Author

Gary Price is a reference librarian at George Washington University. He lives in Vienna, Virginia. Chris Sherman is the director of the guide to Web searching on About.com and president of Searchwise, a consulting firm. He is the author of the CD-ROM Handbook and a frequent contributor to Online magazine. He lives in Los Angeles. Danny Sullivan works for SearchEngineWatch.com.

Excerpt. © Reprinted by permission. All rights reserved.

The Invisible Web

Uncovering Information Sources Search Engines Can't See

By Chris Sherman, Gary Price

Information Today, Inc.

Copyright © 2001 Chris Sherman and Gary Price
All rights reserved.
ISBN: 978-0-910965-51-4

Contents

Copyright,
Dedication,
Figures and Tables,
Foreword,
Acknowledgments,
Introduction,
About www.invisible-web.net,
Chapter 1 — The Internet and the Visible Web,
Chapter 2 — Information Seeking on the Visible Web,
Chapter 3 — Specialized and Hybrid Search Tools,
Chapter 4 — The Invisible Web,
Chapter 5 — Visible or Invisible?,
Chapter 6 — Using the Invisible Web,
Chapter 7 — Case Studies,
Chapter 8 — The Future: Revealing the Invisible Web,
Chapter 9 — The Best of the Invisible Web,
Chapter 10 — Art and Architecture,
Chapter 11 — Bibliographies and Library Catalogs,
Chapter 12 — Business and Investing,
Chapter 13 — Computers and Internet,
Chapter 14 — Education,
Chapter 15 — Entertainment,
Chapter 16 — Government Information and Data,
Chapter 17 — Health and Medical Information,
Chapter 18 — U.S. and World History,
Chapter 19 — Legal and Criminal Resources,
Chapter 20 — News and Current Events,
Chapter 21 — Searching for People,
Chapter 22 — Public Records,
Chapter 23 — Real-Time Information,
Chapter 24 — Reference,
Chapter 25 — Science,
Chapter 26 — Social Sciences,
Chapter 27 — Transportation,
Glossary,
References,
About the Authors,
Index,


CHAPTER 1

The Internet and the Visible Web


To understand the Web in the broadest and deepest sense, to fully partake of the vision that I and my colleagues share, one must understand how the Web came to be.

— Tim Berners-Lee, Weaving the Web


Most people tend to use the words "Internet" and "Web" interchangeably, but they're not synonyms. The Internet is a networking protocol (set of rules) that allows computers of all types to connect to and communicate with other computers on the Internet. The Internet's origins trace back to a project sponsored by the U.S. Defense Advanced Research Projects Agency (DARPA) in 1969 as a means for researchers and defense contractors to share information (Kahn, 2000).

The World Wide Web (Web), on the other hand, is a software protocol that runs on top of the Internet, allowing users to easily access files stored on Internet computers. The Web was created in 1990 by Tim Berners-Lee, a computer programmer working for the European Organization for Nuclear Research (CERN). Prior to the Web, accessing files on the Internet was a challenging task, requiring specialized knowledge and skills. The Web made it easy to retrieve a wide variety of files, including text, images, audio, and video by the simple mechanism of clicking a hypertext link.

The primary focus of this book is on the Web — and more specifically, the parts of the Web that search engines can't see. To fully understand the phenomenon called the Invisible Web, it's important to first understand the fundamental differences between the Internet and the Web.

In this chapter, we'll trace the development of some of the early Internet search tools, and show how their limitations ultimately spurred the popular acceptance of the Web. This historical background, while fascinating in its own right, lays the foundation for understanding why the Invisible Web could arise in the first place.


How the Internet Came to Be

Up until the mid-1960s, most computers were stand-alone machines that did not connect to or communicate with other computers. In 1962 J.C.R. Licklider, a professor at MIT, wrote a paper envisioning a globally connected "Galactic Network" of computers (Leiner, 2000). The idea was far-out at the time, but it caught the attention of Larry Roberts, a project manager at the U.S. Defense Department's Advanced Research Projects Agency (ARPA). In 1966 Roberts submitted a proposal to ARPA that would allow the agency's numerous and disparate computers to be connected in a network similar to Licklider's Galactic Network.

Roberts' proposal was accepted, and work began on the "ARPANET," which would in time become what we know as today's Internet. The first "node" on the ARPANET was installed at UCLA in 1969 and gradually, throughout the 1970s, universities and defense contractors working on ARPA projects began to connect to the ARPANET.

In 1973 the U.S. Defense Advanced Research Projects Agency (DARPA) initiated another research program to allow networked computers to communicate transparently across multiple linked networks. Whereas the ARPANET was just one network, the new project was designed to be a "network of networks." According to Vint Cerf, widely regarded as one of the "fathers" of the Internet, "This was called the Internetting project and the system of networks which emerged from the research was known as the 'Internet'" (Cerf, 2000).

It wasn't until the mid-1980s, with the simultaneous explosion in the use of personal computers and the widespread adoption of a universal standard of Internet communication called Transmission Control Protocol/Internet Protocol (TCP/IP), that the Internet became widely available to anyone desiring to connect to it. Other government agencies fostered the growth of the Internet by contributing communications "backbones" that were specifically designed to carry Internet traffic. By the late 1980s, the Internet had grown from its initial network of a few computers to a robust communications network supported by governments and commercial enterprises around the world.

Despite this increased accessibility, the Internet was still primarily a tool for academics and government contractors well into the early 1990s. As more and more computers connected to the Internet, users began to demand tools that would allow them to search for and locate text and other files on computers anywhere on the Net.


Early Net Search Tools

Although sophisticated search and information retrieval techniques date back to the late 1950s and early '60s, these techniques were used primarily in closed or proprietary systems. Early Internet search and retrieval tools lacked even the most basic capabilities, primarily because it was thought that traditional information retrieval techniques would not work well on an open, unstructured information universe like the Internet.

Accessing a file on the Internet was a two-part process. First, you needed to establish a direct connection to the remote computer where the file was located using a terminal emulation program called Telnet. Then you needed to use another program, called a File Transfer Protocol (FTP) client, to fetch the file itself. For many years, to access a file it was necessary to know both the address of the computer and the exact location and name of the file you were looking for; there were no search engines or other file-finding tools like the ones we're familiar with today.

Thus, "search" often meant sending a request for help to an e-mail message list or discussion forum and hoping some kind soul would respond with the details you needed to fetch the file you were looking for. The situation improved somewhat with the introduction of "anonymous" FTP servers, which were centralized file servers specifically intended...

"About this title" may refer to a different edition of this title.
