
How The Internet Works
Introduction
How well does copyright enforcement translate to the digital environment? This section examines the practical realities of this issue by looking at the physical and logical dimensions of the Internet. Too often, copyright discussions treat the technology as an abstraction, with no consideration given to the feasibility of creating the desired copyright controls. The "negotiations" held between the content providers and the Internet access providers would, no doubt, have been more productive had they been grounded in a better understanding of the underlying technical issues involved. It is important to recognize that the actual functionalities of the Internet, just like the laws of gravity, cannot be legislated.
Significant limitations do exist. Content is converted into data; the data is broken into unrecognizable digital packets; and the packets are subsequently routed independently and reassembled at the destination. Transmission of a single file involves multiple organizations, multiple devices facilitating transmission, and multiple jurisdictions. The nature of the technology itself makes content control, monitoring or protection at points along the network problematic, if not impossible. By the same token, encryption or encoding technology, appropriately applied before and after transmission, can effectively protect content and preserve copyright.
To understand why technology is both an obstacle and an answer to digital copyright protection, it’s necessary to take a closer look at the inner workings of the Internet.
The Internet and How it Works
The Internet is a global "network of networks." In physical terms, the Internet is a vast international collection of networks, computers and software, all working together to form the world’s first digital information infrastructure. Originally envisioned to interconnect no more than 256 networks, today the Internet links tens of thousands worldwide.
The Internet is interconnected in a series of local nodes and regional hubs. Users gain access through commercial online services such as America Online, Prodigy or CompuServe; through Internet service and access providers (ISPs and IAPs) like MSN, Netscape, MCI or AT&T WorldNet Services; or through a private network gateway, such as that which might be maintained by a corporation, university or government agency.
No particular company, group or national government owns the Internet. In a sense, the Internet is a phenomenon which has no beginning and no end, but rather a series of networks constantly added and removed as participants come and go. Should failure occur in part(s) of the system, the rest of the Internet continues to function. No single organization holds responsibility for its proper operation. Indeed, no one owns the Internet; rather, it is a "shared resource." The extent to which the Internet continues to serve as the world’s information infrastructure will be a direct result of its voluntary, democratic and unregulated nature.
Common standards have provided the compatibility required by the otherwise disparate technical components of the Internet. For instance, a standard networking protocol called TCP/IP provides interconnectivity among the thousands of smaller networks. Similarly, compliance with a standard application programming interface like sockets allows programs to share text and data across the network. A common language, Hypertext Markup Language (HTML), provides a standard text file format.
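As a rough illustration of the sockets interface mentioned above, the following Python sketch passes a line of text between two connected endpoints. It is only a sketch under stated assumptions: it uses socket.socketpair() so that it runs on a single machine without any network, and the function name exchange_text is hypothetical, not drawn from any standard.

```python
import socket


def exchange_text(message: str) -> str:
    """Send text through one socket endpoint and read it back from the other."""
    # socketpair() returns two already-connected sockets. A real program
    # would instead connect to a remote host over TCP/IP.
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(message.encode("utf-8"))
        sender.shutdown(socket.SHUT_WR)  # tell the receiver no more data is coming
        chunks = []
        while True:
            data = receiver.recv(4096)
            if not data:  # empty read means the sender has finished
                break
            chunks.append(data)
        return b"".join(chunks).decode("utf-8")
    finally:
        sender.close()
        receiver.close()
```

The same send and receive calls apply when the two endpoints sit on different networks; only the way the connection is established changes.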
The Internet is divided into a series of levels. The World-Wide Web is the layer of the Internet which provides multimedia content, organized in a series of graphical home pages. WWW content is located and accessed with tools called web browsers. In the early 1990s, developers at the National Center for Supercomputing Applications (NCSA) created the first well-known tool for browsing content on the Internet: the NCSA Mosaic program. Although browsers today might contain many user interface distinctions and other features for market differentiation, all capitalize on the basic principles demonstrated by NCSA. Operating and looking the same on UNIX, PC, and Macintosh computers, NCSA Mosaic fundamentally changed the way that people used Internet tools, making the experience relatively uniform.
As noted, browsers have added new features and benefits, and will continue to do so, enabling the user to experience the benefits of sound, video, 3-D graphics and more. In a sense, today’s browser is somewhat analogous to a Dolby sound system, which takes a signal which might otherwise be unusable or unpleasant and, through special packaging or an enhancement, adds substantial value to the original, sometimes simply making it perceptible to the end-user.
Browsers display the information located at web sites. Browsers work in conjunction with various directory publishers and their search engines. These engines index web site content using the standard labels attached to Web documents, termed Uniform Resource Locators (URLs). Retrieval is accomplished in different ways and with varying degrees of sophistication and precision.
The momentum of the WWW has tended to overshadow other aspects of the Internet. The Internet offers a variety of text-based services: e-mail, newsgroups, and file downloads. Just as browsers are used to navigate the WWW, a variety of tools allow users to mine this text-based substratum of the Internet, including e-mail programs, newsgroup readers, FTP clients, Gopher clients, wide area information server (WAIS) clients and more.
Briefly, these tools perform the following tasks:
E-mail tools allow messages to be created, managed and exchanged over the Internet
Gopher clients navigate the menu-driven structure of Gopher servers to locate precise information
Newsgroup readers deliver newsgroup content with features which facilitate screening and prioritization
WAIS tools search database content for key search terms—as opposed to just scanning heading labels
Other text-based tools include Archie, Veronica and Jughead. Archie is used to locate files on FTP sites; Veronica provides exhaustive searching across Gopher sites; Jughead offers searching of specified Gopher sites.
Even these distinctions between tools are rapidly becoming arbitrary, called into question by the ability of many Web browsers to either provide these services or to be packaged and integrated with tools which do.
The Internet, unlike the text-based networks of the past, features a wide variety of data types. While much of its content remains text, many content providers on the WWW are now using rich data in the form of still images and audio. Moreover, compression routines, advances in CPU power, new standards and other innovations are making the introduction of video, three-dimensional data, real-time audio and animation commonplace.
The Building Blocks: Hardware, Software and Data
The Connection
Each computer must be connected to the Internet through some sort of service. Users generally connect to the Internet through an access or service provider, sometimes by way of a dedicated leased circuit, but usually via dialed connections established when needed. Typical consumer usage involves dialed connections, and the description which follows focuses on dial-up users (except where noted).
At a technical level, IAP facilities consist of modems, terminal servers, and other equipment enabling users to establish a session; Internet Protocol (IP) routers, leased circuits, and other connectivity infrastructure; various servers operated directly by the IAP for e-mail, news and other content; and routers and leased circuits providing connectivity to the Internet at large.
Modems convert digital signals into analog signals so that they can pass over normal telephone circuits. At the other end, the opposite process takes place: the analog signal is converted back to a digital data stream. An IAP will typically have many modems available and configured so that users dialing a single telephone number will be randomly connected to any one of the modems. Once the user is connected to a particular modem and the session is established, all data passing between the user and the IAP in either direction will pass through that modem. A modem concerns itself only with the correctness of data bits and makes no interpretation of the content of the data being passed.
The analog side of a modem is connected to a telephone line. The digital side of a modem is usually connected to a serial port on a terminal server. A typical terminal server will have up to a few dozen modems connected to it. The terminal server converts the data received from the modem into the format required for the rest of the IAP infrastructure. Only the network-specific portion of the data is changed; no access to the content of the data is performed. The terminal server controls the physical activity of the modem, acting as an "onramp" or Internet access point, for the IAP, collecting and submitting authentication information (ID and password) to a server elsewhere in the IAP complex.
When data leaves the terminal server via the local network ("LAN") connection, data from the active modems are interleaved as needed in the data stream and sent to the LAN. Usually, the LAN itself is fairly small and is used merely as a high-speed conduit to an IP router. The IP router shuttles packets from one of its input ports to one of its output ports. The ports can be the entryway to LANs, high-speed dedicated circuits, or other network facilities. Routers are "stateless," which means they route each block of data individually and do not remember how they routed previous blocks. Routing decisions are made based on header information (never on content data) and controlled by a combination of configuration information and dynamic routing protocol information.
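The stateless, header-only routing decision described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular router: the routing table, the port names and the function choose_port are hypothetical, and the rule used here (the most specific matching address prefix wins) is one common way such a decision can be made.

```python
import ipaddress
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    destination: str  # header field consulted by the router
    payload: bytes    # content; never inspected when routing


# Hypothetical configuration: (destination prefix, output port).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "uplink"),       # default route
    (ipaddress.ip_network("192.0.2.0/24"), "lan-a"),
    (ipaddress.ip_network("192.0.2.128/25"), "lan-b"),
]


def choose_port(packet: Packet) -> Optional[str]:
    """Pick an output port from the packet header alone."""
    address = ipaddress.ip_address(packet.destination)
    matches = [(net, port) for net, port in ROUTES if address in net]
    if not matches:
        return None
    # The longest (most specific) prefix wins. Nothing is remembered
    # between calls, so each packet is routed independently.
    return max(matches, key=lambda match: match[0].prefixlen)[1]
```

Note that choose_port never reads packet.payload, which mirrors the point that routing is based on header information and not on content.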
Packet Switching and Caching
The routing decisions are facilitated by packet switching. Packet switching is a technique used in data networking to lower the cost of dedicated circuits and to avoid congestion at various network nodes. This technique involves breaking data into packets, each with its own individual address. An apt analogy is to think of a group of co-workers traveling to a meeting. All leave a common address – the office – for a common destination – the meeting. Each worker, however, can take any number of routes throughout the city to reach their destination. At any one point in time during travel, it would be nearly impossible for the boss to know the location of all the travelers.
The packet also contains the information necessary to properly sequence data once it arrives at the address site. The packets can be analogized to traditional mail. The packet headers act as envelopes, providing all relevant addressing information. At the same time, the payload (content) remains shielded from the details of delivery. IAPs will usually offer two or more physically diverse paths between any two points to promote reliability. In the event of the failure of a single telecommunications link, the other links can continue to carry the packets. A router can tell when a link has failed or is degraded. In such a case, or when otherwise configured to do so, a router will favor one link over another when routing packets. In general, however, physically diverse paths that connect to the same distant point will be considered equivalent. Packets have an equal chance of being routed over any of the equivalent links. In fact, two consecutive packets from the same source to the same destination will often be routed over different links. Since there are many routers and many links between typical sources and destinations, the packets from a data stream can individually travel widely different paths.
Internet providers use a variety of techniques to improve the speed and reliability of data delivery to the end user. Packet switching, as described above, is one such technique, with the potential to improve both the speed and the reliability of data delivery. Caching is another tool employed by Internet providers to improve network performance. Caching involves storing often-accessed data on special high-speed servers or on local machines for quick access. By reducing the distance that data must travel across the Internet (and thus network traffic), caching greatly improves access speed and the speed of data delivery to the end user.
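The life of a data stream under packet switching can be sketched as follows. The sketch is a simplification under stated assumptions: the packet fields, the eight-byte packet size, the link names and the three function names are all hypothetical, and the random choice of link merely stands in for the "equal chance" behavior described above.

```python
import random
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    source: str
    destination: str
    sequence: int   # position of this piece within the original data
    payload: bytes


def packetize(data: bytes, source: str, destination: str, size: int = 8) -> List[Packet]:
    """Break data into packets, each carrying its own addressing header."""
    return [
        Packet(source, destination, seq, data[offset:offset + size])
        for seq, offset in enumerate(range(0, len(data), size))
    ]


def send_over_links(packets: List[Packet], links: List[str],
                    rng: random.Random) -> List[Tuple[str, Packet]]:
    """Assign each packet independently to one of several equivalent links."""
    in_flight = [(rng.choice(links), packet) for packet in packets]
    rng.shuffle(in_flight)  # different paths mean unpredictable arrival order
    return in_flight


def reassemble(arrivals: List[Tuple[str, Packet]]) -> bytes:
    """Restore the original data using only the sequence numbers."""
    ordered = sorted((packet for _link, packet in arrivals),
                     key=lambda packet: packet.sequence)
    return b"".join(packet.payload for packet in ordered)
```

An observer watching any single link sees only some of the fragments, in no particular order; only the endpoints hold the complete, correctly sequenced content.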
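A cache of the kind described above can be sketched as a small store that answers repeat requests locally and goes back to the origin only on a miss. This is a minimal sketch, assuming a least-recently-used eviction rule; the class SimpleCache, its capacity and the fetch_from_origin callback are hypothetical names chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable


class SimpleCache:
    """Keep recently requested items close to the user."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 2):
        self._fetch = fetch_from_origin   # the slow trip across the network
        self._capacity = capacity
        self._store = OrderedDict()       # url -> content, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1
            self._store.move_to_end(url)  # mark as most recently used
            return self._store[url]
        self.misses += 1
        content = self._fetch(url)
        self._store[url] = content
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used item
        return content
```

Each hit is a request that never leaves the provider's own facilities, which is where the savings in distance and traffic come from.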
Additional Services
Most IAPs offer services beyond mere IP routing. These services might involve electronic mail servers, Usenet news servers, "chat" conference servers, DNS (domain name system) servers, web servers, file storage areas on FTP (file transfer protocol) servers, or HTTP (hypertext transfer protocol) proxy servers. These services will be operated on one or more general-purpose computers within the IAP's complex.
At some point, links from routers connect not to other parts of an IAP’s infrastructure but to routers belonging to other IAPs or third parties. There is little technical difference between the movement of packets within the facilities operated by the IAP and the movement of packets into, across, and out of the Internet.
Putting the Pieces Together: The Practical Realities of Copyright Protection
Due to the architecture and technologies of the Internet, only two truly effective points of control and enforcement exist: the point of transmission and the point of reception. Packet switching, caching, and routing make any other control nearly impossible.
Cases involving inadvertent copyright infringement could be handled through a formal notification process by the copyright holder or the content provider to the IAP, after which the site operators would, if technically feasible and economically reasonable, remove the infringing material from the site.
But suppose there is a place on the Internet, "scofflaw.example.com", which habitually hosts material in violation of international copyright agreements. Is it technically possible and economically feasible for an IAP to block access to the site, preventing customers from obtaining material from that site via communications links that the IAP controls?
The answer is that, for sites not hosted on the IAP’s own servers, this cannot be done in any practical, economically feasible way. Before getting into the details of why it is not possible, it is worthwhile to explore some underlying principles.
There are many competing products on the market today which will allow users to voluntarily limit the content that they wish to be able to access. Generally, these are software "filters" which are marketed under the category of parental control. The intent is that parents and community leaders can use these products to prevent children from viewing content that they consider to be violent, pornographic, or otherwise unacceptable. Accompanying the software filters are databases of site content ratings prepared by some third party. It would not be a difficult matter to add copyright-infringing sites to the database. At best this kind of technology only covers voluntary participation by responsible IAP customers who will, figuratively and literally, ignore the infringing site. It is important to note, however, that screening technologies are only effective after the infringing material has been made available.
Even if attempts at access could be effectively screened, site operators would make the material available via more circuitous access methods. IAP customers, knowingly or otherwise, will use the more circuitous access methods. A site name is more the name of a logical place than a physical place. A given site name may actually span several machines; on the other hand a single machine may actually host several different logical site names. The consequence is that blocking access to a specific site is actually an imprecise undertaking. One may actually be blocking access to several sites, only some of which are infringing copyright.
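The imprecision of blocking can be made concrete with a small sketch. Everything in it is assumed for illustration: the table of site names and addresses is invented, and the function sites_affected_by_blocking is hypothetical. It shows one name spanning two machines and one machine hosting two names, as described above.

```python
from typing import Dict, List, Set

# Hypothetical name-to-address table; all names and addresses are made up.
SITE_ADDRESSES: Dict[str, List[str]] = {
    "scofflaw.example.com": ["192.0.2.10", "192.0.2.11"],  # one name, two machines
    "bakery.example.com": ["192.0.2.10"],                  # shares a machine
    "library.example.org": ["192.0.2.11", "192.0.2.12"],   # shares another machine
}


def sites_affected_by_blocking(target: str, table: Dict[str, List[str]]) -> Set[str]:
    """Return every site name that loses at least one address
    when all addresses of the target site are blocked."""
    blocked = set(table[target])
    return {name for name, addresses in table.items() if blocked & set(addresses)}
```

In this invented table, blocking the addresses of scofflaw.example.com also cuts off the bakery entirely and the library in part, even though neither hosts infringing material.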
The contest between site operators trying to provide access to infringing content and the IAP trying to prevent access can be characterized as an escalating cycle of strategy and counter-strategy. The IAP implements techniques for preventing one method of access, and the infringing site’s operator invents new access techniques. The effort required of the infringer is relatively small in the overall scheme of Internet access. By contrast, the job of the IAP in separating attempted access to legitimate materials from access to infringing materials is extremely complex and resource intensive.