Distributed Communications through Bittorrent

Written by: FanFan Huang

The Bittorrent protocol took the internet by storm in 2002[6], taking only six months to overtake all other peer-to-peer protocols. Bittorrent’s success comes from the protocol’s efficiency and its unique approach to file sharing. Bittorrent arrived just as distributed communication was moving to the forefront, and its ideas on decentralized communication have spawned many successors, but as of yet none has managed to topple the original Bittorrent protocol. The standard Bittorrent system consists of a tracker, a client, and at least one seeder.

Traditional Client – Server Systems

Normal web traffic operates on a client-server model. Each client is exclusively a client, and each server only serves. When a client requests a file, the server sends it to that client. The servers in this model are the producers and the clients are the consumers. How fast each client can download depends solely on the server’s capacity to serve all of its clients. This means that files on unpopular servers transfer very quickly, but as more clients request a popular file the server eventually reaches the limit of its bandwidth and every download slows down.
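
The slowdown can be put into numbers with a rough sketch. It assumes the server splits its upload bandwidth evenly between clients, and the function name and the figures (a 100 Mbit/s server, 20 Mbit/s client links) are made up for illustration.

```python
def per_client_rate(server_upload_mbps, clients, client_download_mbps):
    """Rate each client sees if the server shares its upload evenly.

    Assumption: an even split, capped by the client's own link speed.
    """
    if clients <= 0:
        raise ValueError("clients must be positive")
    fair_share = server_upload_mbps / clients
    return min(fair_share, client_download_mbps)


# Hypothetical numbers: the same server with a growing audience.
for n in (1, 10, 100, 1000):
    print(n, "clients:", per_client_rate(100.0, n, 20.0), "Mbit/s each")
```

With few clients the client's own link is the limit; once the server's upload is saturated, each extra client lowers the rate for everyone.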

P2P Before Bittorrent

Before Bittorrent there were various other P2P contenders, all of which worked by running a client. Napster[1] was one of the earliest clients to enter the P2P market. Napster worked on the idea that each client could share, or serve, a file if it had that file on its system. Napster was still at heart a client-server application; the difference was that each client was also a server. Napster’s biggest flaw was that you could only download a file from one server at a time. Gnutella[2] was part of the next wave of P2P. The key difference between Gnutella and Napster was that Gnutella allowed partial file downloads[2]. That is, another client can fetch the section of a file you have already downloaded before you have finished downloading the whole file[4]. One of the biggest criticisms of Gnutella was that it was slow, a problem that comes from the overhead created by file search queries. When a search is done, Gnutella has to forward it to all subsequent nodes, as shown in Figure 1. This forwarding can go several levels deep and can consume a lot of bandwidth to no purpose[3].


Figure 1: Gnutella architecture diagram[4]

Because of these degrees of separation, Gnutella also has a side effect: even when you are downloading a really popular file, you might not see all the hosts that are capable of sending you that file. Finding them all would add so much overhead that the trade-off is not worth the benefit[4].
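
The search overhead and the limited view of the network can both be seen in a small simulation. This is a simplified sketch of query flooding, not Gnutella's actual wire protocol: the overlay graph, the node names, and the `flood_query` function are invented for illustration, and the hop limit plays the role of a time-to-live counter.

```python
from collections import deque


def flood_query(graph, origin, ttl):
    """Flood a search from `origin`; return (nodes_reached, messages_sent).

    Each node forwards the query to every neighbor except the one it
    arrived from, until the hop budget `ttl` runs out.
    """
    seen = {origin}
    messages = 0
    queue = deque([(origin, None, ttl)])
    while queue:
        node, came_from, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for neighbor in graph[node]:
            if neighbor == came_from:
                continue
            messages += 1  # the message costs bandwidth even if it is a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, node, hops_left - 1))
    return seen, messages


# Hypothetical five-node overlay network.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

reached, sent = flood_query(overlay, "A", ttl=2)
print(sorted(reached), sent)
```

With a hop limit of 2, node E is never asked, even though it might hold the file, and some of the messages that were sent went to nodes that had already seen the query.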

How Bittorrent Works

The heart of the Bittorrent system consists of two parts: the tracker and the peer. The tracker’s job is to keep track of all the IP addresses that are currently trying to download the same file. Each peer periodically gets updates from the tracker on who is currently in the swarm. The peers themselves perform all other operations, such as uploading pieces and keeping track of which client has which piece. According to Bram Cohen, the creator of Bittorrent, this was done to reduce the burden on the tracker. If this task were done centrally it would not scale well, and it would make the protocol extremely inefficient. To use Bittorrent you visit a web site that hosts a “.torrent” file, which contains the hash for the file as well as the trackers that are tracking peers for that file. With this information the client, or peer, does the remainder of the work of finding who has the pieces and downloading them.
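
The tracker's limited role can be sketched in a few lines. This is an in-memory toy, not the real tracker protocol: the `Tracker` class, its method names, and the example addresses are assumptions made for illustration. The point it shows is that the tracker stores only who is in each swarm and knows nothing about pieces.

```python
from collections import defaultdict


class Tracker:
    """Toy tracker: maps a torrent's hash to the set of peers in its swarm."""

    def __init__(self):
        self._swarms = defaultdict(set)

    def announce(self, info_hash, peer):
        """Register `peer` in the swarm and return the other known peers."""
        swarm = self._swarms[info_hash]
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return others

    def leave(self, info_hash, peer):
        """Remove `peer` from the swarm, if present."""
        self._swarms[info_hash].discard(peer)


tracker = Tracker()
first = ("10.0.0.1", 6881)   # hypothetical (ip, port) pairs
second = ("10.0.0.2", 6881)
print(tracker.announce("example-hash", first))
print(tracker.announce("example-hash", second))
```

Everything else, such as deciding which piece to request from which peer, would happen between the peers themselves.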

Bittorrent works on three primary philosophies: large files are better served in small pieces, the more you upload the more you’re rewarded, and newly joined clients piggyback on older clients. To understand how Bittorrent works, imagine you are downloading a large file and you’re 50% complete. Now say a new client joins the swarm. In a traditional client-server model it would download from the same server you are using, which effectively cuts your throughput. With Bittorrent the new client starts its download from you and continues until it catches up with you at 50%. This leaves the original server free to send you the rest of the file, at which point there are 2 servers, or seeders, for new clients to choose from.
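
Serving a file in small pieces works because each piece can be checked on its own, so a peer that holds only part of the file can still pass on pieces that others can trust. The sketch below splits data into pieces and verifies each against a SHA-1 hash, as Bittorrent does per piece. The helper names and the tiny piece length are chosen for illustration; real torrents use far larger pieces.

```python
import hashlib


def split_into_pieces(data, piece_length):
    """Cut `data` into consecutive pieces; the last piece may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces):
    """SHA-1 digest of every piece, in order."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def verify_piece(index, piece, hashes):
    """True if a received piece matches the published hash for its index."""
    return hashlib.sha1(piece).digest() == hashes[index]


data = b"abcdefghij"                    # stand-in for a large file
pieces = split_into_pieces(data, 4)     # illustrative piece length of 4 bytes
hashes = piece_hashes(pieces)

print(len(pieces), "pieces")
print(verify_piece(1, pieces[1], hashes))   # intact piece
print(verify_piece(1, b"XXXX", hashes))     # corrupted piece
```

A client that has verified pieces 0 and 1 can upload them to a newcomer right away, without waiting for the rest of the file.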

As mentioned before, Bittorrent rewards users for uploading. This is done through the tit for tat system[5,6]. Essentially, clients prefer to send data to other clients that have been good uploaders. This helps overcome one of the big issues with Gnutella, where many clients do not share their fair amount while downloading, and in doing so it greatly improves Bittorrent's overall performance[5]. It also has a cost: a Bittorrent download generally starts very slowly because you don’t yet have any data to upload, and it speeds up as you become more trusted (due to uploading)[6].
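
The preference for good uploaders can be sketched as a simple ranking. This is a deliberately reduced model, not the full algorithm used by real clients: the function name, the number of upload slots, and the rates are assumptions for illustration.

```python
def choose_upload_targets(received_from, slots=4):
    """Pick the peers to upload to, favoring those who uploaded most to us.

    `received_from` maps a peer name to the rate at which that peer has
    recently sent us data. Ties are broken by peer name so the result
    is deterministic.
    """
    ranked = sorted(received_from.items(), key=lambda item: (-item[1], item[0]))
    return [peer for peer, _rate in ranked[:slots]]


# Hypothetical recent download rates in KB/s, keyed by peer.
rates = {"a": 50, "b": 0, "c": 120, "d": 80, "e": 10}
print(choose_upload_targets(rates, slots=3))
```

Peer "b", which has sent nothing, is left out while slots are scarce. That matches the slow start described above: a newcomer with nothing to offer ranks last until it has pieces to trade.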

Newer versions of the Bittorrent protocol use distributed hash tables[6] and peer-to-peer information exchange in an attempt to remove the dependence on the centralized tracker. But the tracker is still required in these systems to at least serve the torrent file. Bittorrent derives much of its efficiency from being able to locate peers quickly as soon as it starts, a task that would be nearly impossible without a centralized tracker. Unfortunately, due to the recent popularity of Bittorrent, many ISPs are known to throttle Bittorrent traffic, although most will not admit it. To circumvent this, Bittorrent recently added the ability to encrypt all traffic, although at the time of this writing it is debatable whether this alone will solve the problem.

Why Bittorrent Became So Popular

There are many reasons Bittorrent took off to become the dominant P2P client. For one, Bittorrent came onto the market when the RIAA was heavily suing Napster and Kazaa[6]. Bittorrent was also incredibly good at sending large files such as movies, which, thanks to broadband, had just started being traded on the internet. Thus Bittorrent’s popularity was largely due to its timing. However, timing would have meant nothing if Bittorrent had been no better than the existing solutions. Bittorrent proved very early on that it was very efficient at transferring movies and large software files. Today Bittorrent is the transfer method of choice for movies, TV shows, ISOs, games, and virtually every file that’s over a few megabytes.

References

1. Napster – Wikipedia.
Retrieved March 26, 2007 From: http://en.wikipedia.org/wiki/Napster

2. Gnutella Protocol Specification. Rev 1.2.
Retrieved March 26, 2007 From: http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf

3. Gnutella – Wikipedia.
Retrieved March 26, 2007 From: http://en.wikipedia.org/wiki/Gnutella

4. Brian Marshall (2002). How Gnutella Works.
Retrieved March 26, 2007 From: http://computer.howstuffworks.com/file-sharing2.htm

5. Carmen Carmack (2002). How Bittorrent Works.
Retrieved March 26, 2007 From: http://computer.howstuffworks.com/bittorrent.htm

6. Bittorrent – Wikipedia.
Retrieved March 27, 2007 From: http://en.wikipedia.org/wiki/Bittorent

7. Filesharing – Kazaa – History.
Retrieved March 27, 2007 From: http://wiki.media-culture.org.au/index.php/Filesharing_-_Kazaa_-_History