1. How to Run Applications Faster?
• There are 3 ways to improve performance:
– Work Harder
– Work Smarter
– Get Help
• Computer Analogy
– Faster hardware: high-performance processors or peripheral devices
– Optimized algorithms and techniques used to solve computational tasks
– Multiple computers to solve a particular task
Research Issues in P2P Computing
OUTLINE
• Centralized Vs. Distributed
• What is P2P?
• P2P Architectures
• P2P and Applications
• Search and Replication Techniques
• P2P Security
• Emerging P2P Applications
• Conclusion
Centralized?
• Computation in networks of processing nodes can be classified into centralized or distributed computations.
• A centralized solution relies on one node being designated as the computer node that processes the entire application locally.
• The central system is shared by all the users all the time.
• There is a single point of control and a single point of failure.
Distributed….
• A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a communication network, such that all components cooperate together to perform a single or small set of related tasks.
• When a handful of powerful computers are linked together and communicate with each other,
– the overall computing power available can be amazingly vast.
– Such a system can have higher performance than a single supercomputer.
– The objective of such systems is to minimize communication and computation cost.
2. Examples of Distributed Systems
• The Internet
– Heterogeneous network of computers and applications
– Implemented through the Internet Protocol Stack
Distributed….
• The collaborating computers can access remote resources as well as local resources in the distributed system via the communication network.
• The existence of multiple autonomous computers is transparent to the user in a distributed system.
– The user is not aware that the jobs are executed by multiple computers residing in remote locations.
– A centralized algorithm is at the heart of a single computer.
– A distributed algorithm is at the heart of a society of computers.
Computer Networks vs. Distributed Systems
• Computer Network: the autonomous computers are explicitly visible.
• Distributed System: the existence of multiple autonomous computers is transparent.
• Many problems in common
• Normally, every distributed system relies on services provided by a computer network.
Distributed….
• Distributed systems are built on top of existing networking and operating systems software.
• The middleware enables computers to coordinate their activities and to share the resources of the system.
– Middleware is the bridge that connects distributed applications across dissimilar physical locations, with dissimilar hardware platforms, network technologies, operating systems, and programming languages.
• Middleware provides standard services such as naming, concurrency control, event distribution, authorization to specify access rights to resources, security etc.
3. Computing Platforms Evolution: Breaking Administrative Barriers
[Figure: performance grows as administrative barriers are crossed (Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe), from Desktop (Single Processor) to SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid, Global Cluster/Grid, and Inter Planet Cluster/Grid ??]
Foster-Kesselman
• The Foster-Kesselman duo organized, in 1997 at Argonne National Laboratory, a workshop entitled “Building a Computational Grid”.
• At this moment the term “Grid” was born.
• The workshop was followed in 1998 by the publication of the book “The Grid: Blueprint for a New Computing Infrastructure” by Foster and Kesselman themselves.
• For these reasons they are considered the fathers of the Grid, and their book, which in the meantime was almost entirely rewritten and re-published in 2003, is also considered the “Grid bible”.
[Photos: Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439; Carl Kesselman, Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292]
The Need for Collaboration?
• The worldwide business demands intense problem-solving capabilities for incredibly complex problems
– the need for dynamic collaboration of many computing resources to be able to work together.
• Achieving this level of resource collaboration within the bounds of the end user's quality requirements is a difficult challenge across all the technical communities.
Electric Grid and Grid Computing
• Computing grids are conceptually not unlike electrical grids.
• Electric power grid: a variety of resources contribute power into a shared "pool" for many consumers to access on an as-needed basis.
– In an electrical grid, wall outlets allow us to link to an infrastructure of resources that generate, distribute, and bill for electricity.
– When you connect to the electrical grid, you don't need to know where the power plant is or how the current gets to you.
• Grid computing uses middleware to coordinate disparate IT resources across a network, allowing them to function as a virtual whole.
– The goal of a computing grid, like that of the electrical grid, is to provide users with access to the resources they need, when they need them.
Why Grids?
• Large-scale exploration needs them: solving technology problems using computer modeling, simulation and analysis
– Geographic Information Systems, Life Sciences, Aerospace, CAD/CAM, Military Applications
4. CERN’s Large Hadron Collider
• 1800 Physicists, 150 Institutes, 32 Countries
• 100 PB of data by 2010; 50,000 CPUs?
The Large Hadron Collider (LHC)
• A gigantic scientific instrument near Geneva
• It is a particle accelerator used by physicists to study the smallest known particles, the fundamental building blocks of all things.
Client-Server Model
• The most widely used model
[Figure: clients send invocations to servers and receive results. Key: process, computer]
Client-Server
[Figure: a single source serving “interested” end-hosts through routers]
Why P2P?
[Figure: source, routers, and “interested” end-hosts]
5. Client-Server
[Figure: an overloaded server; hot spots become hotter]
Why P2P?
• Personal computers: 80% idle CPU time
• Laptops: 90% idle CPU time
• Computers in our lab: 99% idle CPU time
Problem with Client-Server Model
– Scalability
• As the number of users increases, there is a higher demand for computing power, storage space, and bandwidth associated with the server side.
– Reliability
• The whole network will depend on the highly loaded server to function properly.
What is driving P2P?
• Clients are not so dumb.
• Billions of MHz of CPU, tons of terabytes of disk, millions of gigabits of network bandwidth, …
– Unused resources.
Computer System Taxonomy
• Computer Systems
– Centralized Systems (mainframes, SMPs)
– Distributed Systems: Client-server, Peer-to-peer
P2P – An overlay network
• P2P overlay network
– The connected nodes construct a virtual overlay network on top of the underlying network infrastructure.
– Peer-to-peer network topology is a virtual overlay at the application layer.
[Figure: overlay of nodes A to G above the underlying network]
6. Typical Characteristics
• Large Scale: lots of nodes (up to millions)
• Dynamicity: frequent joins, leaves, failures
• Little or no infrastructure
– No central server
• Symmetry: all nodes are “peers” and have the same role
[Figure: client/server model (clients reach servers across the Internet through caches and proxies, creating a congestion zone around the servers) vs. peer-to-peer model (every node is both client and server)]
What is it...
• P2P computing is the sharing of computer resources and services by direct exchange between systems.
• These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files.
• P2P computing takes advantage of existing desktop computing power and networking connectivity,
– allowing economical clients to leverage their collective power to benefit the entire enterprise.
• In a P2P architecture, computers that have traditionally been used solely as clients communicate directly among themselves and can act as both clients and servers, assuming whatever role is most efficient for the network.
• Each node (peer), called a servent, acts as both a SERVer and a cliENT.
[Figure: each peer has a shared folder and neighbors and acts as client and server; a peer searches its neighbors and retrieves the file directly from another peer]
P2P Dominates Internet Traffic
• P2P has dominated Internet traffic: in 2006, it accounted for more than 60% of Internet traffic.
Some Statistics about P2P Systems
• More than 200 million users registered with Skype, around 10 million online users (2007)
• Around 4.7M hosts participate in SETI@Home (2006)
• BT (BitTorrent) accounts for 1/3 of Internet traffic (2007)
• More than 200,000 simultaneous online users on PPLive (streaming video network) (2007)
• More than 3,000,000 users downloaded PPStream (2008)
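A servent combines both roles in one process. The Python sketch below is illustrative only: the class name `Servent`, the in-memory dictionary standing in for the shared folder, and the rule that a downloaded copy is immediately re-shared are assumptions, not part of any particular P2P protocol. Each peer answers requests from its shared folder (server role) and asks its direct neighbors for files it lacks (client role).

```python
class Servent:
    """Hypothetical peer that acts as both SERVer and cliENT."""

    def __init__(self, name, shared=None):
        self.name = name
        self.shared = dict(shared or {})  # stands in for the shared folder: file name -> contents
        self.neighbors = []               # other Servent objects this peer is connected to

    # server role: answer a request from another peer
    def serve(self, filename):
        return self.shared.get(filename)

    # client role: ask direct neighbors for a file we do not have
    def fetch(self, filename):
        for peer in self.neighbors:
            data = peer.serve(filename)
            if data is not None:
                self.shared[filename] = data  # assumption: the downloaded copy is re-shared
                return peer.name
        return None
```

For example, if peer `a` shares `song.mp3`, peer `b` (a neighbor of `a`) fetches it from `a`, after which peer `c` (a neighbor of `b` only) can fetch it from `b`, because `b` now serves it too.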
7. P2P Applications
• In Peer-to-Peer (P2P) computing, applications are segregated into three main categories:
– distributed computing,
– file sharing, and
– collaborative applications
• The three categories of P2P serve different purposes:
– Distributed computing applications typically require the decomposition of a larger problem into smaller parallel problems.
– File sharing applications require efficient search across wide area networks.
– Collaborative applications require update mechanisms to provide consistency in a multi-user environment.
P2P Network Architectures
• Centralized (Napster)
• Decentralized
– Unstructured (Gnutella)
– Structured (Chord)
• Hierarchical (MBone)
• Hybrid (EDonkey)
P2P Computing
• File sharing (e.g., Gnutella, Freenet, Limewire, KaZaA)
• Collaboration (e.g., Magi, Groove, Jabber)
• Distributed computing (e.g., SETI@home, the Search for Extraterrestrial Intelligence)
[Figure: communication and collaboration (Groove, Skype); file sharing (Napster, Gnutella, Kazaa, Freenet, Overnet); distributed computing (SETI@Home, folding@Home)]
[Figure: taxonomy. Computer Systems: Centralized Systems (mainframes, SMPs, workstations) and Distributed Systems (Client-server, Peer-to-Peer). P2P file sharing applications: Centralized or Decentralized (Structured, Unstructured)]
8. P2P Applications
• File sharing (music, movies, …)
– utilises the idle disk space for storage and the existing network bandwidth for search and download.
– The cost of operation is very low.
• The majority of peers collect only objects that they are interested in anyway.
– E.g.: Napster, KaZaA and Gnutella
Napster: Example
[Figure: peers m1 to m6 holding objects A to F; a peer asks “E?” and is told which peer stores E]
File Sharing Services
• Publish: insert a new file into the network
• Lookup: given a file name X, find the host that stores the file
• Retrieval: get a copy of the file
• Join: join the network
• Leave: leave the network
Unstructured P2P
[Figure: queries are flooded to connected peers (neighbors) or flooded between supernodes; the file is then transferred directly between peers]
Centralized P2P
• Utilize a central directory for object location.
• For file-sharing P2P, the location is looked up on central servers and the file is then downloaded directly from peers.
• Benefits
– Simplicity
– Limited bandwidth usage
• Drawbacks
– Unreliable (single point of failure), performance bottleneck, and scalability limits
– Vulnerable to DoS attacks
– Copyright infringement
[Figure: peers upload their indexes to a centralized server; 1. query, 2. response, 3. transfer directly between peers]
File Sharing: Gnutella
• Gnutella is a file sharing protocol.
• Gnutella was originally designed by Nullsoft, a subsidiary of America Online.
• Its architecture is completely decentralised and distributed.
• When a client wishes to connect to the network, it runs through a list of nodes that are most likely to be up, or takes a list from a website, and then connects to however many nodes it wants.
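The centralized directory described above can be sketched as an index server that only records which peer holds which file; the transfer itself (step 3) would then happen directly between peers and is not modeled. This is a minimal illustration, not Napster's actual protocol: the class and method names and the peer addresses in the usage are hypothetical.

```python
class CentralDirectory:
    """Hypothetical Napster-style index server: stores who has what, never the files."""

    def __init__(self):
        self.index = {}  # file name -> set of peer addresses

    def join(self, peer, files):
        # a joining peer uploads the list of files in its shared folder
        for name in files:
            self.publish(peer, name)

    def publish(self, peer, name):
        self.index.setdefault(name, set()).add(peer)

    def leave(self, peer):
        for holders in self.index.values():
            holders.discard(peer)

    def lookup(self, name):
        # steps 1 and 2 (query and response); the download is peer to peer
        return sorted(self.index.get(name, set()))
```

For example, after `join("10.0.0.1:6699", ["a.mp3", "b.mp3"])` and `join("10.0.0.2:6699", ["a.mp3"])`, `lookup("a.mp3")` returns both addresses. The single `index` dictionary is exactly the single point of failure and bottleneck listed under Drawbacks.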
9. Gnutella Search Mechanism
• Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5; … A, B, C, D, E, F are resources.
[Figure: m1 floods the query “E?” to its neighbors, which forward it hop by hop until the TTL expires; the peer holding E replies]
• Advantages
– Fast lookup
– Low join and leave overhead
– Popular files are replicated many times, so a lookup with a small TTL will usually find the file
• Can choose to retrieve from a number of sources
• Disadvantages
– Not 100% success rate, since TTL is limited
– Very high communication overhead
– Uneven load distribution
Peer-to-Peer File Sharing is all about the trading of copyrighted music and videos without paying anything to the authors
[Figure: KaZaA screenshot, a native Windows application with music category search and banner ads; 3 million users online sharing 4 PetaBytes of data]
Kazaa
• Sharman Networks
• Kazaa is a file sharing program that allows you to download audio, video, images, documents and software files.
Searching
Search in Unstructured P2P
• Two general types of search in unstructured P2P:
– Blind: try to propagate the query to a sufficient number of nodes (example: Gnutella)
– Informed: utilize information about document locations
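To make the flooding search and its TTL trade-off concrete, here is a minimal Python sketch of TTL-limited flooding over the example topology. Only the links m1-m2, m1-m3, m3-m4 and m3-m5 come from the slide; the m5-m6 link and which peer holds which resource are assumptions, and a real Gnutella servent drops duplicate queries by message ID rather than by a global `seen` set.

```python
from collections import deque

# Links m1-m2, m1-m3, m3-m4, m3-m5 are from the slide.
# The m5-m6 link and the resource placement are assumptions for illustration.
NEIGHBORS = {
    "m1": ["m2", "m3"],
    "m2": ["m1"],
    "m3": ["m1", "m4", "m5"],
    "m4": ["m3"],
    "m5": ["m3", "m6"],
    "m6": ["m5"],
}
RESOURCES = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"}, "m4": {"D"}, "m5": {"E"}, "m6": {"F"}}


def flood_search(origin, item, ttl):
    """Flood a query; each hop decrements the TTL. Returns (hits, query messages sent)."""
    hits, messages = [], 0
    seen = {origin}                       # drop duplicate copies of the same query
    queue = deque([(origin, ttl, None)])  # (node, remaining TTL, node it came from)
    while queue:
        node, ttl_left, sender = queue.popleft()
        if item in RESOURCES.get(node, set()):
            hits.append(node)             # a hit reply would travel back along the path
        if ttl_left == 0:
            continue                      # TTL exhausted: stop forwarding
        for nb in NEIGHBORS[node]:
            if nb == sender:
                continue                  # never send the query back where it came from
            messages += 1                 # every forwarded copy costs one message
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, ttl_left - 1, node))
    return hits, messages
```

With these assumptions, `flood_search("m1", "E", 2)` finds E at m5 after 4 query messages, while `flood_search("m1", "F", 2)` misses F even though it exists one hop further away (the "not 100% success rate" disadvantage); raising the TTL to 3 finds F at m6 at the cost of one more message.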
10. Blind Search Methods
• BFS and Random Walk
[Figure: BFS vs. random walks]
• In unstructured networks, flooding would exhaust the bandwidth of the network.
Collaborative Community
• Rapidly changing work environment
– Out-sourcing, in-sourcing, home-sourcing
– Tight integration and team work with customers, partners, vendors
• P2P allows management of documents at the level of closed working groups.
• Collaboration software is designed to improve the productivity of individuals with common goals or interests.
• Groove is a collaborative P2P system (http://www.groove.net)
– Part of the Microsoft Office system
– Document sharing and collaboration, vital for a business
– Office Groove 2007 is a collaboration software program
• helps teams work together dynamically and effectively, even if team members work for different organizations or work remotely.
[Figure: Microsoft Office Groove 2007, “Work Together: Anyone, Anytime, Anyplace”]
Informed search
• Informed: utilize information about document locations.
– e.g., APS
APS – an example
[Figure: node J holds the requested object; nodes deploy 2 walkers; initially all index values are 20; TTL=3]
Adaptive Probabilistic Search
• Each node keeps a local index consisting of one entry for each object it has requested per neighbor.
• Index values represent the probability of finding that object through that neighbor.
• Searching is based on the simultaneous deployment of k walkers and probabilistic forwarding.
• If a hit occurs, the walker terminates successfully.
• On a miss, the query is forwarded to one of the node’s neighbors.
• Example (indices at node A): A chooses B with Pr=0.3, C with Pr=0.5, and D with Pr=0.2.
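The Adaptive Probabilistic Search index and its probabilistic forwarding can be sketched as below. The initial index value of 20 comes from the slide example; the reward and penalty constants (+10 on a hit, -5 on a miss, floor of 1) are assumptions for illustration, not the constants of the published APS scheme, and the walker bookkeeping (path tracking, TTL) is left out.

```python
import random

INITIAL_INDEX = 20  # slide example: all index values start at 20
HIT_BONUS = 10      # assumed reward for a neighbor that led to a hit
MISS_PENALTY = 5    # assumed penalty for a neighbor that led to a miss
MIN_INDEX = 1       # keep every neighbor selectable with non-zero probability


class APSIndex:
    """Local APS index of one node: one value per (object, neighbor) pair."""

    def __init__(self, neighbors):
        self.neighbors = list(neighbors)
        self.values = {}  # object -> {neighbor: index value}

    def _row(self, obj):
        # an object requested for the first time starts with equal index values
        return self.values.setdefault(obj, {n: INITIAL_INDEX for n in self.neighbors})

    def probabilities(self, obj):
        row = self._row(obj)
        total = sum(row.values())
        return {n: v / total for n, v in row.items()}

    def choose(self, obj, rng=random):
        """Probabilistic forwarding: pick the next hop in proportion to its index value."""
        row = self._row(obj)
        return rng.choices(list(row), weights=list(row.values()), k=1)[0]

    def feedback(self, obj, neighbor, hit):
        """Update the index when a walker that went through `neighbor` reports back."""
        row = self._row(obj)
        if hit:
            row[neighbor] += HIT_BONUS
        else:
            row[neighbor] = max(MIN_INDEX, row[neighbor] - MISS_PENALTY)


def deploy_walkers(index, obj, k, rng=random):
    """Each of the k walkers independently picks its first hop."""
    return [index.choose(obj, rng) for _ in range(k)]
```

With index values B=30, C=50, D=20 at node A, `probabilities` returns 0.3, 0.5 and 0.2, matching the slide example; hits make a neighbor more likely to be chosen next time, misses make it less likely.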
11. Distributed Computing: SETI@home
• Search for Extraterrestrial Intelligence: are we alone in the universe, or is there intelligent life somewhere else in the Universe?
• Over two million computers crunching away on data gathered from the Arecibo radio telescope in Puerto Rico, USA
• The SETI@Home project is widely regarded as the fastest computer in the world.
• Sharing of resources such as computation power, network bandwidth and storage
• Achieves computing power more cheaply than a supercomputer can provide.
• Developed by the Space Sciences Laboratory at the University of California, Berkeley, in the United States (http://setiathome.ssl.berkeley.edu)
• Launched to the public in 1999
How SETI@home works?
• Collect data source
– Use the telescope at Arecibo to collect data from outer space.
– SETI@home uses a data recorder to record the data on removable tape.
• Distribution of data source
– SETI@home divides the data into fixed-size work units.
– SETI@home distributes these work units via the Internet from the servers to a client program.
– The client program computes the result, returns it to the server, and gets another work unit.
How SETI@home works? …
• Scientific experiment that uses Internet-connected computers
• Distributes a screen-saver-based application to users
• Applies signal analysis algorithms to different data sets to process radio-telescope data.
• Has more than 3 million users
[Figure: radio-telescope data reaches the main server; 2. SETI client (screen saver) starts; 3. SETI client gets data from the server and runs; 4. client sends results back to the server]
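The work-unit cycle above (divide, distribute, compute, return, fetch the next unit) can be sketched as a single-process Python simulation. The unit size, the `analyze` stand-in (which merely reports the strongest sample) and the class names are hypothetical; the real SETI@home client performs far more elaborate signal analysis and talks to the servers over the Internet.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    unit_id: int
    samples: list


@dataclass
class Server:
    pending: deque = field(default_factory=deque)  # work units not yet handed out
    results: dict = field(default_factory=dict)    # unit_id -> returned result

    def load(self, data, unit_size):
        """Divide the recorded data into fixed-size work units."""
        for uid, start in enumerate(range(0, len(data), unit_size)):
            self.pending.append(WorkUnit(uid, data[start:start + unit_size]))

    def get_work_unit(self):
        return self.pending.popleft() if self.pending else None

    def submit_result(self, unit_id, result):
        self.results[unit_id] = result


def analyze(unit):
    """Stand-in for the real signal analysis: report the strongest sample."""
    return max(unit.samples)


def client_loop(server):
    """Fetch a unit, compute, return the result, then ask for the next unit."""
    processed = 0
    while (unit := server.get_work_unit()) is not None:
        server.submit_result(unit.unit_id, analyze(unit))
        processed += 1
    return processed
```

For example, loading `[3, 1, 4, 1, 5, 9, 2, 6, 5, 3]` with `unit_size=4` yields three work units, and one `client_loop` run leaves `{0: 4, 1: 9, 2: 5}` in `server.results`. In the real system many clients run this loop concurrently against the same server.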
12. Skype
• “… a free program that uses the latest P2P…technology to bring affordable and high quality voice communications to people all over the world…”
• Skype offers voice, video, chat and data transfer services over IP.
• The first stable version of Skype was released in July 2004; since then the number of users has kept growing.
• Nowadays Skype claims more than 20 million accounts and between 4 and 6 million simultaneously connected users.
Super nodes
• Super nodes are Skype clients run by users that have a “good” Internet connection and a “good” computer.
• Having a good Internet connection means having a public IP address, without firewall restrictions.
• A good computer is a machine that can forward other users' communications and handle many connections.
• SNs act as relays in the network
– Hence, they need better connectivity and better performance.
• SNs are used to connect SCs together.
Skype Software features
• VoIP from computer to computer
– The most used feature.
• VoIP from computer to regular phone (Skype Out)
– By registering on Skype's website it is possible to buy credit and then call all over the world at very interesting rates compared to rates applied by phone companies.
• Video conferencing: introduced in Skype 2.0 in 2006.
• Instant Messaging: comparable to many other instant messaging clients like MSN Messenger, Yahoo! Messenger, Google Talk, etc.
– The main difference is that Skype does not tell the user whether the person he is chatting with is typing or not. This is due to the P2P design of the Skype network.
• File Transfer
– The Skype network design has a big influence on the quality of file transfers. It can make them very fast (1 Mbps) or very slow (3 kbps).
Skype – login
• Skype clients directly connect to login servers, whose IP addresses are hard-coded within the software.
– In this connection the login name and the version are sent in clear text format.
• The login server stores all user names and passwords and ensures that names are unique across the Skype name space.
Internet Telephony - Skype
• The participants form a self-organizing P2P overlay network to locate and communicate with other participants.
• The bandwidth is shared, and the real-time sound or video is shared as a resource.
• Skype has a similar architecture to its predecessor KaZaA.
• There are three types of nodes in the Skype network:
– Ordinary peers
– Super-nodes
– Central login server
• Communications are encrypted (RSA).
Connection to a bootstrap node
– When an SC (Skype Client) is installed for the first time, it comes with a list of SNs to connect to.
– First, the Skype Client tries to connect to 5 SNs by sending a UDP packet to the IP addresses of super nodes randomly chosen from the host cache.
– When the client finds a super node to connect to, it refreshes its list of active and available super nodes in the host cache.
– The SC connects to an SN.
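The bootstrap step can be sketched as follows. Skype's protocol is proprietary, so this only illustrates the logic on the slide: `is_reachable` is a hypothetical callback standing in for the UDP probe, and moving responsive super nodes to the front of the cache is an assumed simplification of "refreshing" the host cache (a real client would also learn new SN addresses from the SN it reaches).

```python
import random

PROBES = 5  # the slide: the client first tries 5 super nodes


def bootstrap(host_cache, is_reachable, rng=random):
    """Probe up to 5 randomly chosen super nodes from the host cache.
    Returns (chosen super node or None, refreshed host cache)."""
    candidates = rng.sample(host_cache, min(PROBES, len(host_cache)))
    alive = [sn for sn in candidates if is_reachable(sn)]  # stands in for the UDP probes
    if not alive:
        return None, host_cache
    # assumed refresh policy: responsive super nodes move to the front of the cache
    refreshed = alive + [sn for sn in host_cache if sn not in alive]
    return alive[0], refreshed
```

If none of the probed super nodes answers, the cache is returned unchanged and the caller can simply retry with a new random sample.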
13. Traffic volume by content type (Germany, BitTorrent)
Skype - user search
• Similar to KaZaA (searching for the callee)
• The client sends a user name to its SN and, as an answer, receives a few IP addresses and port numbers.
• Subsequently the client contacts these nodes.
• If it cannot find the user, it sends the request to its SN once again and, as a result, receives another few IP addresses and port numbers.
• The process continues until the user is found.
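The iterative user search can be expressed as a short loop. Both callbacks are hypothetical stand-ins for Skype's undocumented messages, and `max_rounds` is an added safeguard; the slide simply says the process continues until the user is found.

```python
def find_user(username, ask_supernode, try_contact, max_rounds=10):
    """Ask the SN for a few (ip, port) candidates, contact them, and repeat until found.

    ask_supernode(username, round_no) -> list of (ip, port)   [hypothetical callback]
    try_contact(addr, username) -> bool                         [hypothetical callback]
    """
    for round_no in range(max_rounds):
        for addr in ask_supernode(username, round_no):
            if try_contact(addr, username):
                return addr
    return None  # gave up after max_rounds (an added safeguard)
```

In the usage below, the SN answers the first round with two addresses that do not know the user and the second round with the one that does.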
Skype - call establishment
• Routing in the Skype overlay network is done by the SNs.
• When an SC tries to establish a call, it first asks its SN (if it is not an SN itself) where the callee is, and tries to connect directly to it.
– If the SC is restricted because of a firewall, then it will connect to the callee using an SN as a relay.
– If both a caller and a callee have public IP addresses, the caller sends signaling information over TCP to the callee.
What is PPLive?
– An online video broadcasting and advertising network
• Provides an online viewing experience comparable to that of traditional TV broadcasting
• 75 million global installed base and 20 million monthly active users
• 600+ channels on PPLive with content ranging from news, music, sports, movies, games, live video and other interactive services to a global audience
– An efficient P2P technique platform and test bench
History of PPLive:
• Bill’s story
– Inventor of PPLive core technology
– Dropped out of a post-graduate program to start PPLive
P2P VIDEO STREAMING PPLIVE
• Streaming video is content sent in compressed form over the Internet and displayed by the viewer in real time.
• With streaming video or streaming media, a Web user does not have to wait to download a file to play it; the media is sent in a continuous stream of data and is played as it arrives.
• The user needs a player, which is a special program that decompresses and sends video data to the display and audio data to the speakers.
• A player can be either an integral part of a browser or downloaded from the software maker's Web site.
• P2P streaming
– P2P TV
• PPLive, PPStream, Joost (by Skype founders), …
14. Industry Trends
• PPLive is well positioned to exploit the next explosive growth.
[Figure: applications evolving from basic to advanced (x-axis 2001, 2003, 2004, 2005): File Sharing (Napster), Downloading (BitTorrent), VoIP (Skype), Video Streaming (PPLive)]
PPLive Multi-tree Streaming
• Media Server (channel management server): retrieve the list of channels via HTTP
• Membership Server: retrieve a small list of member nodes of interest via UDP
• Since all peers are involved in the data distribution, the load is spread among all nodes.
Single-tree Streaming
• A common approach to P2P streaming is to organize participating peers into a single tree-structured overlay.
– The content is pushed from the source towards all peers.
– This way of organizing peers is called single-tree streaming.
• In these systems, peers are hierarchically organized in a tree structure where the root is the stream source.
• The content is spread as a continuous flow of information from the source down the tree.
[Figure: a snapshot of a tree-based overlay with 231 nodes]
[Figure: streaming tree reconstruction after a peer departure]
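The push model of single-tree streaming, and one possible way to repair the tree when a peer departs, can be sketched in Python. The re-attachment rule (orphaned children adopt the departed peer's parent) is an assumed, deliberately simple policy; deployed systems may choose new parents based on capacity or latency. The source itself is assumed never to leave.

```python
from collections import deque


def push_chunk(children, root):
    """Push one chunk from the source down the tree in breadth-first order.
    Returns the delivery order and how many copies each peer uploads."""
    order, uploads = [], {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        kids = children.get(node, [])
        uploads[node] = len(kids)  # a parent sends one copy to each child
        queue.extend(kids)
    return order, uploads


def remove_peer(children, parent_of, peer):
    """Assumed repair policy: the departed peer's children re-attach to its parent.
    The root (stream source) has no parent and is assumed never to depart."""
    parent = parent_of.pop(peer)
    orphans = children.pop(peer, [])
    children[parent].remove(peer)
    children[parent].extend(orphans)
    for child in orphans:
        parent_of[child] = parent
```

In a tree where source S feeds A and B, and A feeds C and D, the leaves upload nothing while interior peers carry all the forwarding load; this imbalance is what spreading the load over all nodes (as in the multi-tree approach above) is meant to address. After A departs, C and D are fed directly by S.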
15. BitTorrent
• Created by Bram Cohen in 2001
Overall Architecture
[Figure: a web server, a tracker, a seed peer, leech peers, and the downloader “US” [Leech]]
What is BitTorrent?
• A peer-to-peer file transfer protocol
• Extremely popular today
• “Pull-based” “swarming” approach
• Each file is split into smaller pieces
• Nodes request desired pieces from neighbors
• As opposed to parents pushing data that they receive
• Pieces are not downloaded in sequential order
• Encourages contribution by all nodes
16. BitTorrent Lingo
• Seeder = a peer that provides the complete file.
• Initial seeder = a peer that provides the initial copy.
• Leecher = one who is downloading.
[Figure: initial seeder, seeders, and leechers]
BitTorrent Basics
• Files are broken into pieces.
– Users each download different pieces from the original uploader (seed).
– Users exchange the pieces with their peers to obtain the ones they are missing.
• This process is organized by a centralized server called the Tracker.
Critical Elements
• A web server
– stores and serves the .torrent file.
– For example:
• http://bt.btchina.net
• http://bt.ydy.com/
[Figure: a web server serving .torrent files such as The Lord of Ring.torrent and Troy.torrent]
17. BitTorrent Swarm
• Swarm
– Set of peers all downloading the same file
– Organized as a random mesh
• Each node knows the list of pieces downloaded by its neighbors
• A node requests pieces it does not own from its neighbors
• Swarm
– The group of machines that are collectively connected for a particular file.
• For example, if you start a BitTorrent client and it tells you that you're connected to 10 peers and 3 seeds, then the swarm consists of you and those 13 other people.
Critical Elements
• The .torrent file
– Static ‘metainfo’ file containing the necessary information:
• URL of the tracker
• Piece length: usually 256 KB
• SHA-1 hashes of each piece in the file
• IP address of the Tracker
[Figure: Matrix.torrent]
Critical Elements
• A BitTorrent tracker
– The tracker maintains information about all BitTorrent clients utilizing each torrent.
– The tracker identifies the network location of each client either uploading or downloading the P2P file associated with a torrent.
– It also tracks which fragment(s) of that file each client possesses, to assist in efficient data sharing between clients.
• i.e. the tracker keeps track of all peers downloading the file.
– For example:
• http://bt.cnxp.com:8080/announce
• http://btfans.3322.org:6969/announce
Critical Elements
• An end user (peer)
– Users who want to use BitTorrent must install the corresponding software or a plug-in for web browsers.
– Downloader (leecher): the peer has only a part (or none) of the file.
– Seeder: the peer has the complete file, and chooses to stay in the system to allow other peers to download.
– BitTorrent clients connect to a tracker when attempting to work with torrent files.
• The tracker notifies the client of the P2P file location (that is normally on a different, remote server).
How a node enters a swarm for file “popeye.mp4”
• File popeye.mp4.torrent is hosted at a (well-known) webserver, e.g. www.bittorrent.com.
• The .torrent has the address of the tracker for the file.
• The tracker, which runs on a webserver as well, keeps track of all peers downloading the file.
[Figure, step 1: the peer fetches popeye.mp4.torrent from the webserver]
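The piece-length and SHA-1 fields of the .torrent file can be illustrated with a small Python sketch that builds a metainfo dictionary. The key names `announce`, `info`, `name`, `length`, `piece length` and `pieces` follow the BitTorrent metainfo format (BEP 3), but bencoding is omitted and the tracker URL in the usage is hypothetical.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # "usually 256 KB", as on the slide


def make_metainfo(name, data, announce, piece_length=PIECE_LENGTH):
    """Build a (non-bencoded) metainfo dict with one SHA-1 digest per piece."""
    pieces = b"".join(
        hashlib.sha1(data[offset:offset + piece_length]).digest()  # 20 bytes each
        for offset in range(0, len(data), piece_length)
    )
    return {
        "announce": announce,  # tracker URL
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": pieces,  # concatenated digests, 20 bytes per piece
        },
    }


def piece_hash(meta, index):
    """Return the expected SHA-1 digest of piece `index`."""
    return meta["info"]["pieces"][20 * index:20 * (index + 1)]
```

A file of two full pieces plus 10 bytes yields three digests (60 bytes), the last one covering only the short final piece. Because the hashes are fixed in the .torrent, any downloader can check every piece it receives, whichever peer it came from.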
18. How a node enters a swarm for file “popeye.mp4”
[Figure, step 2: the peer contacts the tracker named in the .torrent]
Three elements necessary to sharing a file with BitTorrent
• The tracker: coordinates connections among the peers.
– The tracker doesn't know anything of the actual contents of a file.
• The web server: stores and serves the .torrent file.
• At least one seeder
– Contains the file's actual contents.
– The seeder is almost always an end-user's desktop machine (peer), rather than a dedicated server machine.
– Seeding is monitored by the Tracker.
– Generally, it's considered good manners to continue seeding a file after you have finished downloading, to help out others.
– Seed your file for a long time to prevent peers from being left with incomplete files.
• When you finish a download in BitTorrent and you are only uploading, you're seeding!
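The tracker's job of coordinating connections can be sketched as in-memory bookkeeping keyed by torrent. This is a simplification: real trackers speak HTTP or UDP and bencode their replies. The `left` and `stopped` names mirror fields of the BitTorrent announce request, while the class, the `max_peers` default and the random sampling of the returned peer list are assumptions. Note that the announced address is accepted without validation.

```python
import random


class Tracker:
    """In-memory sketch of a tracker's bookkeeping (no HTTP, no bencoding)."""

    def __init__(self, max_peers=50):
        self.swarms = {}          # info_hash -> {peer_id: (ip, port, bytes_left)}
        self.max_peers = max_peers

    def announce(self, info_hash, peer_id, ip, port, left, event=None, rng=random):
        """Register or update a peer and return a random subset of the other peers."""
        swarm = self.swarms.setdefault(info_hash, {})
        if event == "stopped":
            swarm.pop(peer_id, None)  # peer leaves the swarm
            return []
        swarm[peer_id] = (ip, port, left)  # the address is taken on trust
        others = [(pid, pip, pport) for pid, (pip, pport, _) in swarm.items() if pid != peer_id]
        return rng.sample(others, min(self.max_peers, len(others)))

    def seeders(self, info_hash):
        """Peers that reported nothing left to download."""
        return sum(1 for (_, _, left) in self.swarms.get(info_hash, {}).values() if left == 0)
```

The tracker never touches file contents: it only maps each torrent to the peers currently in its swarm, which matches the point above that it knows nothing of the actual contents of a file.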
How a node enters a swarm for file “popeye.mp4”
[Figure, step 3: the peer joins the swarm of peers downloading popeye.mp4]
File sharing
• Large files are broken into pieces of size between 64 KB and 1 MB.
[Figure: a file split into pieces 1 to 8, exchanged within the swarm]
BT: publishing a file
[Figure: publishing Harry Potter.torrent: user Bob, web server, tracker, seeder]
A trivial example
[Figure: seeder John holds pieces {1,2,3,4,5,6,7,8,9,10}; downloaders (A, B, C, Joe, Fan Bin) start with {} and gradually collect pieces, e.g. {1,2,3}, {1,2,3,4}, {1,2,3,4,5}, {1,2,3,5}]
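The slides say only that a node requests pieces it does not own from its neighbors. The sketch below adds an ordering, rarest-first, which is the piece-selection policy associated with BitTorrent clients; tie-breaking by piece number and choosing the alphabetically first holder as the source are assumptions made to keep the example deterministic. The usage mirrors the trivial example with hypothetical piece sets.

```python
from collections import Counter


def next_requests(my_pieces, neighbor_pieces):
    """Pieces I am missing that some neighbor has, rarest first (ties by piece number)."""
    availability = Counter(p for pieces in neighbor_pieces.values() for p in pieces)
    wanted = [p for p in availability if p not in my_pieces]
    return sorted(wanted, key=lambda p: (availability[p], p))


def pick_source(piece, neighbor_pieces):
    """Choose a neighbor holding the piece (first by name), or None if nobody has it."""
    holders = sorted(n for n, pieces in neighbor_pieces.items() if piece in pieces)
    return holders[0] if holders else None
```

With `my_pieces = {1, 2, 3}` and neighbors John (pieces 1 to 10), Joe ({1,2,3,4,5}) and Fan Bin ({1,2,3,5}), the request order is 6, 7, 8, 9, 10 (held only by John), then 4 (two holders), then 5 (three holders). Fetching rare pieces first keeps them from disappearing if their only holder leaves.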
19. P2P Technical Challenges
• Routing protocols
• Network topologies
• Peer discovery
• Communication/coordination protocols
• Quality of service
• Security
Types of P2P Attacks
• Poisoning: a client can provide content that doesn't match the description.
– A client A can broadcast a message saying it needs file ‘X’. A malicious client can send a message back to A saying it has file X, then send it file Y.
• Denial of Service attacks that reduce or completely stop network activity.
• Defection attacks, which allow a client to participate in the network with a very low upload-to-download ratio.
Types of P2P Attacks….
• Virus attacks, where a malicious client can add viruses into files shared on the network.
• Malware attacks, where the P2P software contains spyware.
• Filtering attacks, where network operators may attempt to prevent P2P network data from being carried.
P2P SECURITY
• Security is the condition of being protected against danger or loss.
P2P Security
• P2P file sharing networks are constantly under attack.
• P2P is potentially more vulnerable than client-server.
– Decentralized
– More difficult to manage and control
• Need to understand the security issues for architecting future P2P apps
Attacks On & From
• Attacks on P2P systems
• Attacks from P2P systems
20. Attacks on P2P sharing
• Two types:
– Pollution: file corruption (attacks the file content)
– Index poisoning (attacks the file index)
File Pollution
[Figure: a pollution company's servers inject polluted copies of a popular file into the file sharing network alongside the original content; unsuspecting users such as Alice and Bob download them (“Yuck”) and spread the pollution further]
INDEX POISONING
• The aim of the attacker is to make several peers believe that some popular file is present at the victim.
• The attacker sends a location publish message to every crawled peer.
• In this message, the attacker includes the victim's IP address and port number.
• The attacker puts the file hash of a popular file along with the message.
• Peer B adds this file hash to its index along with the location of the victim.
• When a peer C searches for that file, it may be told by some poisoned peer that the victim has the file.
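The index-poisoning steps can be reproduced with a toy index in which publish messages are trusted without verification. The function names, file hash and addresses are hypothetical; the point is that an attacker who publishes the victim's (IP, port) for a popular hash to every crawled peer makes later lookups point at the victim.

```python
def publish(index, file_hash, location):
    """Record that `location` (ip, port) claims to store `file_hash`; nothing is verified."""
    index.setdefault(file_hash, set()).add(location)


def lookup(index, file_hash):
    return sorted(index.get(file_hash, set()))


def poison(crawled_indexes, popular_hash, victim):
    """Attacker: send a forged location-publish message naming the victim to every crawled peer."""
    for index in crawled_indexes:
        publish(index, popular_hash, victim)
```

After one honest publish at peer 0 and a `poison` call against three crawled peers, peer 0 returns both the honest holder and the victim, while peers 1 and 2 return only the victim, so searchers are steered toward a host that never had the file.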
21. Index Poisoning
[Figure: a peer's file index maps titles to locations: bigparty at 123.12.7.98, smallfun at 23.123.78.6, heyhey at 234.8.89.20; after poisoning, the index also lists bighit at 111.22.22.22]
Free Riding
• Peers share little or no data in P2P file-sharing systems.
• Measurement
– Nearly 70% of Gnutella users share no files.
– Nearly 50% of all responses are returned by the top 1% of sharing hosts.
• Incentive mechanisms to encourage user cooperation
P2P Worms
• Topological worms, scan worms, and passive worms
• A computer worm is a self-replicating malware computer program.
– It uses a computer network to send copies of itself to other nodes.
– It may do so without any user intervention.
ROUTING TABLE POISONING
• The aim of the attacker is to make the peers add the victim as their neighbor.
• The attacker sends node announcement messages to every crawled peer.
• The attacker includes the victim's IP address and port number in these messages.
• The peers add the victim as their neighbor.
• Query messages are forwarded to the victim.
22. TOPOLOGICAL WORM ATTACK
Effects
• Eating up free disk space
• Benjamin opens a Web page, called benjamin.xww.de, to display banner ads.
– One morning, the benjamin.xww.de Web site had a message saying: "Domain closed due to massive abuse."
PASSIVE P2P WORMS
• Vulnerability in the protocol
• Wait for the vulnerable targets to contact them
• Case 1
– The worm can create infected copies of itself with attractive filenames and place them in the shared folder of the P2P client, or replace the files present in the shared folder with itself.
– e.g. VBS.Gnutella, Benjamin Worm etc.
• Case 2
– Answers positively to a proportion of search queries by changing the name of the corrupted file to match the search query
– e.g. Gnuman
P2P-Worm.Win32.Benjamin.a
• P2P-Worm.Win32.Benjamin.a (Kaspersky Lab) is also known as: Worm.P2P.Benjamin.a (Kaspersky Lab), W32/Benjamin.worm (McAfee), W32.Benjamin.Worm (Symantec), Win32.HLLW.Benjamin (Doctor Web).
• This worm uses the Kazaa file exchange P2P network to spread itself.
• Benjamin is written in Borland Delphi and is approximately 216 KB in size; it is compressed by the AsPack utility.
How vulnerable is BitTorrent?
Pollution Attack
• 1. The peers receive the peer list from the tracker.
22
Pollution Attack (cont'd)
• 2. One peer contacts the attacker for a chunk of the file.
• 3. The attacker sends back a false chunk.
• This false chunk will fail its hash check and will be discarded.
• 4. The attacker requests all chunks from the swarm and wastes the peers' upload bandwidth.
DDOS Attack
• DDOS = Distributed denial of service
• Relies on the fact that the BitTorrent tracker has no mechanism for validating peers
• Uses modified client software
• 1. The attacker downloads a large number of torrent files from a web server.
• 2. The attacker parses the torrent files with a modified BitTorrent client and, as he announces that he is joining each swarm, spoofs his IP address and port number with the victim's.
DDOS Attack (cont'd)
• 3. As the tracker receives requests for a list of participating peers from other clients, it sends them the victim's IP address and port number.
• 4. The peers then attempt to connect to the victim to try to download a chunk of the file.
Attack illustration
[Figure: an attacker, .torrent files from a discussion forum, a tracker, clients and the victim; clients ask the tracker "Who has the files?" and are told "Victim has the files!"]
Current Solutions: Pollution Attacks
• Blacklisting
– Achieved using software such as PeerGuardian or MoBlock.
– Blocks connections from blacklisted IP addresses, which are downloaded from an online database.
Solutions – TRUST and REPUTATION
• Most of the solutions proposed for these attacks are based on building trust (and/or reputation) between the peers
• Some of the popular approaches are:
– DCRS (BitTorrent)
– EigenTrust
– XRep
• These approaches do slow down the attack
What is Trust? What is Reputation?
• Trust – a peer's belief in another peer's capabilities, honesty and reliability based on its own experiences.
• Reputation – a peer's belief in another peer's capabilities, honesty and reliability based on recommendations received from other peers.
– Reputation can be centralized, computed by a third party, or decentralized, computed independently by each peer after asking other peers for recommendations.
What is Trust? (cont'd)
• Both trust and reputation are used to evaluate a peer's trustworthiness.
• Trust and reputation increase or decrease with further experience.
• Trust and reputation both depend on some context.
• For example:
– Mike trusts John as his doctor, but he doesn't trust John as a mechanic who can fix his car.
• In the context of seeing a doctor, John is trustworthy.
• In the context of fixing a car, John is untrustworthy.
What is Trust Management?
• The term "trust management" was first coined by Blaze et al. (1996)
– a coherent framework for the study of security policies, security credentials and trust relationships.
Reputation Management
• Need for trust mechanisms
– To assess the trustworthiness of peers and content, since malicious peers can generate an unlimited number of inauthentic files
– To deter malicious behavior
• Reputation rests on the assumption that past behavior is indicative of future behavior
• Use of reputation to build trust
An Example Trust Management System (BitTorrent)
• Debit-Credit Reputation System (DCRS)
• Each client calculates a local trust score for each of its peers based on valid pieces uploaded/downloaded
• The tracker combines these individual scores into a global score
DCRS (cont'd)
Local Trust Score Computation
• $F_{ij} = U_{ij} - D_{ij}$, where $U_{ij}$ is the number of chunks that peer i uploaded to peer j and $D_{ij}$ is the number of chunks that i downloaded from j
• Using $F_{ij}$, the local trust score $LT_{ij}$ is computed as
$$LT_{ij} = \begin{cases} -1 & \text{if peer } j \text{ uploaded a bogus chunk} \\ 0 & \text{if } F_{ij} > t \\ 1 & \text{if } F_{ij} \le t \end{cases}$$
where $t$ is the fairness threshold
DCRS (cont'd)
Global Trust Score Computation
• Global trust scores represent the rest of the swarm's opinion of a peer.
• At regular intervals the tracker receives the local trust scores of the peers in the swarm.
• For each peer j, the tracker chooses k random local trust scores, where k is smaller than the number of peers in the swarm.
• The tracker sets the average of these k local trust scores as the global trust score for j.
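The two DCRS computations above can be sketched as follows. This minimal sketch assumes that peer i keeps $U_{ij}$ and $D_{ij}$ as chunk counts and records whether j ever sent it a bogus chunk, represented here as a boolean flag. It also assumes that the tracker draws its k reports uniformly at random without replacement. `local_trust` and `global_trust` are hypothetical names and do not belong to any BitTorrent client or tracker.

```python
import random
from statistics import mean
from typing import List, Optional


def local_trust(uploaded_to_j: int, downloaded_from_j: int, t: int,
                bogus_from_j: bool = False) -> int:
    """LT_ij: the local trust score peer i assigns to peer j."""
    if bogus_from_j:
        return -1  # j uploaded a bogus chunk to i
    f_ij = uploaded_to_j - downloaded_from_j  # F_ij = U_ij - D_ij
    # F_ij > t means i gave j more than t chunks beyond what it got back.
    return 0 if f_ij > t else 1


def global_trust(local_scores_for_j: List[int], k: int,
                 rng: Optional[random.Random] = None) -> float:
    """Global trust score for j: the average of k randomly chosen local scores.

    `local_scores_for_j` holds the LT_ij values that the other peers i reported
    to the tracker, so k stays below the number of peers in the swarm.
    """
    if not 0 < k <= len(local_scores_for_j):
        raise ValueError("k must be between 1 and the number of reported scores")
    if rng is None:
        rng = random.Random()
    sample = rng.sample(local_scores_for_j, k)  # uniform, without replacement
    return float(mean(sample))


if __name__ == "__main__":
    t = 5  # hypothetical fairness threshold
    reports = [
        local_trust(12, 3, t),                     # F = 9 > t
        local_trust(4, 4, t),                      # F = 0 <= t
        local_trust(2, 6, t),                      # F = -4 <= t
        local_trust(0, 0, t, bogus_from_j=True),   # bogus chunk
    ]
    print(reports)  # [0, 1, 1, -1]
    print(global_trust(reports, k=3, rng=random.Random(42)))
```

A peer j that uploads more to i than it downloads from i gets a negative $F_{ij}$ and still scores 1. The threshold t penalizes only peers that take from i much more than they give back.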