All-Pairs-Pings for PlanetLab
- Jeremy Stribling
- strib@mit.edu
--------------------------------
12/12/05: All-pairs-pings is no longer an active service, as of 12/1/05. It
was finally starting to reach its scaling limitations on PlanetLab. However,
I will keep the archived data online here for as long as possible.
Chad Yoshikawa at the University of Cincinnati has reincarnated the service
as an all-sites-pings data set. Please check out that site at
http://ping.ececs.uc.edu/ping/ .
--------------------------------
This page provides current and archived all-pairs-pings information for
PlanetLab nodes. It is updated approximately every half hour, depending
on the number of PlanetLab nodes and the number of errors. The format
of the .app files is as follows:
Line 1: The date and time this information was collected (e.g. 22:31:21
10/09/2002). This is Eastern Time (GMT -5:00) for data collected after 4 pm
September 1st, 2003, and Pacific Time (GMT -8:00) for data older than
that.
Line 2: Over how many pings this information was averaged per pair of
servers.
Line 3: A list of the N PlanetLab nodes for which this file contains
information.
Lines 4 through (N+3): The pairwise ping times themselves, as an NxN
matrix of min ping/avg. ping/max ping tuples. For example, line i holds
the pairwise ping times for node (i-4) in the list on Line 3, so line 4
describes the first node listed. The jth entry on line i contains the ping
tuple for node (i-4) to node j. All ping times are given in milliseconds.
See below for the format of errors.
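As a rough illustration of the layout above, here is a minimal Python sketch of a reader for an uncompressed .app file. It is not one of my collection scripts. It assumes that node names and matrix entries are separated by whitespace, that a tuple is written as three numbers joined by "/", and that a whole-row error message starts with "*** " (see ERRORS below); none of those details are guaranteed by this page. The names parse_entry and parse_app are made up for the example.

```python
from typing import Dict, List, Tuple, Union

PingTuple = Tuple[float, float, float]   # (min, avg, max), in milliseconds
Entry = Union[PingTuple, str]            # non-tuple entries are kept as raw text


def parse_entry(token: str) -> Entry:
    """Parse 'min/avg/max' into floats; return anything else unchanged."""
    parts = token.split("/")
    if len(parts) == 3:
        try:
            return (float(parts[0]), float(parts[1]), float(parts[2]))
        except ValueError:
            pass  # not numeric, so treat it as an error marker
    return token


def parse_app(text: str) -> Dict[str, object]:
    """Split the text of one .app file into its header fields and rows.

    Assumption: fields are whitespace-separated. Line 1 and Line 2 are
    returned as raw strings because their exact format is not specified.
    """
    lines = text.splitlines()
    nodes = lines[2].split()                     # Line 3: the N node names
    rows: List[Union[str, List[Entry]]] = []
    for line in lines[3:3 + len(nodes)]:         # Lines 4 through N+3
        stripped = line.strip()
        if stripped.startswith("*** "):
            rows.append(stripped)                # whole-row message, kept as text
        else:
            rows.append([parse_entry(tok) for tok in stripped.split()])
    return {
        "collected": lines[0].strip(),
        "pings_per_pair": lines[1].strip(),
        "nodes": nodes,
        "rows": rows,
    }
```

With this sketch, parse_app(text)["rows"][a][b] would be the (min, avg, max) tuple from the a-th node to the b-th node on Line 3, or a string if that entry or row holds an error marker.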
The most current .app file collected can be found at current.app. Note that
I've tried to make this point to the most complete current data. That is,
sometimes nodes are slow in sending back data, so the most current file may
not contain everyone's data. Over time, the archived versions of data sets
will be updated as the information becomes available.
Archives for all the .apps that I've collected are put in a directory named
after the month and the day the information was collected, e.g. data
collected on Feb 13, 2003 is in the directory named 2003-02/2003-02-13.
Each file in this directory is a gzipped .app file, named after the day
and time it was collected (Pacific Time). For example, data collected on
Feb 13, 2003 at 4:07:57 pm is named as 2003-02-13--16-07-57.app.gz.
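The naming scheme above can be sketched in a few lines of Python. This is only an illustration; the function name archive_path is made up, and the timestamp passed in is assumed to already be in the time zone used for file names.

```python
from datetime import datetime


def archive_path(collected: datetime) -> str:
    """Build the relative archive path for a data set collected at `collected`."""
    month_dir = collected.strftime("%Y-%m")                  # e.g. 2003-02
    day_dir = collected.strftime("%Y-%m-%d")                 # e.g. 2003-02-13
    filename = collected.strftime("%Y-%m-%d--%H-%M-%S") + ".app.gz"
    return f"{month_dir}/{day_dir}/{filename}"


# The example from the text: Feb 13, 2003 at 4:07:57 pm
print(archive_path(datetime(2003, 2, 13, 16, 7, 57)))
# prints 2003-02/2003-02-13/2003-02-13--16-07-57.app.gz
```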
The nodes for which measurements are taken are those listed on the
production node list on the PlanetLab website, unless InfoSpect has
identified crippling problems with them for more than 5 iterations of the
app program.
ERRORS:
If an error occurs while collecting data, those errors will appear in the
.app file in the following format:
-- If the pings to node y from node x failed, node x will perform a
traceroute to node y and, instead of recording a ping tuple, will record
the last hop of the traceroute, like so: ***last.hop.ip.address***. If no
traceroute was possible (e.g. Internet2 node -> Internet 1 node), it will
say ***no_traceroute***.
-- If the centralized controller sees that node x sent back no data for node y,
then node y's entry on node x's line will read ***no_data***.
-- If the centralized controller receives no data from node x for a given
time period, node x's line will read: *** no data received for
node.x.ip.address ***
Note that in the most recently collected data sets, this may appear because
node x was just slow or unsynchronized in sending its data back; this line
can be replaced with actual data at any point in the future, if node x ever
sends it in.
// The following three rules apply only to data collected before 9/15/03
-- If the centralized app controller could not ssh into node x to start
the app process, it will attempt to traceroute to node x, and will record
the following message on node x's line of the app file: *** ssh
ping_times.pl failed on node.x.ip.address, last hop in traceroute was
last.hop.ip.address ***
-- If the centralized app controller could not ssh into node x to start
the app process, and it traceroutes all the way to node x, it will record
the following message on node x's line of the app file: *** ssh ping_times.pl
failed on node.x.ip.address but traceroute was successful ***
-- If the centralized app controller could ssh into node x, but it
takes too long for the ssh process to return (as defined by a parameter in
my scripts currently set to 1 hour) it will record the following message
on node x's line of the app file: *** ssh ping_times.pl timed out after
3600 seconds on node.x.ip.address ***
// end deprecated rules.
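To make the error formats above a little more concrete, here is a hedged Python sketch that sorts entries and whole lines into the cases just listed. It assumes that per-entry markers contain no spaces (***like.this***) while whole-line messages begin with "*** " followed by text, which matches the examples above but is not otherwise guaranteed. The function names and the category labels are invented for this example.

```python
import re
from typing import Optional, Tuple

# Per-entry marker: three stars, some text, three stars, nothing else.
ENTRY_ERROR = re.compile(r"^\*\*\*(.+)\*\*\*$")


def classify_entry(token: str) -> Tuple[str, Optional[str]]:
    """Classify a single matrix entry (one node x -> node y cell)."""
    match = ENTRY_ERROR.match(token)
    if match is None:
        return ("ping", token)            # presumably a min/avg/max tuple
    body = match.group(1)
    if body == "no_data":
        return ("no_data", None)
    if body == "no_traceroute":
        return ("no_traceroute", None)
    return ("last_hop", body)             # last hop of the traceroute


def classify_row(line: str) -> str:
    """Classify a whole line of the matrix (everything node x reported)."""
    text = line.strip()
    if not text.startswith("*** "):
        return "data"                     # ordinary row of entries
    body = text.strip("* ")               # drop the surrounding stars and spaces
    if body.startswith("no data received for"):
        return "no_data_received"
    # The remaining cases only occur in data collected before 9/15/03.
    if body.startswith("ssh ping_times.pl timed out"):
        return "ssh_timeout"
    if body.startswith("ssh ping_times.pl failed"):
        if "traceroute was successful" in body:
            return "ssh_failed_traceroute_ok"
        return "ssh_failed"
    return "unknown"
```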
DOWNLOADING:
If you'd like to download the whole directory tree of .app files, Ningning Hu of
CMU offers these tips for using wget:
1) get the whole pl-app subtree:
wget -r -l 20 -x -nH --cut-dirs=1 -np http://www.pdos.lcs.mit.edu/~strib/pl_app
"20" here can be any large enough number, just to make sure we can
dig out the whole subtree;
"-r -x -np" are necessary;
"-nH --cut-dirs=1" are just for convenience
this command will download all the data files into the local dir
"./pl_app", but including some non-data files ("\?*" & "*.html").
2) then remove the non-data files from "./pl_app"
find . -name "\?*" -exec rm {} \;
find . -name "*.html" -exec rm {} \;
done.
Just FYI, not sure this is enough for everyone.
CAVEAT:
Feel free to contact me if you have any problems with this. Note that
none of this is currently supported by PlanetLab, and I reserve the right
to stop maintaining this database at any time.