Pitfalls and Problems with Internet Data

David Moore
CAIDA

There are fundamental problems with any Internet data collection
methodology, whether for workload, active, or routing measurement.
Additionally, publicly available datasets may have implementation
quirks and problems beyond those necessarily common to the general
approach. Common problems include the granularity of timestamps,
synchronization between locations, the inability to measure
atomically, the lack of a "typical" location or link, non-global IP
addresses, and the tradeoff between how often you can measure and how
detailed that measurement can be (Heisenberg principle). Commonly
available datasets share these problems but also have additional
quirks of their own. Some unexpected "features" of publicly available
datasets will also be discussed, including those in Skitter,
Mercator, NLANR header trace, and RouteViews data.
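As a minimal sketch of two of the pitfalls named above (timestamp granularity and imperfect synchronization between locations), the example below uses hypothetical numbers, not data from any of the talk's datasets: two monitors capture the same packet 50 microseconds apart, but coarse timestamps plus clock skew make the true ordering unrecoverable.

```python
def truncate_us(ts_us, granularity_us):
    """Quantize a microsecond timestamp to the capture clock's granularity."""
    return (ts_us // granularity_us) * granularity_us

GRANULARITY_US = 1000      # hypothetical: monitors record 1 ms timestamps
SKEW_B_US = 200            # hypothetical: monitor B's clock runs 200 us fast

t_a_us = 10_000_400        # true capture time at monitor A (microseconds)
t_b_us = 10_000_450        # true capture time at monitor B, 50 us later

stamp_a = truncate_us(t_a_us, GRANULARITY_US)
stamp_b = truncate_us(t_b_us + SKEW_B_US, GRANULARITY_US)

# Both events collapse onto the same 1 ms tick, so the 50 us ordering
# between the two monitors cannot be recovered from the recorded stamps.
print(stamp_a == stamp_b)
```

The same recorded-vs-true mismatch is why cross-site one-way delay or event-ordering analyses need either finer timestamps or an explicit clock-correction step.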