Day-to-Day Tall Head of URL Exploration
This is the 4th post on the statistics of URL exploration. In the previous three posts (The Long Tail of URL Exploration, What does the Nth Explorer of the Web Find? and The Tall Head of URL Exploration), I looked at how adding users grows the long tail and the tall head of URLs for a single day. This time, the data covers 20 days of activity from a relatively constant population.
To get some idea of how the tall head evolves, compare the tall head on day 0 with the tall head on each of the 19 successive days. The plot below takes the Top 10, Top 50, Top 100, Top 500 and Top 1000 URLs on day zero and shows, for each group, the fraction that also appears in the corresponding tall head on the nth day.
[Figure: fraction of the day-0 Top N URLs remaining in the Top N on day n. ... Green: Top 100 URLs; cyan: Top 500 URLs; yellow: Top 1000 URLs (URLs ranked by visits).]
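The retention curves in the plot can be computed directly from per-day visit counts. The sketch below is a minimal illustration, not the code behind the plot: it assumes each day's data is a `collections.Counter` mapping URL to visit count, and the names `top_n` and `retention_curve` are hypothetical.

```python
from collections import Counter


def top_n(visits: Counter, n: int) -> set:
    """Return the n most-visited URLs for one day (ties keep first-seen order)."""
    return {url for url, _ in visits.most_common(n)}


def retention_curve(daily_visits: list, n: int) -> list:
    """For each day, the fraction of the day-0 Top-n URLs also in that day's Top n."""
    head0 = top_n(daily_visits[0], n)
    if not head0:
        return [0.0 for _ in daily_visits]  # no URLs on day 0: nothing to retain
    return [len(head0 & top_n(day, n)) / len(head0) for day in daily_visits]


if __name__ == "__main__":
    # Hypothetical toy data: three days of visit counts.
    days = [
        Counter({"a": 5, "b": 3, "c": 1}),
        Counter({"a": 4, "c": 3, "b": 1}),
        Counter({"d": 9, "e": 8, "a": 1}),
    ]
    print(retention_curve(days, 2))  # prints [1.0, 0.5, 0.0]
```

Running this once per group size (10, 50, 100, 500, 1000) gives one curve per group, as in the plot.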
While the Top 10 and Top 50 URLs show stability day after day, the Top 500 and Top 1000 roll over at a fairly constant rate after day one. The plot can be used to estimate the size of the persistent tall head of URLs for this population and the rate at which the tall head evolves.
First, look for the point where the behavior changes from holding a constant fraction of the day-0 URLs to declining steadily from day to day. By this heuristic, I estimate the persistent tall head to be between 50 and 100 URLs.
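One way to make this heuristic mechanical is to fit a straight line to each retention curve after day 1 and find the smallest group size whose curve declines. This is a sketch under stated assumptions: the `curves` input (group size mapped to daily retention fractions, starting at day 0) and the flatness tolerance `tol` are hypothetical choices, not values from the post.

```python
def slope(xs: list, ys: list) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def persistent_head_bounds(curves: dict, tol: float = 0.002):
    """Return (largest flat N, smallest declining N).

    curves maps group size N -> retention fractions for days 0..D.
    A curve counts as declining if its slope over days 1..D is below -tol
    (tol is an assumed threshold in fraction per day).
    """
    last_flat = None
    for n in sorted(curves):
        fractions = curves[n]
        days = list(range(1, len(fractions)))
        if slope(days, fractions[1:]) < -tol:
            return last_flat, n  # persistent head lies between these sizes
        last_flat = n
    return last_flat, None  # no group declines within the data
```

For example, with hypothetical curves where the Top 10 and Top 50 stay flat after day 1 and the Top 100 falls by 2% per day, the function returns `(50, 100)`.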
Second, to estimate the turnover of the tall head, choose the approximate size of the tall head you care about, e.g., the Top 500 URLs (cyan), and look at the slope of its line for days 1-19. (Alternatively, choose the fraction that should remain after the tall head turns over, say 75%, and read off the timescale at which the curve reaches it: approximately 15 days.)
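Both readings of the plot can be sketched with a least-squares line through the retention fractions for days 1 onward. This assumes the decline is roughly linear over that range; the function names and the default `target=0.75` are illustrative, and the input format (a list of daily fractions starting at day 0) is an assumption.

```python
import math


def linear_fit(xs: list, ys: list):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def turnover(fractions: list, target: float = 0.75):
    """Fit days 1..D (skipping day 0, which is always 1.0).

    Returns (rate, day_at_target): the fraction of the day-0 head lost per
    day, and the day at which the fitted line reaches `target`.
    """
    days = list(range(1, len(fractions)))
    slope, intercept = linear_fit(days, fractions[1:])
    rate = -slope
    day_at_target = (intercept - target) / rate if rate > 0 else math.inf
    return rate, day_at_target
```

With hypothetical fractions of 0.90, 0.89, 0.88, 0.87 and 0.86 on days 1-5, the fitted rate is 1% per day and the line reaches 75% on day 16.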
[Figure: ... for comparison; green: fit to the Top 500 URLs (days 1-20; URLs ranked by visits).]
The plot above shows that the Top 500 URLs roll over at about 0.5% per day from days 1 to 20.