Many factors can independently affect a user's experience on the internet, including, but not limited to, the speed of the user's internet connection, the reliability of the access network, the availability and load characteristics of the application servers and, in some cases, the configuration of the user's in-home network. Individual, isolated measures such as access speed or server capacity do not capture the real user experience. An end-to-end, application-level performance measurement that accounts for all of these factors is the right approach to measuring and quantifying the true internet user experience.
Presented here is a methodology for rating Internet Service Providers (ISPs) on their YouTube video capability, based on sustained application-level performance measurements. The objective is a rating that is meaningful, easy to understand and closely reflects the real-world internet experience.
A typical YouTube video playback consists of a YouTube client (player) fetching video bytes in a streaming fashion from a YouTube server (CDN) in one or more requests (e.g., HTTP GET requests). The first step in determining ISP ratings is to measure the sustained speed at which these video bytes are transferred from the server to the client. To measure the achieved application-level throughput (goodput), the following are recorded for each request:
Based on these measurements, the goodput for a given request 'R' is computed using the formula below. Each measured request is considered a goodput sample.
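As an illustration only, the sketch below computes one goodput sample under the assumption that goodput is the number of video bytes delivered for a request divided by the elapsed transfer time; it is not presented as the methodology's own formula. The `RequestRecord` type, its field names and `goodput_mbps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical per-request measurement record (field names are assumptions)."""
    bytes_transferred: int   # video bytes delivered for this request
    start_time_s: float      # assumed: time the transfer started, in seconds
    end_time_s: float        # assumed: time the last byte arrived, in seconds


def goodput_mbps(record: RequestRecord) -> float:
    """Return goodput in Mbps, assuming goodput = bytes delivered / elapsed time."""
    elapsed = record.end_time_s - record.start_time_s
    if elapsed <= 0:
        raise ValueError("end_time_s must be later than start_time_s")
    # Bytes -> bits, divided by elapsed seconds, scaled to megabits per second.
    return record.bytes_transferred * 8 / elapsed / 1_000_000


# 1,250,000 bytes delivered in 4 seconds is 10,000,000 bits / 4 s = 2.5 Mbps.
sample = RequestRecord(bytes_transferred=1_250_000, start_time_s=10.0, end_time_s=14.0)
print(goodput_mbps(sample))
```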
Ratings are derived by aggregating the relevant goodput samples recorded in the measurement phase. The methodology supports computing ratings at various levels of granularity along the selected dimensions. For example, the rating for an ISP could be calculated for various time slices (e.g., hour, day, week, month) and/or at various geographical levels (e.g., country, province, metro, city).
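A minimal sketch of grouping samples at a chosen granularity, assuming each sample carries an ISP name, a geographic path ordered from broad to narrow, a UTC timestamp and a goodput value. `SAMPLES`, `ExampleISP`, `group_samples` and the level names are hypothetical; week slices and metro areas are omitted for brevity.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical samples: (isp, geo path broad -> narrow, UTC timestamp, goodput in Mbps).
SAMPLES = [
    ("ExampleISP", ("US", "CA", "San Francisco"), datetime(2024, 5, 1, 20, 15, tzinfo=timezone.utc), 3.1),
    ("ExampleISP", ("US", "CA", "San Francisco"), datetime(2024, 5, 1, 21, 40, tzinfo=timezone.utc), 1.2),
    ("ExampleISP", ("US", "CA", "Oakland"), datetime(2024, 5, 2, 9, 5, tzinfo=timezone.utc), 2.8),
]

# Time slice -> strftime pattern used as the bucket key (assumed representation).
TIME_FORMATS = {"hour": "%Y-%m-%d %H:00", "day": "%Y-%m-%d", "month": "%Y-%m"}
# Geo level -> how many elements of the geo path to keep (assumed hierarchy).
GEO_LEVELS = {"country": 1, "province": 2, "city": 3}


def group_samples(samples, geo_level, time_slice):
    """Group goodput values by (ISP, geo prefix, time bucket)."""
    groups = defaultdict(list)
    depth = GEO_LEVELS[geo_level]
    fmt = TIME_FORMATS[time_slice]
    for isp, geo_path, ts, goodput in samples:
        key = (isp, geo_path[:depth], ts.strftime(fmt))
        groups[key].append(goodput)
    return dict(groups)


# Province/month puts all three samples together; city/day splits them into two groups.
print(group_samples(SAMPLES, "province", "month"))
print(group_samples(SAMPLES, "city", "day"))
```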
For a given time period 'T' (e.g., the trailing 30 days) and a geographical location 'L' (e.g., San Francisco, CA, USA), the rating for an ISP 'P' (e.g., Comcast) is computed as follows:
GAT Bucket | Goodput Threshold | Reasoning |
---|---|---|
HD (High Definition) | > 2.5 Mbps | Minimum goodput required to sustain an average YouTube HD video playback at 720p resolution |
SD (Standard Definition) | 0.7 to 2.5 Mbps | Minimum goodput required to sustain an average YouTube SD video playback at 360p resolution |
LD (Lower Definition) | < 0.7 Mbps | Goodput too low to sustain YouTube SD video playback at 360p resolution |
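The bucket table can be expressed as a small classification function. This sketch follows the table's boundaries literally: strictly above 2.5 Mbps is HD and 0.7 to 2.5 Mbps is SD, with both ends of the SD range treated as inclusive (an assumption). The function and constant names are hypothetical.

```python
HD_THRESHOLD_MBPS = 2.5  # HD requires goodput strictly above this value
SD_THRESHOLD_MBPS = 0.7  # SD lower bound (assumed inclusive)


def gat_bucket(goodput_mbps: float) -> str:
    """Classify one goodput sample into the HD, SD or LD bucket."""
    if goodput_mbps > HD_THRESHOLD_MBPS:
        return "HD"
    if goodput_mbps >= SD_THRESHOLD_MBPS:
        return "SD"  # 0.7 to 2.5 Mbps, inclusive at both ends (assumption)
    return "LD"      # below 0.7 Mbps


for g in (3.0, 2.5, 0.7, 0.5):
    print(g, gat_bucket(g))
```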
3) Define Rating Criteria: The ISP rating criteria are defined in terms of the minimum share of samples that must fall in the GAT buckets for each rating level. Since this metric is designed to reflect the consistency and reliability of the ISP's network, the bar needs to be set at a level that captures sustained performance rather than typical (average) performance. To that end, we define three rating scales to reflect different levels of reliability: GAT-90 (90% of requests above threshold), GAT-95 (95% of requests above threshold) and GAT-99 (99% of requests above threshold).
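To make the GAT-N scales concrete, the hypothetical helper below checks whether at least N% of samples are strictly above a given goodput threshold. It compares integer counts rather than floating-point percentages, so that a share of exactly 90% reliably passes GAT-90.

```python
def meets_gat(samples_mbps, threshold_mbps, level_percent):
    """Return True if at least level_percent % of samples are strictly above threshold_mbps.

    level_percent is 90, 95 or 99 for the GAT-90, GAT-95 and GAT-99 scales.
    """
    if not samples_mbps:
        raise ValueError("need at least one goodput sample")
    above = sum(1 for g in samples_mbps if g > threshold_mbps)
    # above / total >= level / 100  <=>  above * 100 >= level * total (exact in integers).
    return above * 100 >= level_percent * len(samples_mbps)


# 19 of 20 samples (95%) exceed the HD threshold of 2.5 Mbps.
samples = [3.0] * 19 + [1.0]
for level in (90, 95, 99):
    print(f"GAT-{level}:", meets_gat(samples, 2.5, level))
```

With 95% of samples above the threshold, this set passes GAT-90 and GAT-95 but fails GAT-99, which shows how the stricter scales demand more consistent performance from the same network.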
The following table defines the criteria used to determine the final ISP rating in our methodology, using GAT-90. The 90% bar was chosen after careful consideration of the performance observed in the field, and it will be adjusted over time to track evolving network capability.
Rating | Criteria (GAT-90) | Reasoning |
---|---|---|
HD (High Definition) | 90+% of samples are marked HD | Network offers consistent and reliable YouTube HD (720p) performance |
SD (Standard Definition) | 90+% of samples are marked at least SD | Network offers consistent and reliable YouTube SD (360p) performance |
LD (Lower Definition) | Neither of the above | Network offers unreliable YouTube performance |
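The rating table can be sketched as follows, assuming each sample has already been labeled HD, SD or LD. `isp_rating` is a hypothetical name; its `level_percent` parameter defaults to 90 for GAT-90 and could be set to 95 or 99 for the stricter scales.

```python
from collections import Counter


def isp_rating(bucket_labels, level_percent=90):
    """Return 'HD', 'SD' or 'LD' from per-sample bucket labels using GAT-<level_percent>."""
    if not bucket_labels:
        raise ValueError("need at least one labeled sample")
    counts = Counter(bucket_labels)
    total = len(bucket_labels)
    hd = counts["HD"]
    at_least_sd = counts["HD"] + counts["SD"]
    # share >= level%  <=>  count * 100 >= level * total (integer arithmetic).
    if hd * 100 >= level_percent * total:
        return "HD"
    if at_least_sd * 100 >= level_percent * total:
        return "SD"
    return "LD"  # neither criterion is met


print(isp_rating(["HD"] * 9 + ["SD"]))                # exactly 90% HD
print(isp_rating(["HD"] * 8 + ["SD"] + ["LD"]))       # 90% at least SD
print(isp_rating(["HD"] * 8 + ["LD"] * 2))            # only 80% at least SD
```

Checking HD first and then "at least SD" mirrors the table's ordering: a network is rated HD only if 90% of samples are HD, and a network that falls short of that can still earn SD when 90% of samples reach SD or better.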
The ratings are centered on networks, not users. All goodput samples are fully anonymized, and no user information (e.g., browser cookies, IP addresses) is persisted or used directly in the rating algorithm. Furthermore, if the aggregated sample volume for the selected geographic level and time interval falls below a minimum threshold, the algorithm falls back to coarser-grained dimensions (i.e., it aggregates over a broader geographic area and/or a longer time interval) that meet the minimum size requirement for computing the rating.
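The fallback to coarser dimensions can be sketched as choosing the finest granularity that has enough samples. The candidate ordering, the `min_samples` value and the behavior when no level qualifies (returning `None`, i.e., no rating) are assumptions; the methodology only states that a minimum threshold exists. `CANDIDATES` and `pick_granularity` are hypothetical names, and the counts are made up for illustration.

```python
# Hypothetical candidate granularities, ordered from finest to coarsest (assumed order).
CANDIDATES = [
    ("city", "day"),
    ("city", "month"),
    ("province", "month"),
    ("country", "month"),
]


def pick_granularity(sample_counts, candidates, min_samples):
    """Return the finest (geo_level, time_slice) whose sample count reaches min_samples.

    sample_counts maps (geo_level, time_slice) to the number of aggregated samples.
    Returns None if no candidate has enough samples (assumed: no rating is produced).
    """
    for granularity in candidates:
        if sample_counts.get(granularity, 0) >= min_samples:
            return granularity
    return None


# Made-up counts for illustration only.
counts = {("city", "day"): 40, ("city", "month"): 900, ("province", "month"): 12_000}
print(pick_granularity(counts, CANDIDATES, min_samples=1_000))
```

Here the city-level slices are too sparse, so the sketch falls back to the province/month aggregate, the finest level that meets the minimum.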