Writers web watch

Web metrics


 
The website for writers
The realities of measuring web traffic

The music and book publishing businesses are famous for the way they do their sums. Headlines scream about the size of advances given to their stars. The reality, we all suspect, is that the figures are the product of some creative minds and inflated accounting.

The web, on the other hand, with all its digital technology, must surely be a place where facts and figures cannot be falsified, spun or inflated? Wrong. The various measures of traffic and their significance are still hotly debated.

Hits

In the beginning, there were ‘hits’. Today, hits are discredited as a useful measure of web traffic. Hits count the number of files served up when you summon a page. A content-rich text page might generate just a single hit, while a busy screen full of graphics and buttons might generate 20 hits. So a crude hit count is a bad way to measure meaningful traffic.
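The gap between hits and pages can be illustrated with a minimal Python sketch. The file paths and the list of "page" extensions are made up for the example; real log-analysis tools let you configure what counts as a page.

```python
from pathlib import PurePosixPath

# Assumption: these extensions mark a "page"; everything else (images,
# stylesheets, scripts) still counts as a hit but not as a page.
PAGE_EXTENSIONS = {"", ".html", ".htm", ".php", ".asp"}

def count_hits_and_pages(requested_paths):
    """Return (hits, pages) for a list of requested URL paths."""
    hits = len(requested_paths)
    pages = sum(
        1 for path in requested_paths
        if PurePosixPath(path).suffix.lower() in PAGE_EXTENSIONS
    )
    return hits, pages

# One text-only page versus one graphics-heavy page (hypothetical paths).
text_page = ["/articles/metrics.html"]
busy_page = ["/index.html"] + [f"/images/button{i}.gif" for i in range(19)]

print(count_hits_and_pages(text_page))  # (1, 1)
print(count_hits_and_pages(busy_page))  # (20, 1)
```

Both visitors looked at exactly one page, yet the second generated twenty times as many hits.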

Web metrics

The talk is now of ‘web metrics’. Page views, new or returning users, and advert impressions are the sort of statistics that the business managers of websites want. If you run a high-street shop, you can observe customers’ reactions as they browse and work their way round your store. At present, few e-businesses can monitor their visitors and they therefore have little or no idea what potential customers are doing when they browse their websites. So, the need is strong, but can you collect these figures?

Measuring the traffic

Don't be fooled by the precise-sounding terminology and numbers quoted by websites. There is little agreement about how the raw data should be processed. Site-traffic measurement therefore remains part-art, part-science.

Part of the problem stems from the three different places where web data can be captured. You can measure traffic at the server which supplies the pages, or at the Internet Service Provider (ISP) that passes them on to the user. Finally, there are also ways to measure traffic at the browser itself.

Most of the logging is done at the server. Vendors of web servers such as Apache, Microsoft Internet Information Server (IIS) and Netscape all provide logging with their products. Most logs record one entry for each request to the server.
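As an illustration of what one log entry holds, here is a short Python sketch that reads a single line. It assumes the widely used Common Log Format; your own server may be configured differently, and the sample line is invented.

```python
import re
from datetime import datetime

# Assumption: the server writes Common Log Format, e.g.
# 192.0.2.1 - - [10/Oct/2005:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Turn one log line into a dictionary, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A dash in the size field means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = '192.0.2.1 - - [10/Oct/2005:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_log_line(sample)
print(entry["ip"], entry["path"], entry["status"], entry["size"])
```

Each request becomes one record holding the visitor's IP address, the time, the file requested, the status code and the number of bytes sent.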

Logging hits

A request to the server is a ‘hit’ and a hit is the basic item of web logging. All the other metrics come from parsing (i.e. analysing) the hits. Search engines complicate the problem of measuring useful traffic. Their software spiders wander the Internet to update their indexes of pages. Site operators have to filter these visits to obtain an accurate count of the number of real people visiting their site. Spiders are fairly easy to spot, though: if the same IP address requests 10 pages per second and works its way through the whole site, you can be sure it is a robotic crawler. Left unfiltered, however, these visits are another source of inflated figures.
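The rule of thumb for spotting spiders can be sketched in Python. This is only an illustration: the threshold of 10 requests within one second comes from the example above, the function name and input format are invented, and real filters also look at other clues such as the name the crawler announces.

```python
from collections import defaultdict

def find_probable_spiders(requests, max_per_second=10):
    """requests: iterable of (ip, unix_timestamp) pairs.

    Returns the set of IP addresses that made at least `max_per_second`
    requests inside any one-second window.
    """
    times_by_ip = defaultdict(list)
    for ip, timestamp in requests:
        times_by_ip[ip].append(timestamp)

    spiders = set()
    for ip, times in times_by_ip.items():
        times.sort()
        start = 0
        for end, current in enumerate(times):
            # Shrink the window until it spans less than one second.
            while current - times[start] >= 1.0:
                start += 1
            if end - start + 1 >= max_per_second:
                spiders.add(ip)
                break
    return spiders

# Hypothetical traffic: a crawler firing every 0.05s and a human reader.
crawler = [("203.0.113.9", 1000 + i * 0.05) for i in range(12)]
human = [("198.51.100.4", t) for t in (1000, 1020, 1045)]
print(find_probable_spiders(crawler + human))
```

Only the crawler's address is flagged; the reader who pauses between pages is left in the count.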

A ‘page view’, by contrast, is defined as a hit that returns readable text. This sounds eminently reasonable, provided you are confident that you can distinguish between informative text and a large text advert or disclaimer notice.

‘Sessions’ or ‘visits’ (the terms are interchangeable) are produced by an algorithm (i.e. a calculation) which looks at details such as the IP address, the referral address and the time-stamp in order to work out if the hits and page views are part of the same visit. The algorithm is therefore making a calculated guess. There is no agreed standard, so visitor figures from one algorithm might not agree with those produced by another.
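To show how much the answer depends on the algorithm, here is a deliberately simple Python sketch. It assumes a visit ends when the same IP address has been silent for longer than a chosen timeout; it ignores the referral address, and the function name and sample data are invented.

```python
from datetime import datetime, timedelta

def count_visits(page_views, timeout=timedelta(minutes=30)):
    """page_views: iterable of (ip, datetime) pairs. Returns the visit count.

    Assumption: a gap longer than `timeout` between two page views from
    the same IP address starts a new visit.
    """
    last_seen = {}
    visits = 0
    for ip, when in sorted(page_views, key=lambda view: view[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[ip] = when
    return visits

views = [
    ("192.0.2.1", datetime(2005, 3, 1, 9, 0)),
    ("192.0.2.2", datetime(2005, 3, 1, 9, 5)),
    ("192.0.2.1", datetime(2005, 3, 1, 9, 10)),
    ("192.0.2.1", datetime(2005, 3, 1, 10, 0)),
]

print(count_visits(views))                                # 3
print(count_visits(views, timeout=timedelta(minutes=60)))  # 2
```

The same four page views yield three visits with a 30-minute timeout and two with a 60-minute one, which is exactly why figures from different tools rarely agree.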

Sessions and page views are not the only metrics that the hits-log can yield. Similar pages can be grouped using pattern-matching to categorise them. So visits to pages listing books, CDs and software might be identified and placed in separate categories. Aggregation is another processed metric which might allow the site owner to work out how many times people pressed a particular button on a specified page.
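Grouping pages by pattern-matching might look something like the following Python sketch. The URL layout (/books/, /cds/, /software/) is an assumption made for the example; a real site would write patterns to suit its own addresses.

```python
import re
from collections import Counter

# Assumption: each category lives under its own top-level folder.
CATEGORY_PATTERNS = [
    ("books", re.compile(r"^/books/")),
    ("cds", re.compile(r"^/cds/")),
    ("software", re.compile(r"^/software/")),
]

def categorise(path):
    """Return the first category whose pattern matches, else 'other'."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.search(path):
            return name
    return "other"

def count_by_category(paths):
    """Tally page views per category."""
    return Counter(categorise(path) for path in paths)

viewed = [
    "/books/novel-writing.html",
    "/books/poetry.html",
    "/cds/audio-guide.html",
    "/software/editor.html",
    "/about.html",
]
print(count_by_category(viewed))
```

The tally shows two views in books and one each in CDs, software and the catch-all 'other' category.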

Log files

The WritersServices website gobbles up gigabytes of storage each month as all the requests made to our server are logged. Processing this data to extract the web metrics consumes prodigious amounts of processing power and storage space.

‘Data warehousing’ software is beginning to master how to extract useful information from the web. Internet Service Providers are recording a great deal of information about their customers. However, the potential for the data-gathering to become intrusive has led the web's standards body, the World Wide Web Consortium (W3C), to set up a project called the ‘Platform for Privacy Preferences’ (P3P) to define policies for the collection and legitimate use of web data.

Getting to know you, or not

Every website operator would love to know the identity of their visitors, but web logs track IP (Internet Protocol) addresses, not people. To make life more complicated, most Internet Service Providers assign their users a new IP address each time they log on. Most of us are grateful that we have a level of anonymity when we are surfing the web. To track a user, the content-server needs to place some sort of marker on the user’s computer.

Many websites put a small text file on your hard drive. This is called a cookie and acts as your identification card for the website you are visiting. However, cookies might not be much use any more. A survey by Jupiter Research in 2004 indicated that 58% of web users guarded their privacy and deleted all cookies from their system.
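For the curious, this is roughly how a server might issue and read back such an identification card, sketched with Python's standard http.cookies module. The cookie name visitor_id, the one-year lifetime and both function names are assumptions for the example, not a description of how any particular site works.

```python
import uuid
from http.cookies import SimpleCookie

def make_visitor_cookie(visitor_id=None):
    """Build a cookie carrying a random identifier (hypothetical name)."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id or uuid.uuid4().hex
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # about one year
    return cookie

def read_visitor_id(cookie_header):
    """Extract the identifier from the Cookie header a browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None

# The server sends this header with its first response...
print(make_visitor_cookie("abc123").output())

# ...and recognises the visitor when the browser returns the cookie.
print(read_visitor_id("visitor_id=abc123; theme=dark"))  # abc123
print(read_visitor_id("theme=dark"))                     # None
```

Once the visitor deletes the cookie, the final case applies: the server sees no identifier and has to treat them as a brand-new visitor.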

The marketers have responded with the Persistent Identification Element (PIE), a technology that uses Macromedia's Flash MX to track you without using cookies. If you go to a PIE-enabled website, your browser is tagged with a Flash shared object that behaves just like a cookie. Consumers have learned to delete cookies, but most are unaware of shared objects or how to disable them.

What next?

Web businesses could follow the example of music and book publishers and conceal more than they reveal. Alternatively, site owners could opt to pursue the example set by the newspaper and periodicals publishers and subscribe to an independent audit procedure. It will say much for the maturity of the web business if they choose the path of virtue and honesty, but don’t hold your breath.

©Chas Jones 2005

HTTP status

If a user successfully loads a page in their browser, the server will log a status code of 200, indicating success.

Codes in the 200-299 range indicate success,
300-399 indicate redirection,
400-499 indicate a client error and
500-599 indicate a server error.
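
Because the class is given by the first digit of the code, sorting log entries by outcome takes very little code. The Python sketch below is illustrative; the function name is invented, and the 300-399 class is labelled 'redirection', the name the HTTP specification uses.

```python
# The first digit of an HTTP status code identifies its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def status_class(code):
    """Return the class name for a numeric HTTP status code."""
    return STATUS_CLASSES.get(code // 100, "unknown")

for code in (200, 301, 404, 503):
    print(code, status_class(code))
```

A page that loads normally is logged as 200 (success), a moved page as 301 (redirection), a missing page as 404 (client error) and an overloaded server as 503 (server error).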

Glossary of terms

Browser: the software used by a visitor to interpret the HTML and handle web pages.
Domain: the unique name that identifies an Internet site.
Entry resource: the first resource viewed as part of a visit.
Exit resource: the last resource viewed as part of a visit.
Hit: a request for a single file, such as a page or graphic, that makes up part of a web page.
Page view: one deliberate request to a given URL. This calculation is an approximation based on the time and sequence of requests.
Platform: the operating system used by a computer attached to the Internet.
Referral: the URL of the resource from which a visitor requests another resource.
Resource: an item that can be requested by a web browser.
Return code: the result status of an HTTP request, indicating success or failure.
Visit: a continuous period of activity by one visitor to a website. This measurement can also be referred to as a session, with a 30-minute maximum.

 

 ©WritersServices.com 2000-2011