Sources of information
There are four sources of information that are invaluable when measuring website performance: server logs, usage testing, user surveys and clever structural design. We will touch on the first and last of these.
Server logs
Server logs are the most common source of web usage information. A log will typically contain information on the page requested, the user's IP address, the web browser they used and possibly their operating system, along with the time and date. Each time a user loads a page in their web browser, the server creates an entry in the server log.
High-usage websites will have server logs that are millions of lines long and require continuous archiving to avoid huge files. Make sure that you keep these archived logs. They are an invaluable source of information over time and cannot be reconstructed if deleted.
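As a rough illustration of what a single log entry holds, the sketch below parses one line in the widely used Apache/NCSA "combined" log format. The format is an assumption: your server may log different fields or use a different order, and the sample line (IP address, path and timestamp) is invented. Note that the browser and operating system are usually not separate fields but are buried in the user-agent string at the end of the line.

```python
import re
from datetime import datetime

# Pattern for the Apache/NCSA "combined" log format (an assumption: adjust
# it to whatever format your web server is configured to write).
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 18/Jan/2003:14:02:11 +1100
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


# Invented sample line; browser and operating system both sit in the
# user-agent string at the end.
sample = ('203.0.113.7 - - [18/Jan/2003:14:02:11 +1100] '
          '"GET /search?suburb=Kambah HTTP/1.1" 200 5120 '
          '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["time"].date())
```

Lines that do not match the pattern return None rather than raising, so a malformed entry in a multi-million-line archive does not stop the whole run.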
When analysing logs, it is important to correlate that information with external events and occurrences. These can be regular, like weekends or public holidays, or they can be one-off occurrences like the start of an advertising campaign or a natural disaster such as the Canberra bushfires.
You can also compare like with like. For example, compare this year with last year, or compare weekend usage with weekday usage. All of these comparisons can give you hints about performance and about your user profile.
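Comparisons like these are easy to automate once daily page-view counts have been extracted from the logs. A minimal sketch, assuming the counts are already held in a dictionary keyed by date; the function names and the sample figures are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def weekend_vs_weekday(daily_views):
    """Average daily page views on weekdays and on weekends.

    daily_views maps datetime.date -> page views for that day.
    """
    weekday = [v for d, v in daily_views.items() if d.weekday() < 5]
    weekend = [v for d, v in daily_views.items() if d.weekday() >= 5]
    return mean(weekday), mean(weekend)


def year_on_year(daily_views, year):
    """Ratio of one year's total page views to the previous year's."""
    this_year = sum(v for d, v in daily_views.items() if d.year == year)
    last_year = sum(v for d, v in daily_views.items() if d.year == year - 1)
    if last_year == 0:
        raise ValueError(f"no page views recorded for {year - 1}")
    return this_year / last_year


# Invented counts: two weeks from Monday 4 March 2002,
# 1,000 views per weekday and 600 per weekend day.
start = date(2002, 3, 4)
views = {start + timedelta(days=i): 1000 if i % 7 < 5 else 600
         for i in range(14)}
print(weekend_vs_weekday(views))
```

A year-on-year ratio only means something when both years cover the same span of days, so a partial current year should be compared with the same period of the previous year.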
The graph shows the number of pages viewed per day on the allhomes.com.au website since its launch.

Allhomes.com.au is the Canberra region's premier real estate website. About 95% of real estate agencies in Canberra and the surrounding region participate, and each week about 2,500-3,000 properties are advertised for sale. Agents are responsible for updating all of their own listing information, and our content management system has about 1,000 content contributors.
You can see that the site has been growing steadily since launch. However, there are also two very significant drops in usage. At first glance these drops are so pronounced that they might seem cause for panic, but when you correlate them with the holiday season and compare 2001 with 2002, you see that they are a recurring seasonal variation in usage.
You can also see that after the Christmas period there is a second dip in mid- to late January. This is particularly pronounced in January 2003.

The deeper dip in January 2003 is due, of course, to the Canberra bushfires. If we analyse that part of the graph in more detail and overlay appropriately scaled 2003 and 2002 data, we see that this drop in usage is indeed due to the fires.
We have reached this conclusion because the drop coincides with the date of the fires but, more importantly, because the overall trend from January to March is very similar in 2002 and 2003 apart from this one event.
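The text does not say how the overlay was scaled, so the following is only one plausible approach, not the method used for the graph: scale last year's daily series by the median day-by-day ratio between the two years, then flag days that fall well below that scaled baseline. The median keeps a short one-off event like the fires from dragging the scale factor down. The function name, the 20% threshold and the sample data are all assumptions.

```python
from statistics import median


def flag_dips(this_year, last_year, threshold=0.2):
    """Flag days that fall well below last year's pattern.

    this_year, last_year: daily page views for the same calendar span,
    aligned so that index i is the same date in both years.
    threshold: fractional shortfall treated as a dip (0.2 = 20% below).
    Returns (day_index, actual, expected) for each flagged day.
    """
    if len(this_year) != len(last_year):
        raise ValueError("both series must cover the same days")
    # The median day-by-day ratio estimates growth without being dragged
    # down by a short one-off event.
    scale = median(t / p for t, p in zip(this_year, last_year) if p > 0)
    flagged = []
    for i, (actual, previous) in enumerate(zip(this_year, last_year)):
        expected = previous * scale
        if actual < expected * (1 - threshold):
            flagged.append((i, actual, expected))
    return flagged


# Invented data: 50% growth year on year, with one bad day at index 5.
last = [100] * 10
this = [150] * 10
this[5] = 50
print(flag_dips(this, last))
```

Aligning by calendar date mixes weekdays with weekends from one year to the next; aligning by day of week is an alternative when weekly patterns are strong.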
Server log caveats
Caches and proxies
Web and proxy caches play havoc with the accuracy of web log entries. Five hundred users in an organisation may view a web page through their departmental cache, but only the first page load will appear in the log. Subsequent requests are served straight from the cache, not from the source website.
Accordingly, in this era of firewalls, caches and large LANs, your server logs will almost always register fewer hits than are actually occurring.
Don't analyse in isolation
Take care not to over-analyse the log information, and never analyse it in isolation. A good example of this is audience size. Each type of information has a different-sized audience, so when comparing the success of one part of a website with another, you should attempt some scaling based on the relative size of each part's target audience.
Accordingly, relatively few hits may still mean that you are reaching a large percentage of your target audience if that audience is small.
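Scaling by audience can be as simple as dividing hits by an estimate of each section's target audience. In this sketch the section names, hit counts and audience sizes are invented; in practice the audience estimates have to come from outside the logs.

```python
def audience_reach(hits, audience_sizes):
    """Express each section's hits as a fraction of its estimated target audience."""
    return {section: hits[section] / audience_sizes[section] for section in hits}


# Invented figures: a niche section with few hits versus a popular one.
hits = {"commercial leasing": 400, "residential sales": 20000}
audience = {"commercial leasing": 500, "residential sales": 100000}
for section, share in sorted(audience_reach(hits, audience).items(),
                             key=lambda item: item[1], reverse=True):
    print(f"{section}: {share:.0%} of target audience")
```

Here the section with fifty times fewer hits is reaching four times the share of its audience, which is exactly the case the raw hit counts would hide.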
Clever structural design
Good structural design is critical to producing a usable website. It is also a critical part of gathering information on usage and performance.
What we need to remember is that, for the most part, the only information we can gather is when a page is loaded and when a form is submitted to the server for processing.
Accordingly, if you embody a lengthy process in a single web page, your server log will only show you when the page was loaded, whether it was submitted and how long the user took to submit it.
However, if you divide that lengthy process into steps that are significant from a hit-logging point of view, you will gather much better data on how users are negotiating your online process. Principally, you will gather data on how long users take between steps and where they abandon the process.
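Once a process is split into steps with their own URLs, the time between steps can be read straight out of the log. This sketch assumes hypothetical step paths and a visitor identifier on each entry. An IP address is the crudest identifier, since many users behind one proxy or firewall can share an address; a session identifier is more reliable if your site issues one.

```python
from datetime import datetime

# Hypothetical URLs, one per step of a multi-page listing process.
STEPS = ["/list/step1", "/list/step2", "/list/step3", "/list/done"]


def step_gaps(entries, steps=STEPS):
    """Seconds each visitor took between consecutive steps.

    entries: iterable of (visitor_id, timestamp, path) tuples taken from the log.
    Returns {visitor_id: [gap from step 1 to 2, gap from 2 to 3, ...]},
    stopping at the first step the visitor never reached.
    """
    first_seen = {}
    for visitor, when, path in entries:
        if path in steps:
            seen = first_seen.setdefault(visitor, {})
            # Keep the earliest time this visitor loaded each step.
            if path not in seen or when < seen[path]:
                seen[path] = when
    gaps = {}
    for visitor, seen in first_seen.items():
        gaps[visitor] = []
        for earlier, later in zip(steps, steps[1:]):
            if earlier not in seen or later not in seen:
                break
            gaps[visitor].append((seen[later] - seen[earlier]).total_seconds())
    return gaps


log = [
    ("a", datetime(2003, 1, 10, 9, 0, 0), "/list/step1"),
    ("a", datetime(2003, 1, 10, 9, 2, 0), "/list/step2"),
    ("b", datetime(2003, 1, 10, 9, 1, 0), "/list/step1"),
    ("a", datetime(2003, 1, 10, 9, 7, 30), "/list/step3"),
]
print(step_gaps(log))
```

A short gap list is itself a signal: visitor "b" above has no gaps at all because they never got past the first step.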
Process failure
Splitting processes into key steps gives us a great way to analyse process failure, and for our purposes data on process failure is more important than data on process success.
Failures are critical to the refinement of any process. Knowing where users abandon a process is vital feedback that lets us focus further analysis on those failure points.
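Abandonment points can be counted from the same data. Taking each visitor's set of loaded step pages as input, the sketch below counts how many visitors reached each step and how many went no further. A visitor only counts as reaching a step if they also loaded every earlier step, which keeps the counts from rising part-way through; the step names and visitors are hypothetical.

```python
def funnel(visitor_steps, steps):
    """Count visitors reaching each step and those who went no further.

    visitor_steps: {visitor_id: set of step paths that visitor loaded}
    steps: the step paths in process order.
    A visitor reaches a step only if they also loaded every earlier step.
    Returns (step, reached, stopped_here) tuples; for the final step,
    stopped_here is 0 and reached is the number of completions.
    """
    depths = []
    for seen in visitor_steps.values():
        depth = 0
        while depth < len(steps) and steps[depth] in seen:
            depth += 1
        depths.append(depth)
    reached = [sum(1 for d in depths if d > i) for i in range(len(steps))]
    report = []
    for i, step in enumerate(steps):
        went_on = reached[i + 1] if i + 1 < len(steps) else reached[i]
        report.append((step, reached[i], reached[i] - went_on))
    return report


# Invented visitors for a hypothetical three-step process.
visits = {
    "a": {"step1", "step2", "step3"},
    "b": {"step1", "step2"},
    "c": {"step1"},
    "d": {"step1"},
}
for step, reached, stopped in funnel(visits, ["step1", "step2", "step3"]):
    print(f"{step}: {reached} reached, {stopped} stopped here")
```

The step with the largest "stopped here" count is the first candidate for the additional analysis described above.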
Search forms - a special mention
This is also the case with search forms. It is important to keep a record of searches that produce no results. Why does everyone from the United States search my website for "ketchup" and find nothing when I have hundreds of varieties of tomato sauce? If you were not logging the failed search terms, you would not know this valuable information.
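A standard access log records the query string but not how many results it produced, so failed searches are easiest to capture in the search code itself. Assuming the search page records each term with its result count, a tally like this one (function name and records hypothetical) surfaces the most common dead ends.

```python
from collections import Counter


def failed_searches(searches):
    """Tally search terms that returned no results, most frequent first.

    searches: iterable of (term, result_count) pairs recorded by the search page.
    Terms are normalised to lower case without surrounding spaces.
    """
    failures = Counter()
    for term, result_count in searches:
        if result_count == 0:
            failures[term.strip().lower()] += 1
    return failures.most_common()


# Invented search records.
records = [("Ketchup", 0), ("tomato sauce", 12), ("ketchup ", 0), ("relish", 0)]
print(failed_searches(records))
```

Normalising case and whitespace stops "Ketchup" and "ketchup " from being counted as two different failures.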