Handbook on cultural web user interaction
First edition (September 2008)
edited by MINERVA EC Working Group "Quality, Accessibility and Usability"

2.6       Audience measurement on the Internet

The aim of this section is to review the techniques and metrics used for audience measurement on the Web.

By audience measurement we mean the methods used for calculating how many people form part of an audience, that is a group of people reached by a message (television programme, advertising, multimedia content, written texts, etc.).

The main aim of audience measurement is therefore to quantify the public, not just in terms of numbers, but also in terms of socio-demographic characteristics (sex, age, educational qualification, geographical area, place of use, taste, behaviour, etc.). On the Internet, audience measurement is carried out for a variety of reasons, among which are:
1.   programming analysis: we must know the characteristics and behaviour of the users in order to satisfy their requirements;
2.   social research: public institutions must monitor the means of mass communication (and therefore also the Internet) in order to understand the role carried out by the media in involving citizens;
3.   advertising sales: a lot of media are supported by advertising and it is not unusual for portals of cultural interest to be sustained by the contribution of advertising banners;
4.   product sales (ticket sales, bookshops, photographs, etc.): it is essential to determine the nature of one’s user population and to study its evolution.

Certain special aspects of the network must however be considered. Within mass communication, the Internet represents a real revolution: in traditional media, communication is from one to many, whereas on the web there is a two-way relationship between the user and the medium, a constant dialogue between content and user. The interfaces of the new media (web messenger, chat, blog, etc.) must also be taken into account. These are designed for viewing content and interacting with it in different ways. On the one hand, the user expresses his intentions; on the other, the system, with the aid of constantly evolving technologies, responds to the actions of the user.

Many professionals find themselves using methods of measurement. Among these we find producers of online content, computer industry professionals, public administration and e-government operators, webmasters, those in charge of commercial strategies in the network, etc. Techniques for audience measurement change over time, both as a result of technological development and in response to the evolution and use of the network.

To help the reader become familiar with these subjects, the terminology currently in use, which is quite extensive and diversified, is listed below. The literature distinguishes between:
                        1.   systems based on: a) census-based measurement, in which the measurement is made on the entire collection of information received, without any panel design or statistical projection; b) panel-based measurement;
                        2.   systems that take measurements based on the network infrastructure (site-centric / server centric) or on the user (user-centric);
                        3.   measurement based on server experience or on user experience;
                        4.   measurements based on devices for access to the net (device-centric) or systems aimed at visitors (visitor-centric);
                        5.   tools of web analytics and audience measurement services.

Up to now, the quantitative analysis of the audience, even within the sphere of other media, has used two approaches:
                        1.   measurement of media consumer behaviour with automatic and passive instruments, that do not require the user’s direct involvement;
                        2.   information gathering through interviews or questionnaires presented to users.

Starting from the 1990s, with the new potential of the Internet and the web, these two approaches have undergone significant evolution. Automatic data measurement, that is the gathering and analysis of online traffic, is now generically referred to as web analytics. The concept covers the logging capabilities of web servers, the technologies for “tagging” digital content (tagging technologies) and the possibility of “sniffing” network traffic (network sniffing): in other words, a series of technologies that use different data sources to detect and analyse the way in which users are active on the web.

On the other hand, gathering data through more traditional investigations (questionnaires, etc.) has in its turn evolved, by taking advantage of the interaction potential of the web.

In no way can it be affirmed that one method is better than another; the methodology to be used needs to be considered on a case by case basis, taking into account the information requirements and available resources.

One of the fundamental issues is that regarding the process used for data gathering:
                        1.   in census data modality, the measurement is carried out on the total reference population;
                        2.   in sample data modality, the measurement is done on samples of the population.

Of course, the costs associated with measuring and analysing data mean that sample data measurements are often preferred to census data measurements.

2.6.1      Census data measurements: web analytics

The term web analytics refers to the study of the behaviour of the network users.

This system of census data measurement does not entail the direct involvement of the subjects to be measured.

A classification of these measurement systems can be made on the basis of the data sources of the behaviour of the users on the web:
                        1.   server based measurement: the web server records (logs) the requests of access to the pages of a website, and analyses them;
                        2.   browser-based measurement: in this case the measurement is done on the client’s computer. The browser requests the pages of a site hosted by a web server, and these pages are “labelled” with a tag. This technique is founded on the conviction that measurements must be made as close as possible to the place and moment at which the site’s resources are actually consumed, that is to say the instant when the browser loads and displays the web pages on the client’s screen;
                        3.   network based measurement: measurement takes place at the level of the proxy servers of the Internet service providers that sort the requests of resources of the various clients and direct them towards a web server that hosts a site. In actual fact, all the pages consulted by the users are gathered and analysed by these intermediate network nodes. Sophisticated tools “sniff” the page requests and the resulting data is processed.

Various web analytics software packages and services are available, either commercially or as open source (for example, Google Analytics, Shinystat, AWStats etc.).

2.6.2      Sample or user centred measurements

The methodology of user centred measurement derives from “recording the activity of a sample of Internet users who are recruited to be representative of the entire universe of Internet users. The behaviour of these subjects is subsequently projected in order to estimate the behaviour of the entire population of Internet users through opportune statistical projections” (Australian Technical Committee for the Internet Industry).

There are two fundamental elements on which a “user centred” measurement system is based: the identification of a sample of Internet users that is representative of the whole, and the actual monitoring of the online behaviour of these individuals.

The sample chosen is usually described as a panel (according to Wikipedia, a panel is “the ‘quantity’ chosen on the basis of representative criteria, used for the statistic measurement of a specific universe. It is usually a group of persons or families included in a sample investigation”).
The choice of a sample of a population can be made in a number of different ways; the sample must however reflect as far as possible the total population from which it was chosen, precisely so that the partial information can subsequently be extended to the whole of the population investigated. Generally speaking, “random” or “probability” samples are preferred, since they are not affected by the influence or inclinations of the analysts. Panels are therefore representative statistical samples of a certain sphere, on which measurements are made more or less continuously or at intervals over time. The advantage of these measurements is that they provide data on the evolution of particular phenomena over time.

So how does one recruit a panel? Generally speaking, research institutes adopt various techniques, among which telephone research with random dialing of numbers or personal interviews with person to person meetings. More recently, recruiting systems have used the post or the web, even if the latter risks skewing results due to the exclusion of non-Internet users.

User-centred measurement methodology (by panel) is strongly recommended during the design of websites that are accessible to all user categories, including the most disadvantaged.


For example, the Italian national rules describe the following process: having defined the objectives of the web product and experimented with alternative solutions, one proceeds to an evaluation with the client in the context of use, and then carries out any corrections and updates, which are submitted to progressive monitoring.

“This methodology is based on four main conditions:
                        1.   the formation of a representative group of users or a panel in which there must be users with different types of disabilities and also the different roles and reasons for which a user is interested in entering the site;
                        2.   the construction of use scenarios: define contexts, reasons, and ways of interaction with the site. It is on the basis of these scenarios that the site is imagined, designed, developed and updated and improved;
                        3.   the evolutionary design: the site is submitted to an evaluation by the panel on the basis of a number of complex scenarios. The evaluation aims to define new requirements and new purposes. The definition of the new purposes should be done interactively through the production of prototypes, such that they make it possible to evaluate solutions, identify limits and establish feasibility. A constant conversation with the panel allows us to have an in-progress evaluation of the solutions and gives an advance idea of the final evaluation of the project. Finally the panel becomes an observatory of the use of the site and thus contributes to its continuous update and improvement;
                        4.   monitoring: as we have already said, because it is important to ensure that the site content does not remain static, there must be continuous monitoring, to identify opportunities for its improvement in terms of how well it meets user requirements. The formation of the panel is therefore a central element of the methodology because it guarantees a level of realism, but also of consensus and communication on the project. From this viewpoint it produces data and ideas and makes it possible to make empirically founded decisions. From this last viewpoint the panel is a place for experimenting with opportunities, but also with the limits of dedicated technologies for access and interaction.”

The meter

Generally used in traditional media, the meter is a device for measuring quantities. It is downloaded as software and installed in the computers of the subjects to be monitored. The idea of applying this monitoring technique in the sphere of informatics dates to 1994, when it was used by a group of researchers for measuring the spread and use of application software.

This measuring system entails a minimum involvement of the subjects to be analysed, so that, in contrast to server-centred and browser-centred systems (device-based measurement), which are aimed at analysing “machine users”, it is effectively “user-centred” (user-centric measurement). It is no longer the machines and their software that are monitored, but the individuals with their socio-demographic and behavioural characteristics.

Fundamentally, monitoring by meter software installed on the panellists’ PCs is based on three processes: unique identification of the individual who is navigating; recording of information about his navigation route; and sending of the recordings to the companies that requested the measurement analyses.

Measurements based on panels with a meter are currently the best for gathering data on the navigators’ profiles, the ranking of sites and audience fluctuations among sites (source&loss).

Measurement operations by panel and meter are basically the following:
                        1.   definition of the target and behaviour to be measured (for example, individuals between the ages of 20 and 40 who have used Internet and digital application media from home in the last six months);
                        2.   quantification of the size of this population;
                        3.   panel recruitment;
                        4.   data gathering by meter;
                        5.   expansion of the data gathered on the total population.
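
The last step, expanding the panel data to the total population, can be illustrated with a minimal Python sketch. It assumes a simple stratified weighting in which each panellist stands for the same number of people in his or her stratum; the class `Panellist`, the stratum labels and the population figures are hypothetical and are not taken from any actual measurement service, which would normally use more refined weighting schemes.

```python
from dataclasses import dataclass


@dataclass
class Panellist:
    panellist_id: str
    stratum: str        # hypothetical grouping, e.g. an age band
    visited_site: bool  # True if the meter recorded at least one visit


def expansion_weights(panel, population_by_stratum):
    """Weight of a stratum = people in the population / panellists in the panel."""
    panel_counts = {}
    for panellist in panel:
        panel_counts[panellist.stratum] = panel_counts.get(panellist.stratum, 0) + 1
    return {
        stratum: population_by_stratum[stratum] / count
        for stratum, count in panel_counts.items()
    }


def estimate_unique_visitors(panel, population_by_stratum):
    """Project the metered visitors onto the whole population (assumed method)."""
    weights = expansion_weights(panel, population_by_stratum)
    return sum(weights[p.stratum] for p in panel if p.visited_site)
```

With two panellists standing for 1,000 people aged 20-29 and two standing for 3,000 people aged 30-40, each metered visitor counts for 500 or 1,500 people respectively in the estimate.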

The advantages of this type of analysis are:
                        •     social-demographic profiling
                        •     competition analysis
                        •     fluctuation monitoring (source&loss)
                        •     grouping by sectors
                        •     measurement of the pages in memory cache
                        •     identification of unique users
                        •     automatic exclusion of non-human traffic

The disadvantages are:
                        •     measurement only from certain places
                        •     measurement limited to sites with significant traffic
                        •     considerable investment
                        •     data measured not always proprietary
                        •     difficulty in profiling parts of the limited traffic sites

Standardized interview – Static textual questionnaire (see also 3.2)

The most extensively used method of investigation of media audiences, including those of websites and portals, is a standardised interview. This is done by asking structured questions of all users or of a group of chosen individuals. This measuring system entails the direct involvement of the subjects to be analysed. The aim is to investigate their preferences, habits and behaviour, in order to verify effectiveness in terms of user satisfaction with choices made and to study behaviour during network navigation – in other words, to build a “profile”.

The choice of those to be interviewed can be random or non-random, according to whether or not a probabilistic selection is required.

The following are among the non-random (non-probabilistic) methods for choosing a panel:
                        •     entertainment survey or polls: short requests for judgement on various subjects
                        •     unrestricted self-selected survey: usually present on portals and sites with a lot of traffic, they are invitations to participate in a survey
                        •     volunteer opt-in panel: self-proposed volunteers recruited through sites and portals who, after being registered and profiled, are subsequently contacted at the beginning of the actual investigation.

Among the random (probabilistic) methods for choosing a panel are:
                        •     interviews intercepted among a site’s navigators (intercept survey): questionnaires completed by randomly selected visitors
                        •     panels based on lists of known names (list-based sample): more or less detailed questionnaires submitted to lists of users with Internet access (for example, those registered with a newsletter, or with a library, the friends of a museum, etc.)
                        •     pre-recruited panels: recruitment of users that are not self-chosen or volunteers, but chosen with probabilistic sampling methods.
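
The random selection used in an intercept survey can be sketched as follows. This is only an illustration under stated assumptions: each visitor is invited with a fixed probability, and a visitor who has already been invited is never invited again. The function name, its parameters and the idea of keeping a set of already invited visitors are hypothetical and do not describe any specific survey product.

```python
import random


def select_for_intercept(visitor_ids, invitation_probability, already_invited, rng):
    """Return the visitors to invite, each chosen with the given probability.

    already_invited is a set that is updated in place, so that the same
    visitor does not receive a second invitation on a later call.
    """
    invited = []
    for visitor_id in visitor_ids:
        if visitor_id in already_invited:
            continue  # do not expose the same visitor to repeated invitations
        if rng.random() < invitation_probability:
            invited.append(visitor_id)
            already_invited.add(visitor_id)
    return invited


# Example call: invite roughly one visitor in ten (hypothetical rate).
seen = set()
todays_invitations = select_for_intercept(
    ["v1", "v2", "v3"], 0.1, seen, random.Random(2008)
)
```

Passing the random number generator as an argument makes the selection reproducible when a fixed seed is used, which can help when the sampling procedure has to be documented.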

Interviews can be made by telephone or in person, sent by e-mail (e-mail survey), or filled in online using graphical user interface elements such as menus and radio buttons (web interviewing, web-based survey).

An online questionnaire can be submitted using the technology offered by the web (at the time of entering or leaving a site, the specific amount of time spent on a site, specific navigation behaviour, every access to the site, etc.), but it is important that a visitor is not constantly exposed to invitations to participate in an investigation.

Normally an online questionnaire can be viewed on a full screen or inside various sizes of windows (pop-up). It is formed of a series of questions posed in different ways (open, closed, single, multiple, etc.) and with which the user can interact through interactive graphic solutions (buttons, drop down menus, boxes, advancing arrows, etc.).

In order for the interview to be effective and to have high quality results, it is suggested that simple language is used and that a certain amount of care is dedicated to the functionality and aesthetics (look and feel) of the questionnaire. It is advised that the duration of the interview be communicated in advance, and kept to a minimum: we should remember that the users are being asked to dedicate some of their time to helping us!

The sequence of the questions should be coherent and dynamic, preventing multiple answers, up to the conclusion of the interview and the final thank-you page. The thanks can be confined to two lines written at the end of the questionnaire, or to an e-mail of reply with a text along the lines of: “Thank you! Your email has been sent to us successfully...”

The aim of a pleasant presentation and an effective structure is the achievement of a greater number of completed interviews, minimizing the rate of refusal or incompleteness. The questionnaire can be a chance for winning over a user by registering him with the newsletter or for recompensing him for the energy spent in compiling it, “making him a present” of resources that would usually be subject to payment or reserved (digital and non digital gadgets, subscriptions to exclusive newsletters, the privilege of cooperating in the content creation, etc.).

The results can be processed statistically, or individual suggestions for changes to the web application can be extracted from them.

Once the analysis has been completed, it is advised that the results of these surveys (website feedback survey results) be published with an indication of the number of questionnaires analysed and the suggestions that will be implemented or which have already been implemented on the web application.

The following are the advantages of this type of technique:
                        •     limited costs
                        •     rapid planning and completion times
                        •     capacity to reach users regardless of geography
                        •     possibility of using multimedia content (audio and video)
                        •     control of the processes in real time.

What is really critical is the truthfulness of the statements provided by the interviewee regarding his user type, a question that has been rendered even more complicated by the new reality of the Internet, which is witnessing an increase in “virtual beings” (role-playing games, chat, avatars, nicknames, etc.), although studies in this sector are not yet consolidated.


2.6.3 Audience metrics


Audience metrics is a discipline which originated within the sphere of advertising and marketing. In the web context, its major role is that of providing qualitative and quantitative indicators for the analysis of web application effectiveness.

Audiences translated into numbers are defined as ratings. Although traditional media measurement systems are by now standardized (for example, average minutes for TV, average quarter of an hour for press, etc.), this is not the case for the Internet and the web. Let us consider the differences.

In traditional media, the relation between the medium (TV, radio, cinema, press etc.) and the public (TV viewers, radio listeners, readers) is identified through measurement of the exposure time, without a close study of the motivations and effects of this exposure. Starting from the 1990s, new metrics (e-metrics, web-metrics, net-ratings) were identified for investigating and quantifying the relationship between Internet users and digital content. The new medium is no longer characterized by a simple “exposure” model, but becomes an “interactive” space of action. The viewer is transformed into an active user.

Internet interaction means the use on the web of a space of variable size in which to put data (informative, promotional, advertising, multimedia, etc.) so that the user is not confined to just looking at them but is encouraged to interact by providing a form of reply. This strategy can be defined as a Call to action, in the sense that the contents are placed in the network inviting the user to “do something!” and also specifying “what to do”. Closely linked to this idea there is also the so-called “funnel process”: the user is involved in various processes that bring him from being a simple visitor to becoming involved in more interactive procedures, thus reacting to the communications to which he has been exposed. Of particular importance in the sphere of e-commerce, this process thus makes it possible to verify in real time the effect of a communication on users’ behaviour.

There are many forms of interaction: clicking on a banner, filling in a form or questionnaire, making a purchase, downloading applications, participating in a community tool, inserting files, using a content collector, etc.

Within the sphere of exposure metrics, the following are the main indicators, each one of which is in its turn connected to more specific metrics:
                        •     Impressions: number of banners inserted, fixed or variously animated, seen by the user
                        •     Page views: number of web pages requested and viewed by the user
                        •     Visits or sessions: number of visits to a site made by users. By visit or session we mean the viewing of a series of pages by a user without there being a period of over thirty minutes of inactivity between one page and another.
                        •     Unique visitors: number of single users that have visited the site, net of duplications.
                        •     Time spent: time spent in minutes and seconds while navigating or viewing the pages of a site or using a digital application.
                        •     Frequency: average number of visits to a site or of use of a digital application by a single individual
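
As an illustration of how these indicators relate to one another, the following Python sketch derives page views, visits, unique visitors, time spent and frequency from a list of page requests. It is a simplified example under stated assumptions: each request is a hypothetical `(visitor_id, timestamp)` pair, visitors are assumed to be reliably identified, and the time spent on the last page of each visit is ignored because it cannot be observed from the requests alone.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# A new visit starts after more than thirty minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)


def exposure_metrics(page_requests):
    """Compute exposure indicators from (visitor_id, timestamp) pairs."""
    requests_by_visitor = defaultdict(list)
    for visitor_id, timestamp in page_requests:
        requests_by_visitor[visitor_id].append(timestamp)

    page_views = 0
    visits = 0
    time_spent = timedelta(0)
    for timestamps in requests_by_visitor.values():
        timestamps.sort()
        page_views += len(timestamps)
        visits += 1  # the first request always opens a visit
        for previous, current in zip(timestamps, timestamps[1:]):
            if current - previous > SESSION_TIMEOUT:
                visits += 1  # long inactivity: a new visit begins
            else:
                time_spent += current - previous  # same visit continues

    unique_visitors = len(requests_by_visitor)
    return {
        "page_views": page_views,
        "visits": visits,
        "unique_visitors": unique_visitors,
        "time_spent_seconds": time_spent.total_seconds(),
        "frequency": visits / unique_visitors if unique_visitors else 0.0,
    }
```

Because the last page of a visit has no following request, the time spent computed in this way is a lower bound rather than an exact figure.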

Within the sphere of interactivity metrics we can distinguish between:
                        •     passive exposure, that can be investigated by calculating page views, and active exposure, identifiable with the click action;
                        •     metrics for monitoring the use of the content (content metrics) and metrics for monitoring activities connected with e-commerce (commerce metrics).

Interactivity metrics are used especially in the marketing sector, but they can also be important in the sphere of cultural web applications. The most important indicators are:
                        •     Click-through (the act of clicking on an announcement or banner): absolute number of click actions carried out during a promotional campaign, which in the field of marketing is also linked to the concept of pay by click (payment for the clicks generated).
                        •     Click-through rate: within the sphere of a single campaign, relation between clicks generated by a banner and the total viewing (impressions) of the same announcement
                        •     Conversion: successful completion of the phases of a process aimed at a network result (e.g. subscription to a newsletter, download of media, purchase of a product, etc.)
                        •     Conversion rate: relation between the conversion operations carried out with success and the total of potential conversions
                        •     Interaction rate (time spent in interaction): average time spent in interaction with an announcement.
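
The two rates defined above are simple ratios, as the following Python sketch shows. The function names and the figures in the usage note are illustrative assumptions; in particular, what counts as a “potential conversion” (here, for example, the number of clicks) has to be decided for each campaign.

```python
def ratio(part, total):
    """Return part / total, or 0.0 when there is nothing to divide by."""
    return part / total if total else 0.0


def click_through_rate(clicks, impressions):
    """Clicks generated by a banner divided by its impressions."""
    return ratio(clicks, impressions)


def conversion_rate(conversions, potential_conversions):
    """Successful conversions divided by the potential conversions."""
    return ratio(conversions, potential_conversions)
```

For instance, a banner shown 2,000 times and clicked 40 times has a click-through rate of 2%; if 5 of those 40 clicks end with a newsletter subscription, the conversion rate measured on clicks is 12.5%.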

Obviously, to make a correct analysis of the results, the indicators that analyse the negative results must be considered:
                        •     Abandonment rate: percentage of processes not concluded with respect to the number of processes begun
                        •     Churn rate (cancellation rate): percentage of cancellations (for example, to a newsletter)
                        •     Bounce rate: percentage of missed deliveries (for example, of e-mails).
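
The abandonment rate can be computed for a whole process or for each of its steps, which connects it to the funnel process mentioned earlier. The Python sketch below is a minimal illustration; the step names and the user counts in the usage note are hypothetical.

```python
def abandonment_rate(begun, concluded):
    """Share of processes begun that were not concluded."""
    return (begun - concluded) / begun if begun else 0.0


def funnel_abandonment(step_counts):
    """Abandonment rate between each step of a funnel and the next one.

    step_counts is an ordered list of (step_name, users_reaching_step).
    """
    report = []
    for (name, reached), (_, reached_next) in zip(step_counts, step_counts[1:]):
        report.append((name, abandonment_rate(reached, reached_next)))
    return report
```

If 200 users open a subscription form, 120 fill it in and 90 confirm the subscription, 40% abandon after opening the form, 25% abandon after filling it in, and the abandonment rate of the whole process is 55%.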

Within the sphere of interactions, it is important to understand other concepts:
                        •     Enquiry (information requests): number of information requests sent directly from the user via the web
                        •     Lead (profiled user): concept that indicates that a user has provided information on his network preferences
                        •     Search (information search): number of searches that the user makes on a site using a search system that is internal to the web application
                        •     Registration: number of registrations made by users for access to information or services
                        •     Order: number of orders made by users for purchasing products or services.

Although they are not studied in depth in this context, we should not forget the indicators of communication costs, or the investments made to generate attention (costs for exhibiting information) or to generate action (costs for enticing interaction).

2.6.4      Log file analysis

Servers that host web applications send users textual content, images, multimedia files, etc. In order to increase user satisfaction, their navigation pathways can be carefully monitored and analysed. Standard web server functionality includes the capacity to collect and store detailed information about web server activity. Every provider of server solutions includes his own systems for logging, which collect detailed information on the use of the site. This can then be analysed from various perspectives, in order to extract useful information for various roles (technical, scientific, research, marketing).

The interpretation of network traffic makes it possible to extract indicators relevant, for example, to the number of accesses, the navigation routes, behaviour models, technical configuration of the devices used for connecting, etc.

The web page requests made by users are stored in the form of log files that record the activities of the web server. Due to the considerable size of this type of file, their processing is usually carried out by specific software called log analyzers. These classify logs by type: access logs (page requests that reach the server, with the time and date of the request, the IP address of the computer that requested the resource and the name of the resource requested); error logs (recording of the malfunctions or failures in the handling of resource requests); referrer logs (recording of the URLs from which the user comes, the search engines used and the keywords used); agent logs (recording of information regarding the browsers utilised by the end users).
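
To give an idea of what a log analyzer does, the following Python sketch parses one line of an access log and counts the requests per resource. It assumes the widely used “combined” log format, in which the referrer and the user agent are written on the same line as the access data; other servers and configurations use different layouts, and the sample line is invented for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: IP, identity, user, [time], "request", status, size,
# "referrer", "user agent" (the so-called combined log format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented example line, not taken from a real server.
SAMPLE_LINE = (
    '192.0.2.10 - - [12/Sep/2008:10:15:32 +0200] '
    '"GET /collections/index.html HTTP/1.1" 200 5120 '
    '"http://www.example.org/search?q=museum" "Mozilla/5.0 (X11; Linux)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def requests_per_resource(lines):
    """Count how many times each resource was requested, skipping bad lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["resource"]] += 1
    return dict(counts)
```

The same parsed records could feed the calculation of visits or unique visitors, with the caveat that an IP address identifies a computer or a proxy rather than a person.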

The analysis of the log files must however include wide margins of approximation.
The advantages of this type of analysis are numerous:
                        •     special hardware or software installations are not required
                        •     both current and historical information is always available
                        •     all events that happen in the server are recorded
                        •     even low traffic sites can be analyzed.

The most obvious disadvantages are the following:
                        •     difficulty of standardizing the metrics
                        •     a significant proportion of the online resources visited escapes the logging process. This is due to the difficulty of measuring dynamic pages, as well as the failure to measure traffic served from client-side memory or cache. Recently some Content Management Systems (CMS) have included watchdog modules to monitor the web site, capturing system events in a log to be reviewed by an authorized individual at a later time. The watchdog log is simply a list of recorded events containing usage data, performance data, errors, warnings and operational information. It is vital to check the watchdog report on a regular basis as it is often the only way to tell what is going on
                        •     difficulty of calculating the time really spent on a page (the request of a resource does not necessarily involve its viewing)
                        •     lack of social-demographic information
                        •     lack of information on competition
                        •     lack of certification of information taken from third parties, since all the processes are managed directly by the software platforms.

2.6.5      Protection of privacy

Online audience measurement may acquire personal data regarding network users.

When individuals are interviewed or metered, they should be warned beforehand of the on-going audiometric procedures so that they are aware how information is being gathered about them. In the case of web analytics techniques, the recording of information is automatic and the individuals “measured” are not made aware of the procedures taking place.

In both cases the legislation regarding rights to privacy and protection of personal data must be taken into consideration by any organisation which gathers web usage information.

Directive 2002/58/EC of the EU concerning the processing of personal data and the protection of privacy in the electronic communications sector [4] provides that the use of data gathering tools must be made known to the subjects being monitored so that such gathering is transparent. At the same time, the Directive recognises that such tools can be considered legitimate for improving online services.

The World Wide Web Consortium (W3C) proposes the use of a standardized solution known as P3P (Platform for Privacy Preferences Project) [5], a model that allows websites to state how they mean to use the information gathered from their users.

