Minerva European Conference, Parma. Mecocci



Italian Semester of Presidency of the European Union

EUROPEAN CONFERENCE OF MINERVA

Quality for cultural Web sites
Online Cultural Heritage for Research, Education and Cultural Tourism Communities

Parma, 20-21 November 2003, Auditorium Paganini

Alessandro Mecocci
(Meta.Com)

Advanced Fruition and Management Systems for Museums

INTRODUCTION

Today's multimedia-based digital techniques can be seen as new means of answering very old needs: studying, preserving and documenting historical events, religions, cultures, architectural structures, human organizations and habits. In short, the need to pass memories on to contemporary and future generations. In the past, different techniques were used, ranging from images and drawings on rocks, to perspective representations on paper, to three-dimensional analogue models. The common goal has been to give realistic representations capable of conveying semantic meaning in a way that is intuitive, easy to understand and easy to remember. Realism is not the only focus; interactivity is another fundamental aspect and plays a central role in communicating information and knowledge. A lot of attention has been devoted to improving the user interface (for example by moving from 2D to 3D) to enhance the possibilities for exploration and manipulation.

Realism and interactivity are important in communicating and understanding Cultural Heritage, and multimedia techniques offer many new opportunities in both respects. In particular, two technologies play a dominant role in this scenario: 3D representations (Virtual Environments, VEs) and Wireless Appliances (WAs). VEs help in building 3D documentation of complex Cultural Heritage structures and items. Moreover, VEs facilitate the formulation and testing of hypotheses by making it possible to simulate different scenarios and reconstructions (different lighting conditions, different assemblies of fragments belonging to cultural items, the simulated temporal evolution of cultural assets from the past, to today, to a forecast future). In this sense, VEs promise to be an important technology for answering the diverse needs of different users (occasional or professional) and for improving the retention of knowledge while deepening the understanding of Cultural Heritage topics. VEs are also particularly suited to exploiting new metaphors of interaction with users and to improving remote access to worldwide Cultural Heritage. This last point also affects preservation (e.g. ultra-high-quality digital copies can be distributed while the originals are kept safe).

WAs, on the other hand, respond flexibly to the needs that generally arise during a visit to a Museum or to an archaeological or naturalistic area. WAs improve comfort and the interaction between the user and the support infrastructure, thus granting greater freedom of movement while assuring a significant degree of safety and satisfaction. Free walking, personalized information and advice, dynamic messaging and real-time wireless multimedia distribution are some of the new services that can be conceived and implemented.

In this paper we introduce another important idea: Museum Reactivity. This interaction metaphor is strongly related to the concept of Multisensor Social Interfaces (MSIs), which enable a new class of multi-user man-machine interfaces [1]. Museum Reactivity is a completely new metaphor that will be the basis for a new generation of Museum fruition systems.

MUSEUM REACTIVITY

Computing resources in public spaces (e.g. inside Museums) represent a paradigm that differs from the conventional desktop environment and requires user-interface metaphors quite unlike the traditional Mouse, Pointer, Icons and Windows. In particular, an MSI (an interface between multiple persons and multiple computers in relatively large spaces) must be capable of autonomously initiating and terminating interactions on the basis of the parallel behaviour of multiple people. Moreover, it must be capable of handling multiple interactions at the same time, and of allocating, dividing and deallocating resources among multiple customers in a goal-directed and equitable way. This approach represents a significant departure from current practice, but it will become increasingly important as computing resources migrate from desktops into public open spaces. It is important to note the change of perspective: from the structured single-user interactions typical of desktop systems towards unstructured interactions with multiple customers moving freely in open spaces.

MSIs enable Museum Reactivity: visitors no longer need to press buttons or touch screens to start multimedia interactions or presentations. The Museum "sees" the behaviour of visitors through Multisensor Social Interfaces (e.g. Vision based or Pressure based sensors) and then autonomously reacts by affecting the environment (e.g. starting a sound, activating a video or enacting visual events). Because the Museum knows something about what is going on, it can act and react appropriately. MSIs introduce an improved fruition dimension and leave people free to move around with nothing to attend to but enjoying the visit. It is like having a ghost assistant who is constantly looking for opportunities to help and who appears only when needed. From the fruition point of view, the role of the visitor is greatly enhanced with respect to previous approaches; people actually build up the visit, and they do so in a social way (i.e. multiple parallel interactions by multiple visitors at the same time determine how the Museum reacts, that is, the dynamic evolution of what is presented and how it is presented).

This paper describes the integration of Pressure based and Vision based Social Interfaces in the framework of MuseumNet©, a product for the creation and management of clusters of Museums distributed over a territory. MuseumNet© is an advanced modular system developed by Gruppo META in co-operation with Etruria Innovazione (a Technology Transfer Center of the Tuscany Region) and with the University of Siena (which developed the innovative parts) [2]. The system evolved from an earlier architecture (devoted to multimedia Museum fruition) that was designed by the author for the "Nuovo Museo Archeologico" of Bolzano (currently hosting the Mummy of Similaun, claimed to be the oldest mummy in the world). The Museum, with its distributed multimedia fruition system, was opened to the public on 28 March 1998. It has been reviewed by the New York Times as one of the top ten Museums recently opened in Europe.

SYSTEM ARCHITECTURE

The integration of MSIs into the MuseumNet© architecture follows a layered approach comprising three main parts: 1) Control Subsystem; 2) Sensor Subsystem; 3) Actuation Subsystem. This approach breaks the description of Social Interfaces (which can be a complex task) into simpler sub-problems, and also makes it possible to separate the logical and semantic aspects from the physical implementation.

The Control Subsystem is the principal part. It starts from an abstract description (see below) of the whole set of MSIs present in the Museum and supervises all the events that arise in the different physical areas. It dispatches internal messages, starts, stops or kills threads, initializes physical devices and implements changes in the environment through suitable output devices.

The Sensor Subsystem takes care of initializing and handling the hardware and the software running on peripheral input devices. Because MSIs generally require some processing at the input-signal level to extract useful information (e.g. in the case of Vision based sensing of human behaviour), this subsystem must load the proper setup data for each physical device, schedule appropriate logging and data reporting, and monitor the proper execution of the different subtasks. Moreover, the sensors are described through suitable abstractions that encapsulate device-specific details behind a generic interface and enable a separation between the physical and virtual worlds. Each Abstract Sensor provides generic methods for accessing the input values in a uniform way, independent of the real physical device. The physical devices are interfaced through low-level special-purpose routines. For any input device, these routines must supply methods to initialize the device and to get data (start up, prompt for a value, get a value, close, etc.). Through these routines one or more Abstract Sensors access the physical detectors. Such an approach enables painless integration of new input devices within existing applications. This is very similar to the approach taken in the Java 3D API [3].
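
The separation between an Abstract Sensor and its low-level device routines can be illustrated with a minimal Python sketch. It is not the MuseumNet© code: the class names `DeviceDriver`, `FakePressureDriver` and `AbstractSensor`, and the method names, are hypothetical and only mirror the operations listed above (start up, get a value, close).

```python
from abc import ABC, abstractmethod


class DeviceDriver(ABC):
    """Low-level, device-specific routines (hypothetical interface)."""

    @abstractmethod
    def start_up(self):
        """Initialise the physical device."""

    @abstractmethod
    def get_value(self):
        """Return the latest raw reading."""

    @abstractmethod
    def close(self):
        """Release the physical device."""


class FakePressureDriver(DeviceDriver):
    """Stand-in driver that replays canned readings instead of real hardware."""

    def __init__(self, readings):
        self._readings = list(readings)
        self._open = False

    def start_up(self):
        self._open = True

    def get_value(self):
        if not self._open:
            raise RuntimeError("device not started")
        return self._readings.pop(0)

    def close(self):
        self._open = False


class AbstractSensor:
    """Uniform view over any driver; callers never touch the device directly."""

    def __init__(self, name, driver):
        self.name = name
        self._driver = driver

    def __enter__(self):
        self._driver.start_up()
        return self

    def __exit__(self, *exc):
        self._driver.close()
        return False

    def read(self):
        return self._driver.get_value()


# The application code depends only on AbstractSensor, not on the driver type.
with AbstractSensor("tile-0", FakePressureDriver([[1.0, 2.0, 3.0, 4.0]])) as sensor:
    sample = sensor.read()
```

Under these assumptions, supporting a new input device means writing one more `DeviceDriver` subclass; the code that reads from `AbstractSensor` stays unchanged.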

The Actuation Subsystem takes care of acting on the environment, i.e. it implements the reactions of each Social Interface inside the Museum. This subsystem is very similar to the Sensor Subsystem, differing mainly in its efferent (output) role.

The VirtualMuseumGraph

Each Museum generally comprises more than one MSI, and each MSI covers a different physical area. To interact appropriately with users, each MSI must be aware of its own environment, i.e. it needs a model of that environment. The global description of the Museum uses a metaphor based on Virtual Rooms. Each Virtual Room contains the environment model (the physical space related to that Room) and the model of each MSI that relates to that Room. Each MSI is described by specifying its sensors and the related "Stages". A Stage is the description of a subpart of the Room environment where an interaction takes place. A Stage contains information about the kind of interaction, the conditions that must be met to start or stop the interaction, the actuators used to implement the interaction, and the exact location of the physical subpart of the environment with respect to the whole local environment (which is described in the Virtual Room). By specifying the position of each Stage relative to the Virtual Room environment, it is possible to properly fuse information coming from different Sensors to obtain higher-level descriptions. For example, the 2D position of people can be extracted by applying suitable Computer Vision algorithms to the images acquired by a camera. The data can then be fused with those of other similar cameras to extract the 3D location of people with respect to the Virtual Room environment. The transformations are contained in the respective Stages (or in a single Stage comprising multiple Sensors).
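
As an illustration of why each Stage stores its position relative to the Virtual Room, the sketch below maps observations from two Stages into one common Room frame. It assumes, for simplicity, a planar rigid transform (translation plus rotation); the `StagePose` class and its fields are hypothetical, and the paper does not specify how the transformations are actually encoded.

```python
import math
from dataclasses import dataclass


@dataclass
class StagePose:
    """Pose of a Stage inside its Virtual Room (hypothetical fields)."""
    x: float      # Stage origin in Room coordinates, metres
    y: float
    theta: float  # rotation of the Stage axes with respect to the Room, radians

    def to_room(self, px, py):
        """Map a point from Stage-local coordinates to Room coordinates."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return (self.x + c * px - s * py, self.y + s * px + c * py)


stage_a = StagePose(x=2.0, y=1.0, theta=0.0)
stage_b = StagePose(x=4.0, y=1.0, theta=math.pi / 2)

# The same visitor, observed at local position (1, 1) by sensors of both Stages.
in_room_a = stage_a.to_room(1.0, 1.0)
in_room_b = stage_b.to_room(1.0, 1.0)
```

Once both observations are expressed in Room coordinates they can be compared or combined directly; here they coincide at (3, 2) up to floating-point rounding.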

The description of the whole Museum is given by a "forest of trees" called a VirtualMuseumGraph. The VirtualMuseumGraph is an abstract representation that comprises one or more VirtualRooms, which in turn comprise one or more StageGraphs. Each StageGraph comprises sub-trees of attributed nodes (see Figure 1). Each node can be a group or a leaf and represents an entity in a StageGraph. Leaf nodes can be Abstract Sensors, Sounds or Actuators (a Sound is actually a special kind of Actuator). Groups are used to assemble multiple Actuators, Sounds or Abstract Sensors into a single coordinated unit. For example, a set of lights that must be switched on in sequence can be described as a single logical unit by means of a group node whose children are leaves, each representing a single light of the set. Each node or group can have an associated Behaviour. A Behaviour can do anything; for example, it can perform computations, update its internal state, modify the StageGraph, start a thread, send a message or activate an interaction.

Figure 1 VirtualMuseumGraph abstract description structure

Multiple Behaviours can be composed so that independent Behaviours run in parallel to obtain special interactive effects or complex presentations (e.g. starting multiple sounds while playing a video presentation and opening boxes or operating mechanical analogue models). Generally a Behaviour is characterized by two methods: the first is called when the Behaviour is enabled and the second is called when the Behaviour is fired (woken up). The first method can be used to initialize the Actuators, while the second performs the actions needed to modify the environment.

Not all Behaviours are enabled at the same time. Normally a Behaviour should be enabled only when a visitor is nearby; this condition is specified by Enabling Bounds, i.e. a volume delimited by a sphere, a box, a generic polyhedron, or a Boolean combination of these. A Behaviour can be permanently enabled by making its Enabling Bounds larger than the environment associated with the Virtual Room it belongs to. This makes it possible, for example, to have a Room Behaviour that is always enabled, which is important for people-tracking tasks where a multisensor system runs continuously to estimate visitors' 3D positions inside the Room environment.

Even if enabled, a Behaviour can be fired only if some predetermined conditions are met. In particular, FiringCriteria and FiringConditions are connected to each Behaviour. FiringCriteria are used as prerequisites for firing, for example: a number of frames have been acquired and nothing has changed, a number of milliseconds have elapsed, another Behaviour posts an event, or one detected shape collides with other shapes. FiringConditions combine these criteria according to Boolean rules (AND, OR, ANDofORs, ORsofANDs) so that complex activation strategies can be defined more easily.
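
The enable/fire life cycle of a Behaviour can be sketched in Python as follows. This is an illustrative model only: the class and function names (`SphereBound`, `BoxBound`, `AnyOfBounds`, `elapsed_ms`, `frames_unchanged`, `event_posted`, `all_of`, `any_of`, `Behaviour`) and the context dictionary keys are hypothetical, and polyhedral bounds are omitted.

```python
from dataclasses import dataclass


@dataclass
class SphereBound:
    centre: tuple
    radius: float

    def contains(self, p):
        return sum((a - b) ** 2 for a, b in zip(p, self.centre)) <= self.radius ** 2


@dataclass
class BoxBound:
    lo: tuple
    hi: tuple

    def contains(self, p):
        return all(l <= v <= h for l, v, h in zip(self.lo, p, self.hi))


@dataclass
class AnyOfBounds:
    """Boolean union of simpler bounds."""
    parts: list

    def contains(self, p):
        return any(b.contains(p) for b in self.parts)


# FiringCriteria: predicates over a context dictionary (hypothetical keys).
def elapsed_ms(threshold):
    return lambda ctx: ctx["elapsed_ms"] >= threshold


def frames_unchanged(count):
    return lambda ctx: ctx["unchanged_frames"] >= count


def event_posted(name):
    return lambda ctx: name in ctx["events"]


# FiringConditions: Boolean combinations of criteria.
def all_of(*criteria):
    return lambda ctx: all(c(ctx) for c in criteria)


def any_of(*criteria):
    return lambda ctx: any(c(ctx) for c in criteria)


class Behaviour:
    def __init__(self, name, bound, condition):
        self.name, self.bound, self.condition = name, bound, condition
        self.enabled = False
        self.log = []

    def on_enable(self):
        """First method: called once when the Behaviour becomes enabled."""
        self.log.append("initialised")

    def on_fire(self):
        """Second method: called each time the firing condition holds."""
        self.log.append("fired")

    def update(self, visitors, ctx):
        nearby = any(self.bound.contains(v) for v in visitors)
        if nearby and not self.enabled:
            self.on_enable()
        self.enabled = nearby
        if self.enabled and self.condition(ctx):
            self.on_fire()
            return True
        return False


bound = AnyOfBounds([SphereBound((0.0, 0.0, 0.0), 1.5),
                     BoxBound((2.0, 0.0, 0.0), (3.0, 1.0, 2.0))])
# An "OR of ANDs": (500 ms elapsed AND 10 still frames) OR an external event.
condition = any_of(all_of(elapsed_ms(500), frames_unchanged(10)),
                   event_posted("door_open"))
video = Behaviour("video", bound, condition)

ctx = {"elapsed_ms": 900, "unchanged_frames": 12, "events": set()}
fired_far = video.update([(5.0, 5.0, 0.0)], ctx)    # nobody inside the bounds
fired_near = video.update([(0.5, 0.5, 0.0)], ctx)   # a visitor inside the sphere
```

In this sketch the firing condition is already satisfied on the first call, but the Behaviour stays silent until a visitor enters its Enabling Bounds.
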
The VirtualMuseumGraph is created by means of a special editor, which is used:

to define the actual shape and dimensions of the physical spaces where each MSI operates
to describe the Sensors and the Actuators
to specify the interactions by setting up the appropriate Behaviours and binding them to the corresponding Stages
to specify the exact position of each Stage with respect to the environment of the Virtual Room to which it belongs

The VirtualMuseumGraph structure is used by the Control Subsystem, by the Sensor Subsystem and by the Actuation Subsystem to obtain the information and data needed for the correct functioning of the whole Museum.
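
The hierarchy of the VirtualMuseumGraph (VirtualRooms, StageGraphs, group and leaf nodes) can be modelled with a few Python dataclasses. The sketch is a hypothetical reading of the structure described above, not the format produced by the editor; Behaviours and geometric data are left out, and Sounds are tagged separately from other Actuators only for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    name: str
    kind: str  # "sensor", "actuator" or "sound"


@dataclass
class Group:
    name: str
    children: list = field(default_factory=list)


@dataclass
class StageGraph:
    name: str
    roots: list = field(default_factory=list)


@dataclass
class VirtualRoom:
    name: str
    stages: list = field(default_factory=list)


@dataclass
class VirtualMuseumGraph:
    rooms: list = field(default_factory=list)


def leaves(node):
    """Yield every leaf below a node, depth first, in declaration order."""
    if isinstance(node, Leaf):
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def devices(museum, kind):
    """Collect the names of all leaves of one kind in the whole Museum."""
    return [leaf.name
            for room in museum.rooms
            for stage in room.stages
            for root in stage.roots
            for leaf in leaves(root)
            if leaf.kind == kind]


# A group of three lights treated as one logical unit, as in the example above.
lights = Group("light_sequence", [Leaf(f"light_{i}", "actuator") for i in range(3)])
stage = StageGraph("entrance", [Leaf("floor_array", "sensor"), lights, Leaf("welcome", "sound")])
museum = VirtualMuseumGraph([VirtualRoom("darkroom", [stage])])

actuators = devices(museum, "actuator")
```

A subsystem can walk the same structure to find the devices it is responsible for, for example all sensors for the Sensor Subsystem.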

Vision based MSI

To enable reactivity in Museums, some form of perceptual intelligence is needed so that the system can classify the current situation and appraise the important variables in order to react in an appropriate and socially acceptable way. This can be obtained, for example, through suitable Computer Vision algorithms capable of detecting and tracking the position of multiple visitors inside a specific area of the environment. In the proposed system, a Vision based MSI has been implemented that detects the presence of visitors and then tracks them by means of multiple TV cameras. For each camera, an adaptive multi-class temporal median filter is used, coupled with a statistical model of colour and shape. According to the model, visitors are segmented from complex textured backgrounds under variable viewing and illumination conditions. People are modelled by means of blobs whose colour statistics are continuously updated. The tracking algorithm also maintains a model of the background, which is represented as a set of textured regions mixed with more homogeneous regions. Different statistical models are used for the two kinds of regions. The regions are described through local mean colour values and spatial distribution descriptors. A multi-valued Gaussian mixture is used to account for the spatio-temporal variability of the background [4]. Colour histogram indexing and normalized colour descriptors are used for tracking multiple visitors in real time (see Figure 2).


Figure 2 Example of people tracking
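
To give an idea of the background-modelling step, the sketch below implements a plain per-pixel temporal median on grey-level frames and thresholds the difference to obtain a foreground mask. It is a deliberately simplified stand-in: the adaptive multi-class filter, the colour and shape model and the Gaussian mixture described above are not reproduced, and the function names and the threshold value are assumptions.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over a list of equally sized grey frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


# Three 2x3 frames; a visitor briefly covers one pixel in the second frame.
history = [
    [[10, 10, 10], [10, 10, 10]],
    [[12, 200, 10], [10, 10, 10]],
    [[11, 10, 10], [10, 10, 10]],
]
background = median_background(history)

# A new frame in which a visitor covers the top-right pixel.
current = [[11, 10, 180], [10, 10, 10]]
mask = foreground_mask(current, background)
```

The median discards the short-lived value 200, so a visitor who passes quickly does not contaminate the background estimate.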

The detected 2D blobs are back-projected to obtain 3D position estimates; when more than one camera is available, homography-based multiple-image fusion is used to improve the estimates. In the current implementation the flat-ground hypothesis is assumed to hold (which is almost always true in Museum applications). Intrinsic and extrinsic camera calibration (needed for 3D position recovery) is obtained by using multiple planar targets during the preliminary set-up phase. These data are stored in the corresponding StageGraphs and in the VirtualRoom abstract-description structures. The Sensor Subsystem uses this information to fuse the multi-camera data and to back-project the visitors' positions onto the Room model. It is important to note that the detection of visitors is done in parallel, which is what enables the Social Interface principle. A computer can now evaluate which kind of interaction to start and how to react. This evaluation can be done by grouping the visitors and matching them with the interaction resources available in the Room (described by the StageGraphs that belong to the VirtualRoom). Multiple interactions can be started in parallel, targeted at different visitor sub-groups.
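
Under the flat-ground hypothesis, back-projection reduces to applying a 3x3 image-to-floor homography to the pixel where a blob touches the ground. The sketch below shows this mapping and a naive fusion by averaging. The matrices `H_a` and `H_b` are made-up values standing in for real calibration results, and simple averaging is an assumption; the paper does not detail its fusion rule.

```python
def apply_homography(H, u, v):
    """Map image pixel (u, v) to floor coordinates (x, y) with a 3x3 homography."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    return (x / w, y / w)


def fuse_positions(estimates):
    """Naive fusion: average the floor positions reported by several cameras."""
    n = len(estimates)
    return (sum(p[0] for p in estimates) / n, sum(p[1] for p in estimates) / n)


# Hypothetical image-to-floor homographies for two calibrated cameras.
H_a = [[0.01, 0.0, 0.0],
       [0.0, 0.01, 0.0],
       [0.0, 0.0, 1.0]]
H_b = [[0.0, -0.01, 6.0],
       [0.01, 0.0, 0.0],
       [0.0, 0.0, 1.0]]

# Pixel at the foot of the same blob, as seen by each camera.
floor_a = apply_homography(H_a, 320, 240)
floor_b = apply_homography(H_b, 242, 278)
fused = fuse_positions([floor_a, floor_b])
```

With these values the two cameras place the visitor at about (3.20, 2.40) and (3.22, 2.42) metres, and the fused estimate lies midway between them.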

Pressure based MSI

Another important sensing and tracking strategy is based on arrays of pressure sensors. It is particularly useful in applications where the illumination conditions or the spatial layout are too severe for Vision based people sensing (e.g. when darkrooms are needed for special-purpose interactions). In particular, it is possible to build floating floors tessellated with suitable tiles laid over a bed of pressure sensors. One typical configuration uses four pressure sensors for each square tile used to pave the ground. By reading the pressure values at the four vertices of each tile and by considering the neighbouring tiles (the size of the neighbourhood depends on the tile size), it is possible to estimate the position and the number of persons on the floating floor with sub-tile accuracy. Moreover, in general no back-projection processing is needed, because pressure sensors are directly related to physical locations inside the actual environment (see Figure 3).


Figure 3 Events detected by the VirtualRoom Behaviour
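
One simple way to obtain sub-tile accuracy from four corner sensors is to compute the centre of pressure of the tile, i.e. the load-weighted average of the corner positions. The sketch below assumes a rigid square tile with one person on it; the function name, the corner ordering and the `min_total` threshold are assumptions, and the handling of neighbouring tiles and of people counting is not shown.

```python
def centre_of_pressure(loads, origin, size, min_total=50.0):
    """Estimate where a person stands on one square tile.

    loads  -- readings at corners (0,0), (size,0), (0,size), (size,size)
    origin -- Room coordinates of the (0,0) corner, in metres
    size   -- tile side, in metres
    Returns None when the total load is below min_total (empty tile).
    """
    f00, f10, f01, f11 = loads
    total = f00 + f10 + f01 + f11
    if total < min_total:
        return None
    x = origin[0] + size * (f10 + f11) / total
    y = origin[1] + size * (f01 + f11) / total
    return (x, y)


# A person standing towards the right-hand edge of a 0.5 m tile.
position = centre_of_pressure((100.0, 300.0, 100.0, 300.0), origin=(2.0, 0.0), size=0.5)
empty = centre_of_pressure((2.0, 1.0, 3.0, 2.0), origin=(2.0, 0.0), size=0.5)
```

Because the tile origin is already expressed in Room coordinates, the result needs no further back-projection.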

The Control Subsystem

At start-up, the Control Subsystem loads the VirtualMuseumGraph and extracts the whole description of the various VirtualRooms inside the Museum. Each VirtualRoom, in turn, contains the description of the physical environment related to the MSIs and of the corresponding Sensors, Sounds, Actuators and Behaviours. This information is used to initialize the Social Interfaces and all the physical devices needed to enable Museum Reactivity. After the initial housekeeping, all the global Behaviours are started. These Behaviours can vary from VirtualRoom to VirtualRoom, so that different tracking and interaction strategies can run in different areas of a single Museum. For example, one area can have a Pressure based MSI while another has a Vision based MSI. The area that uses pressure sensing is described by the corresponding VirtualRoom abstract structure, which also specifies the areas where interactions can be started by the Reactive Museum. These areas are described by the StageGraph contained in the VirtualRoom structure. When the global Behaviour of the VirtualRoom detects visitors inside a Behaviour's Enabling Bound, it starts the Behaviour that corresponds to that Bound and checks its firing criteria and conditions. If both are met, the Behaviour is fired and the corresponding Stage is activated. The Stage contains the description of the Actuators and Sounds that the fired Behaviour uses to interact with the visitors. Multiple Enabling Bounds can be activated by the same or by different sub-groups of visitors, so multiple Behaviours can be running at the same time. In the same way, a specific Behaviour can affect multiple Stages, so that multiple parallel modifications of the environment can be obtained (for example, a presentation sound or a background sound can be started in one place while lights or videos are switched on in another).
This is how the Museum can react in parallel to multiple solicitations by multiple simultaneous users.
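
A single supervision step of the Control Subsystem might look like the following sketch: visitors are tested against each Enabling Bound, the Behaviours whose (here very simplified) firing condition holds are fired, and their Stages are activated in parallel. Everything in it is hypothetical: the dictionary layout, the `min_visitors` condition, the `activate` stub and the use of a thread pool are illustrative choices, not details taken from MuseumNet©.

```python
from concurrent.futures import ThreadPoolExecutor


def in_box(point, box):
    """True when a 2D point lies inside an axis-aligned box ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = box
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1


def activate(stage):
    """Stub for driving the Actuators and Sounds of one Stage."""
    return f"{stage}: on"


def control_step(behaviours, visitors):
    """Fire every Behaviour whose bound holds enough visitors; run Stages in parallel."""
    to_activate = []
    for behaviour in behaviours:
        inside = [v for v in visitors if in_box(v, behaviour["bound"])]
        if len(inside) >= behaviour["min_visitors"]:
            to_activate.extend(behaviour["stages"])
    with ThreadPoolExecutor() as pool:
        return list(pool.map(activate, to_activate))


behaviours = [
    {"name": "sink_sound", "bound": ((0, 0), (2, 2)), "min_visitors": 2,
     "stages": ["sink_speakers"]},
    {"name": "wall_video", "bound": ((4, 0), (6, 2)), "min_visitors": 1,
     "stages": ["wall_projector", "wall_lights"]},
]
visitors = [(0.5, 0.5), (1.5, 1.0), (5.0, 1.0)]

result = control_step(behaviours, visitors)
```

Two sub-groups of visitors trigger two Behaviours in the same step, and the second Behaviour drives two Stages at once.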

THE MUSEUM OF MONTICCHIELLO

The system described above is fully integrated into the MuseumNet© architecture as a specialized module for Museum reactivity. This architecture is currently used inside the Museum of Monticchiello (a small village near Siena, Italy), which is devoted to Old Theatrical Arts. The Museum hosts two different MSIs: one is Pressure based and the other is Vision based. At the entrance, visitors go into a darkroom where ancient objects, sounds and videos are presented. Special illumination effects at the corners allow visitors to progressively discover the environment and to see the ancient objects. At the same time, projections on the walls and from the floor (some tiles are semitransparent) at different locations illustrate various aspects of the ancient theatrical art. It is important to note that the presentation does not follow a predetermined path: it is the Museum that reacts to the visitors, depending on their movements as detected by the array of pressure sensors. In this way each visit is a unique experience that develops from the social behaviour of the current visitor group (see Figure 4).


Figure 4 Pressure based MSI during the design stage

From the darkroom, visitors go into a tunnel that takes them to a Vision based MSI. The room has four cameras at the ceiling corners that continuously look for visitors. At the centre of the room an ancient sink has been equipped with hidden interactive presentation devices. In particular, videos can be shown at the bottom of the well, while sounds can be played through multiple rings of speakers hidden all around the sink border. These devices are used to show images, descriptive narrations and videos, and to create 3D sound effects. The cameras track the visitors' positions in real time, so that the control system constantly knows their spatial distribution on the floor. The positional information is used to enable reactive events during the visit. For example, when a predetermined number of persons enter the room, an appealing background sound is enabled to attract their attention towards the sink in the middle of the room. When a suitable number of persons are near the sink (the MSI uses vision to verify that there are enough people), video presentations start and are projected on the well bottom. The video presentations depend on the number and distribution of visitors around the sink (that is, the Museum reacts in different ways depending on people's behaviour). If another group of visitors gathers elsewhere while the rest remain around the sink, the Museum reacts by starting a video presentation on the nearby wall while the presentation at the sink continues (see Figure 5). If after some time the persons around the sink are still there, the Museum reacts by stopping the video on the well bottom and starting a sound while opening a small hidden compartment on the opposite side of the room. Again, the interactions are not predetermined; they depend on people's behaviour. Moreover, multiple interactive presentations can go on in parallel, targeted at different sub-groups of visitors.


Figure 5 The multimedia sink and a projection screen in the background
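
The idea of reacting to sub-groups of visitors can be sketched as a small grouping step followed by a match against the available presentation points. The sketch uses single-linkage grouping with a distance threshold and assigns each group to the nearest Stage; the function names, the 1 m gap and the Stage coordinates are assumptions made for illustration, not the rule used at Monticchiello.

```python
import math


def group_visitors(positions, max_gap=1.0):
    """Single-linkage grouping: visitors closer than max_gap share a group."""
    groups = []
    for p in positions:
        linked = [g for g in groups if any(math.dist(p, q) <= max_gap for q in g)]
        groups = [g for g in groups if not any(g is l for l in linked)]
        groups.append([p] + [q for g in linked for q in g])
    return groups


def centroid(group):
    n = len(group)
    return (sum(p[0] for p in group) / n, sum(p[1] for p in group) / n)


def match_to_stages(groups, stages):
    """Assign every group to its nearest Stage and count visitors per Stage."""
    counts = {}
    for group in groups:
        c = centroid(group)
        nearest = min(stages, key=lambda name: math.dist(c, stages[name]))
        counts[nearest] = counts.get(nearest, 0) + len(group)
    return counts


# Hypothetical floor positions (metres) of five tracked visitors.
visitors = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.4, 5.2), (0.9, 0.3)]
stages = {"sink": (0.5, 0.0), "wall": (5.0, 5.5)}

groups = group_visitors(visitors)
assignments = match_to_stages(groups, stages)
```

Here three visitors gather near the sink and two near the wall, so both presentation points have an audience and could run in parallel.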

CONCLUSIONS

In this paper we have presented an innovative module that has been integrated into the MuseumNet© architecture. The module enables the use of Multisensor Social Interfaces inside Museums. Social Interfaces are new interaction metaphors that allow multiple visitors to interact in parallel with multiple computers; moreover, the interactions can be initiated autonomously by the system, enabling interaction mechanisms completely different from traditional approaches. The system has been implemented in practice at the Museum of Monticchiello, near Siena, devoted to Ancient Theatrical Arts. In the near future, a support subsystem for emotional agents will be added to the proposed architecture. The MSIs will have their own artificial psychology. This will add new dimensions to the interaction activities by enabling "motivational behaviours" based on the MSIs' internal state.

Bibliography

[1] A.P. Pentland, "Smart Rooms: Machine Understanding of Human Behaviour", in Computer Vision for Human-Machine Interaction, R. Cipolla, A. Pentland Eds., Cambridge University Press, 1998
[2] A. Mecocci, "MuseumNet: 3D Virtual Environments and Wireless Appliances for improved Cultural experiences", EVA 2001 Conference, Florence, Electronic Imaging & the Visual Arts, Conference Proceedings, pp. 137-142, Pitagora Editrice, Bologna, 2001
[3] H. Sowizral, K. Rushforth, M. Deering, "Java 3D API Specification, Second Edition", Addison Wesley, May 26, 2000
[4] F. Moschetti, A. Mecocci, M. Sorci, "Motion Estimation for Digital Video Coding Based on a Statistical Approach", IEEE-ICICS2001 - Information, Communications & Signal Processing, Singapore, 2001

Copyright Minerva Project 2003-11, last revision 2003-11-12, edited by Minerva Editorial Board.
URL: www.minervaeurope.org/events/parma/papers/mecocci.htm