Handbook on cultural web user interaction
First edition (September 2008)
edited by MINERVA EC Working Group “Quality, Accessibility and Usability”


4.5      Towards semantic integration

The web is not a collection of documents; it is a group of ‘places’ set in a virtual landscape. The ‘places’ on the web are points of entry for interaction between individuals, between individuals and organizations, or between organizations.

The interaction occurs through the exchange of information and documents and access to services.

The basic problems of the web therefore consist in finding the relevant places and in exploiting the available information and services.

Search engines (universal engines, OPACs or the search forms of cultural heritage web catalogues) are essential for navigating the web, but they quickly prove insufficient: sometimes we get too many results, most of them useless; sometimes we get none; and the results always depend on the vocabulary used (search by word or sequence of words).

To give an example, if we search for information on “marsupial”, a textual search engine identifies all the pages where the word “marsupial” appears exactly as written. We would prefer a search engine that also finds a photo of the Tasmanian wolf (which is a marsupial) on a page where the word “marsupial” does not appear.
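The difference between matching words and matching concepts can be sketched in a few lines of Python. This is only an illustration: the page texts, the `IS_A` table and the function names are invented for the example, and a real system would take the hierarchy from a shared ontology rather than from a hard-coded dictionary.

```python
# Hypothetical mini-corpus: page id -> text.
# The second page never uses the word "marsupial".
PAGES = {
    "page1": "The kangaroo is a marsupial native to Australia.",
    "page2": "Photo of a thylacine (Tasmanian wolf).",
    "page3": "Opening hours of the museum.",
}

# Hypothetical concept hierarchy: term -> broader concept.
IS_A = {
    "thylacine": "marsupial",
    "kangaroo": "marsupial",
    "marsupial": "mammal",
}

def keyword_search(word):
    """Return the pages whose text literally contains the word."""
    return sorted(p for p, text in PAGES.items() if word in text.lower())

def narrower_terms(concept):
    """Return the concept plus every term that is, directly or indirectly, a kind of it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for term, broader in IS_A.items():
            if broader in found and term not in found:
                found.add(term)
                changed = True
    return found

def concept_search(concept):
    """Return the pages that mention the concept or any narrower term."""
    terms = narrower_terms(concept)
    return sorted(
        p for p, text in PAGES.items()
        if any(term in text.lower() for term in terms)
    )

print(keyword_search("marsupial"))  # ['page1']
print(concept_search("marsupial"))  # ['page1', 'page2']
```

The keyword search misses the thylacine page, while the concept search finds it because the hierarchy states that a thylacine is a marsupial.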

The ambiguity and subtleties of language (see also 2.7) must also be considered. For example, “net” means quite different things for a web designer or for a fisherman; a violinist is part of an orchestra and her/his fingers are part of her/him, but are these fingers part of the orchestra? If I say the “teaspoon is in the cup”, do I mean that it is resting in the concave part of the cup, or that it is included in the actual material of the cup?

For a human interpreter, statements are disambiguated by their context, because we reason by deduction. But what about a computer?

The problem is that the World Wide Web was originally built to be used by humans only. Even if everything in it can be read by machines (the automatic user, see 2.4.1.6), machines cannot understand the data. And because of the (increasing) quantity of information on the web, it is not possible to manage it manually.

A possible solution could be to create metadata that describe the data contained on the web. We must remember that on the web the distinction between data and metadata is not absolute; sometimes the resource itself can be interpreted in both ways at once, and metadata can describe other metadata. Data and metadata are almost always based on a specific syntax (logical structure) in order to avoid ambiguity.

A better solution could be to teach machines to disambiguate all the statements present on the web, for example by conceiving and sharing “documents” that collect and express all the concepts that make up our knowledge: ontologies.

Ontology is a term borrowed from philosophy, where it refers to the science that describes the types of entities in the world and how they are related to one another. Ontologies seem to be the most efficient way to represent knowledge: they combine unambiguous descriptions of the concepts in a certain domain, a hierarchical description of the relations between those concepts, and the rules necessary to obtain additional knowledge. Ontologies are often limited to specific domains of human knowledge, so that an entity assumes one meaning rather than another.
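A minimal sketch of what “rules necessary to obtain additional knowledge” can mean, using the violinist example given earlier. It assumes, purely for illustration, that the ontology distinguishes a transitive `part_of` relation from a non-transitive `member_of` relation; the relation names and the small fact base are hypothetical.

```python
# Hypothetical fact base: (subject, relation, object).
FACTS = {
    ("finger", "part_of", "violinist"),
    ("violinist", "member_of", "orchestra"),
    ("string_section", "part_of", "orchestra"),
    ("first_violins", "part_of", "string_section"),
}

# Assumption of this sketch: only part_of is declared transitive.
TRANSITIVE = {"part_of"}

def infer(facts):
    """Apply the transitivity rule repeatedly until no new fact appears."""
    known = set(facts)
    while True:
        new = {
            (a, r, d)
            for (a, r, b) in known
            for (c, r2, d) in known
            if r in TRANSITIVE and r == r2 and b == c
        } - known
        if not new:
            return known
        known |= new

closure = infer(FACTS)
print(("first_violins", "part_of", "orchestra") in closure)  # True
print(("finger", "part_of", "orchestra") in closure)         # False
```

Because the violinist is recorded as a member of the orchestra rather than a part of it, the rule does not conclude that the fingers are part of the orchestra. That is the kind of distinction an ontology makes explicit for a machine.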

The overall solution to encode, exchange and re-use structured metadata, expressing data and representing data rules, exporting all that knowledge and making it sharable and available for any application is called “semantic web”.

4.5.1      The semantic web

Tim Berners-Lee, James Hendler and Ora Lassila defined the semantic web as “a new form of web content that is meaningful to computers” (The semantic web, Scientific American, May 2001).

But what does this mean more precisely? The semantic web aims to permit the discovery of information and services by using concepts rather than keywords, and to allow the automation of services, taking for granted that each resource is identified by a Uniform Resource Identifier (URI).

To describe the data contained on the web in a machine-readable way (in this sense researchers use the word “semantic”), models must be defined for representing knowledge. The semantic web includes a set of design principles, collaborative working groups, and a variety of enabling technologies, for the most part defined by the W3C Semantic Web Activity.

The main difficulties in implementing the Semantic Web are in the definition and the universal dissemination of standard formats for assuring the interoperability of applications and the implementation of deductive reasoning in a completely automatic manner, exporting on the web rules from any knowledge base.

But where are we now? Some elements of the semantic web are still to be implemented or realized. The semantic web envisaged by Tim Berners-Lee will probably not exist for some time yet. However, websites, intranets, and extranets that provide information services are already numerous. Technologies based on description logics are already able to represent knowledge in textual form and to provide a level of automatic reasoning services. It is therefore already possible to take the first steps towards a semantic web by creating simple applications, based on description logics, that provide services for our websites.

The semantic web, a declarative environment, uses standards and tools based on XML Namespace and XML Schema [2]. This set of W3C standards provides an elemental syntax for content structure within documents, but does not directly associate semantics to the meanings of the content.

The W3C technology to encode, exchange and reuse structured web metadata is the Resource Description Framework (RDF). To express restrictions on the associations, and so avoid encoding statements that are syntactically correct but meaningless, a mechanism to represent classes of objects is necessary. For this purpose the RDF Vocabulary Description Language, or RDF Schema (RDFS), was created.

After having expressed data and data rules we need a language to export that knowledge (ontologies) and to make it available to any application: the W3C Web Ontology Language (OWL).

All these components are usually organized in the so-called “Semantic web stack”: above XML (useful to give a structure to resources) and RDF (to express meanings, or rather to state that some elements have some properties), we find the ontological level, where the relations between terms are formally defined. The upper level is the logic level, where the assertions present on the web may be used to derive new knowledge, not through a unique, universal reasoning system but with a unifying logic able to represent all trusted demonstrations.

Semantic web stack

W3C Semantic Web Activity
http://www.w3.org/2001/sw/

4.5.2      Resource Description Framework Data Model

Resource Description Framework (RDF) is a universal, basic framework for codifying, exchanging and reusing structured metadata. It supports interoperability between web applications that exchange machine-understandable information.

The RDF data model, which represents RDF statements in a syntactically neutral manner, is very simple and is based on three types of object: resources, properties and statements. The first two are uniquely identified by a URI:

Resources: Anything described by an RDF expression is a resource. It may be a web page or part of one, or an XML element within the source document, but also an entire collection of web pages, or an object that is not directly accessible via the web.

Properties: A property is a specific aspect, a characteristic, an attribute, or a relation used for describing a resource. Every property has a specific meaning. It defines admissible values, the types of resource that it can describe, and its relations with other properties. The properties associated with a resource are identified by a name and have values.

Statements: A resource, with a property identified by a name, and a value of the property for a specific resource, forms an RDF statement. A statement is therefore a triple composed of a subject (resource), a predicate (property) and an object (value). The object of a statement (the property value) can be an expression (sequence of characters or some other primitive type defined by XML) or another resource.

A series of properties referring to the same resource is called a description.

The RDF data model

For example, the statement that a web resource titled ‘William Shakespeare’, published by Wikipedia, provides information on the English writer William Shakespeare can be expressed in RDF like this:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Subject: the resource being described -->
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/William_Shakespeare">
    <!-- Each child element is a property (predicate) with its value (object) -->
    <dc:publisher>Wikipedia</dc:publisher>
    <dc:title>William Shakespeare</dc:title>
  </rdf:Description>
</rdf:RDF>
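The triple structure behind this RDF/XML can be made visible with a short Python sketch that uses only the standard library. It is not a general RDF parser: it assumes the simple shape of the example (one `rdf:Description` with literal-valued child elements), and the function name `triples` is invented for the illustration. The Dublin Core namespace is written with its trailing slash, `http://purl.org/dc/elements/1.1/`, so that namespace plus local name gives the full property URI.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/William_Shakespeare">
    <dc:publisher>Wikipedia</dc:publisher>
    <dc:title>William Shakespeare</dc:title>
  </rdf:Description>
</rdf:RDF>"""

def triples(rdf_xml):
    """Yield (subject, predicate, object) for simple literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes qualified names as "{namespace}local";
            # joining the two parts gives the property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            yield (subject, predicate, (prop.text or "").strip())

for triple in triples(DOC):
    print(triple)
```

Each printed tuple is one statement: the Wikipedia page as subject, a Dublin Core property as predicate, and a string as object.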

An interesting initiative was the encoding of basic DCMES (see 4.3) in XML using simple RDF, providing a DTD and W3C XML Schemas.

The primary goal for this work was to “provide a simple encoding, where there are no extra elements, qualifiers, optional or varying parts allowed”, with some restrictions.

One result of the restrictions is that the encoding does not create documents that can be embedded in HTML pages. Encodings for qualified DC were nevertheless created, for example Expressing Qualified Dublin Core in RDF / XML.

Resource Description Framework (RDF):
Concepts and Abstract Syntax, W3C Recommendation, 2004
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/

Expressing Simple Dublin Core in RDF/XML
http://dublincore.org/documents/dcmes-xml/

Qualified DC in RDF/XML
http://dublincore.org/documents/dcq-rdf-xml/

4.5.3      RDF Vocabulary Description Language, or RDF Schema (RDFS)

RDF Schema is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized hierarchies of such properties and classes.

It provides basic elements for the description of ontologies (RDF vocabularies), intended to structure RDF resources.

RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources.

RDF, however, provides no mechanisms for describing these properties, nor does it provide any mechanisms for describing the relationships between these properties and other resources. That is the role of the RDF Vocabulary Description Language, or RDF Schema. RDF Schema defines classes and properties that may be used to describe classes, properties and other resources.

The RDF vocabulary description language class and property system is similar to the type systems of object-oriented programming languages such as Java.
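How a class and property description lets software derive statements that nobody wrote down can be sketched as follows. The sketch applies three well-known RDFS inference patterns (transitivity of `rdfs:subClassOf`, inheritance of `rdf:type` along the class hierarchy, and `rdfs:domain`) to a handful of triples. The `ex:` vocabulary and the function name `rdfs_closure` are invented for the example, and a real application would rely on an RDF toolkit rather than on this loop.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"
DOMAIN = "rdfs:domain"

# Hypothetical vocabulary (ex: is an invented prefix) plus one instance statement.
TRIPLES = {
    ("ex:Painting", SUBCLASS_OF, "ex:Artwork"),
    ("ex:Artwork", SUBCLASS_OF, "ex:CulturalObject"),
    ("ex:paintedBy", DOMAIN, "ex:Painting"),
    ("ex:monaLisa", "ex:paintedBy", "ex:leonardo"),
}

def rdfs_closure(triples):
    """Add inferred triples until nothing new can be derived."""
    known = set(triples)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                # subClassOf is transitive.
                if p == SUBCLASS_OF and p2 == SUBCLASS_OF and o == s2:
                    new.add((s, SUBCLASS_OF, o2))
                # An instance of a class is an instance of its superclasses.
                if p == TYPE and p2 == SUBCLASS_OF and o == s2:
                    new.add((s, TYPE, o2))
                # The subject of a property belongs to the property's domain.
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
        if new <= known:
            return known
        known |= new

inferred = rdfs_closure(TRIPLES)
types = sorted(o for (s, p, o) in inferred if s == "ex:monaLisa" and p == TYPE)
print(types)  # ['ex:Artwork', 'ex:CulturalObject', 'ex:Painting']
```

No triple in the input says what kind of thing `ex:monaLisa` is; all three types follow from the schema alone.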

RDF Vocabulary Description Language 1.0: RDF Schema
http://www.w3.org/TR/rdf-schema/
(see in particular the Introduction to the document)

4.5.4      Representing thesauri in RDF: SKOS

The Simple Knowledge Organization System (SKOS) is a W3C area of work developing specifications and standards to support the use of knowledge organisation systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web. SKOS is built upon RDF and RDFS, and its main objective is to enable easy publication of controlled structured vocabularies for the semantic web.
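The kind of structure SKOS describes can be pictured with a small Python sketch. The dictionary keys mirror three SKOS properties (`skos:prefLabel`, `skos:altLabel`, `skos:broader`); the concepts, the `ex:` identifiers and the helper functions are hypothetical. For brevity the sketch assumes that each concept has at most one broader concept, which SKOS itself does not require.

```python
# Hypothetical thesaurus fragment; keys mirror SKOS property names.
CONCEPTS = {
    "ex:vessels": {"prefLabel": "vessels", "altLabel": [], "broader": None},
    "ex:cups": {"prefLabel": "cups", "altLabel": ["drinking cups"], "broader": "ex:vessels"},
    "ex:teacups": {"prefLabel": "teacups", "altLabel": ["tea cups"], "broader": "ex:cups"},
}

def find_concept(label):
    """Return the concept whose preferred or alternative label matches, or None."""
    wanted = label.lower()
    for uri, concept in CONCEPTS.items():
        labels = [concept["prefLabel"]] + concept["altLabel"]
        if wanted in (l.lower() for l in labels):
            return uri
    return None

def broader_chain(uri):
    """Follow the broader links from a concept up to the top of the thesaurus."""
    chain = []
    current = CONCEPTS[uri]["broader"]
    while current is not None:
        chain.append(current)
        current = CONCEPTS[current]["broader"]
    return chain

print(find_concept("tea cups"))     # ex:teacups
print(broader_chain("ex:teacups"))  # ['ex:cups', 'ex:vessels']
```

A catalogue that indexes objects with such concepts can accept a non-preferred term from the user and still retrieve records indexed under the preferred term or under any of its broader concepts.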

SKOS, frequently adopted in the cultural heritage field, is currently a work in progress, and the main published documents (the SKOS Core Guide, the SKOS Core Vocabulary Specification, and the Quick Guide to Publishing a Thesaurus on the Semantic Web) have W3C Working Draft status. The new Semantic Web Deployment Working Group, chartered for two years (May 2006 - April 2008), has included in its charter the goal of moving SKOS forward on the W3C Recommendation track.

SKOS Simple Knowledge Organization System Primer
http://www.w3.org/TR/skos-primer
 

4.5.5      The Web Ontology Language (OWL)

The first level above RDF required for the Semantic Web is an ontology language that can formally describe the meaning of terminology used in web documents. If machines are expected to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema.

An ontology language is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. This language can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms.

The W3C Web Ontology Language (OWL) has more facilities for expressing meaning and semantics than XML, RDF, and RDFS, and thus OWL goes beyond these languages in its ability to represent machine-interpretable content on the web.

The W3C-endorsed OWL specification includes the definition of three increasingly expressive sublanguages designed for use by specific communities of implementers and users: OWL Lite, OWL DL and OWL Full. Each of these sublanguages is an extension of its simpler predecessor, both in what can be legally expressed and in what can be validly concluded.

OWL Lite supports those users primarily needing a classification hierarchy and simple constraints.

OWL DL supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time).

OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.

According to some, the semantic web and systems based on description logics are not part of our immediate future, owing to the resistance of many communities to full interoperability. Nevertheless, numerous software tools, often free, are already available:

•     for the use of RDF or OWL ontologies by software applications (e.g. Jena)
•     for the definition and update of RDF or OWL ontologies (e.g. Protégé)
•     for the automatic execution of deductive reasoning in OWL DL (e.g. Racer).

OWL Web Ontology Language Overview
http://www.w3.org/TR/owl-features/

Jena
http://www.jena.sourceforge.net

Protégé
http://protege.stanford.edu

Racer
http://www.racer-systems.com

4.5.6      Semantics for cultural heritage: CIDOC Conceptual Reference Model

How can description logics be applied in the field of cultural heritage? Cultural heritage is a complex knowledge domain, with a great deal of ambiguous and cross-disciplinary terminology. The cultural heritage (CH) sector is very rich in possible associations, both among its own documents and with documents pertaining to other disciplines.

The major initiative in this area is the CIDOC Conceptual Reference Model (CIDOC CRM), promoted by the International Committee for Documentation of ICOM (International Council of Museums) and now stable after a decade of work [3]. Since 2006 this has been the international standard (ISO 21127:2006) for the controlled exchange of cultural heritage information.

The CIDOC CRM is intended to promote a shared understanding of cultural heritage information by providing a common and extensible semantic framework to which any cultural heritage information can be mapped. It is intended to be a common language for domain experts and implementers to formulate requirements for information systems and to serve as a guide for good conceptual modelling practice.

CIDOC CRM is a core ontology which incorporates basic entities and relationships common across the diverse metadata vocabularies and might be useful for integrating information from heterogeneous vocabularies and uniform processing across heterogeneous information sources.
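The idea of mapping heterogeneous records onto one core model can be sketched in Python. Everything below is a simplified illustration: the two source records, the field mappings and the function name are hypothetical, and the `crm:` labels are meant to follow CRM naming (E22 Man-Made Object, E12 Production, P102 has title, P108 was produced by, P14 carried out by) but should be verified against the published definition. In the CRM itself, titles and actors are entities in their own right; here they are reduced to plain strings.

```python
# Two hypothetical source records with different field names.
RECORD_A = {"id": "42", "title": "Winter landscape", "artist": "Example Painter"}
RECORD_B = {"inv_no": "B-0007", "name": "Woven rug", "maker": "Example Weaver"}

# Hypothetical mapping from each source's fields to three common roles.
MAPPINGS = {
    "A": {"id": "id", "title": "title", "creator": "artist"},
    "B": {"id": "inv_no", "title": "name", "creator": "maker"},
}

def to_crm_style(record, source):
    """Express one record as CRM-style triples centred on a production event."""
    m = MAPPINGS[source]
    obj = "museum" + source + ":" + record[m["id"]]
    production = obj + "/production"
    return [
        (obj, "rdf:type", "crm:E22_Man-Made_Object"),
        (obj, "crm:P102_has_title", record[m["title"]]),
        (obj, "crm:P108i_was_produced_by", production),
        (production, "rdf:type", "crm:E12_Production"),
        (production, "crm:P14_carried_out_by", record[m["creator"]]),
    ]

combined = to_crm_style(RECORD_A, "A") + to_crm_style(RECORD_B, "B")

# One query now works across both sources, whatever their original field names.
titles = sorted(o for (s, p, o) in combined if p == "crm:P102_has_title")
print(titles)  # ['Winter landscape', 'Woven rug']
```

Once both sources are expressed in the same terms, a single query over the combined triples replaces one query per source schema.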

There is an important, even if subtle, difference between a core ontology and core metadata, such as Dublin Core. Even if both are intended for information integration, they differ in the relative importance of human understandability. Metadata are in general created, edited, and viewed by humans. In contrast, a core ontology is an underlying formal model for tools that integrate source data and perform a variety of extended functions.

In this approach metadata can be used not only to describe and to link to resources, but also to indicate where and why you can go from the resource itself [4].

Cultural institutions are encouraged to use the CIDOC CRM to enhance accessibility to museum-related information and knowledge.

One of the most interesting examples of a semantic application in the cultural heritage sector is “Finnish Museums on the Semantic Web” (FMS), whose major goals are to make collection metadata, which stem from heterogeneous databases, semantically interoperable on the web, and to provide facilities for semantic browsing and searching in the combined knowledge base of the participating museums [5].

The CIDOC Conceptual Reference Model
http://cidoc.ics.forth.gr/



[2] See especially the information on the W3C Semantic Web Activity, http://www.w3.org/2001/sw/.

[3] CIDOC CRM version 4.2 was also encoded in RDFS by ICS-FORTH (ISL-ICS) in 2005-2006.

[4] Oreste Signore, Ontology Driven Access to Museum Information, CIDOC 2005 Congress, Zagreb, http://www.cidoc2005.com/.

[5] A short presentation of the project is provided in Eero Hyvönen et al., Cultural Semantic Interoperability on the Web: Case Finnish Museums Online, http://iswc2002.semanticweb.org/posters/hyvonen_a4.pdf.

