The Semantic Web – a new tool for libraries?

The building blocks of the Semantic Web

The World Wide Web Consortium (W3C), the international community that develops common protocols to ensure the long-term growth of the Web, is working on standards to build a "technology stack" to support the Semantic Web. These standards have a common aim: to create a uniform way of accessing heterogeneous data sources.

The main principles have been summarized by Burke (2009) as follows:

  • Metadata, in other words Resource Description Framework (RDF) technologies, which identify and exploit relationships between items.
  • Ontologies, which provide vocabularies for the description of properties and classes.

Figure 2. The Semantic Web Layer Cake (Berners-Lee, 1999; Swartz-Hendler, 2001, from Hall and Shadbolt, 2009)

Uniform resource identifiers

Whereas URLs refer to locations, uniform resource identifiers (URIs) refer to objects. Such an object may be an information resource, a real-world entity, a person, a term in a vocabulary, or even a phrase denoting a relationship, for example "is a". A URI needs to be capable of being "de-referenced": in other words, looking it up should return something.

Linked data and RDF

Linked data are the basis of the Semantic Web. However, these data are available in many different formats, for example relational, XML and HTML. For these data to be searchable and manageable they need to be in a standard format, which is what RDF provides.

Based on ideas from artificial intelligence, RDF provides additional metainformation; it is also a way of decomposing knowledge into its constituent parts.

There are three standard concepts in RDF: resources, properties and statements. All RDF statements are represented as triples, with a subject, predicate and object. Subjects and predicates are typically identified by URIs, while an object may be either a URI or a literal value such as a string or a number.

The following example in Table I is based on the statement: "An apple is a fruit" (Krötzsch, 2008):

Table I. Example of the RDF statement: "An apple is a fruit"
Construct   RDF-Type        Part of the sentence
Resource    rdf:subject     an apple
Property    rdf:predicate   is a
Resource    rdf:object      fruit

The code would look like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Statement>
    <rdf:subject rdf:resource="Apple" />
    <rdf:predicate rdf:resource="onto;is a" />
    <rdf:object rdf:resource="Fruit" />
  </rdf:Statement>
</rdf:RDF>
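
To make the triple structure concrete, the statement above can be read back into a (subject, predicate, object) tuple. The following is a minimal Python sketch using only the standard library. The function name read_statements is hypothetical, and the rdf namespace declaration is included here only so that the XML parses; a real application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The "apple is a fruit" statement, with the rdf namespace declared
# so that a standard XML parser accepts it.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Statement>
    <rdf:subject rdf:resource="Apple" />
    <rdf:predicate rdf:resource="onto;is a" />
    <rdf:object rdf:resource="Fruit" />
  </rdf:Statement>
</rdf:RDF>
"""


def read_statements(xml_text):
    """Return each rdf:Statement as a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text)
    triples = []
    for statement in root.iter(f"{{{RDF_NS}}}Statement"):
        parts = []
        for role in ("subject", "predicate", "object"):
            element = statement.find(f"{{{RDF_NS}}}{role}")
            # The value of each part is carried in the rdf:resource attribute.
            parts.append(element.get(f"{{{RDF_NS}}}resource"))
        triples.append(tuple(parts))
    return triples


triples = read_statements(RDF_XML)
print(triples)
```

Each tuple mirrors one row group of Table I: the first element is the subject, the second the predicate and the third the object.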

As of March 2009, there were 4.5 million triples on the Web (Hall and Shadbolt, 2009).

Figure 3. Datasets on the Web as of March 2009 (Hall and Shadbolt, 2009)

W3C has developed a new syntax, RDFa, which is simpler and – herein lies its beauty – can be embedded in XHTML documents. Thus Web pages can be transformed, by a simple piece of script, into items that can be semantically searched and retrieved, without changing the way they are viewed in a web browser.
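
As a rough illustration of how a script might pull semantic statements out of an ordinary page, the sketch below scans an XHTML fragment for "about" and "property" attributes and emits triples. It is a deliberate simplification and not a conforming RDFa processor: the page fragment, the Dublin Core "dc" prefix, the values and the class name RDFaPropertyParser are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical XHTML fragment: the attributes are invisible to a reader
# in a browser, but a script can use them to recover statements.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="/articles/semantic-web">
  <h1 property="dc:title">The Semantic Web</h1>
  <p>Written by <span property="dc:creator">A. Author</span>.</p>
</div>
"""


class RDFaPropertyParser(HTMLParser):
    """Collect (subject, property, text) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.subject = None          # set by the most recent "about" attribute
        self._open_property = None   # property waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
        if "property" in attributes:
            self._open_property = attributes["property"]

    def handle_data(self, data):
        text = data.strip()
        if self._open_property and text:
            self.triples.append((self.subject, self._open_property, text))
            self._open_property = None


parser = RDFaPropertyParser()
parser.feed(PAGE)
parser.close()
for triple in parser.triples:
    print(triple)
```

The sketch ignores most of RDFa (for example rel, nested subjects and prefix resolution); it only shows that the same markup can serve both the browser and a semantic search tool.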

Whitehouse.gov is incorporating RDFa into its site, using property, rel and xmlns attributes to provide better structure (Peterson, 2008).

Ontologies

Ontologies, or vocabularies, are a form of taxonomy. An ontology has been described as:

"a schema that formally defines the hierarchies and relationships between different resources. Semantic Web ontologies consist of a taxonomy and a set of inference rules from which machines can make logical conclusions" (Altova, 2009).

In other words, they are a domain-specific shared vocabulary. They provide additional meaning to the data, and so make it more flexible. They can be used for integrating data, for example when new relationships may give rise to new knowledge.

In the field of health care, medical and pharmaceutical knowledge could be combined with patient data for epidemiological research, and information about treatment efficacy (W3C, 2009).

They can also help reduce ambiguity. For example, anyone seeking information on the British prime minister would have to search on those terms as well as on his name; ontologies can be developed which link the name with the function. Another example would be a bookseller or library trying to build a database from many different publishers' datasets. The publishers may use different terms for author, for example creator or editor, and the ontology can clarify that these are variants.
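
The bookseller example can be sketched in a few lines of Python. This is only an illustration of the idea of mapping variant terms to one shared term, not an ontology language such as OWL or SKOS; the mapping AUTHOR_VARIANTS, the function normalise_record and the two sample records are invented for the example.

```python
# Hypothetical mapping from publisher-specific field names to one shared term.
AUTHOR_VARIANTS = {"author": "author", "creator": "author", "editor": "author"}


def normalise_record(record):
    """Rename variant field names to the shared vocabulary term.

    Fields that are not listed in the mapping are kept unchanged.
    """
    return {AUTHOR_VARIANTS.get(field, field): value
            for field, value in record.items()}


# Two invented records, each using a different term for the same role.
publisher_a = {"title": "Linked Data Basics", "creator": "J. Smith"}
publisher_b = {"title": "Ontologies in Practice", "editor": "R. Jones"}

catalogue = [normalise_record(r) for r in (publisher_a, publisher_b)]
authors = [entry["author"] for entry in catalogue]
print(authors)
```

Once both records use the same term, a single query on "author" finds entries from either publisher. A record that carried both a creator and an editor would need a more careful rule than this sketch provides.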

Two of the main techniques to describe vocabulary terms in standard form are Web Ontology Language (OWL), which can add more vocabulary for describing properties and classes, and Simple Knowledge Organization System (SKOS). The latter is used to design knowledge organization systems and has clear applications to libraries.

Query languages

The Semantic Web needs its own query language, which relates to RDF much as SQL relates to relational databases. This language is known as SPARQL.

SPARQL, like RDF, is based on triples, except that one or more parts of a triple may be replaced by a variable. A query returns the values for which the resulting pattern matches triples in the RDF data.
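
The matching idea can be imitated in plain Python. The sketch below is not SPARQL and does not use a SPARQL engine; it only shows how a triple pattern containing variables is matched against stored triples. The sample data and the function name match are assumptions made for the example, and variables are written with a leading "?" as they are in SPARQL.

```python
# Invented sample data: three triples in (subject, predicate, object) form.
TRIPLES = [
    ("Apple", "is_a", "Fruit"),
    ("Pear", "is_a", "Fruit"),
    ("Carrot", "is_a", "Vegetable"),
]


def match(pattern, triples):
    """Return one dict of variable bindings per triple that fits the pattern.

    Terms starting with '?' are variables; every other term must be
    identical to the corresponding part of the triple.
    """
    results = []
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(bindings)
    return results


# Roughly the question "which things are fruit?"
fruit = match(("?thing", "is_a", "Fruit"), TRIPLES)
print(fruit)
```

Here the subject is the variable and the predicate and object are fixed, so the result lists every subject that is stated to be a fruit.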