Introduction to the Semantic Web and RDF

ZPUG DC
December 2, 2004

A.M. Kuchling
www.amk.ca
amk @ amk.ca

The Semantic Web has been a W3C project since around 1999.

The existing Web of HTML documents is good for humans:

The Semantic Web will augment the existing human-readable Web with structured data that's easy for software to process.

An Amazon page

[Screen capture of an Amazon page]

Layers of the Semantic Web

The Semantic Web is split into three layers:

Web Ontology Language (OWL)
Relationships between vocabularies
  • "Persons" in vocabulary A are the same thing as "Users" in vocabulary B.
  • Resource X and resource Y are referring to the same thing.
RDF Schema
Vocabulary definitions
  • There is a class called "Person".
  • Resource X is an instance of "Person".
Resource Description Framework (RDF)
Assertions of facts
  • Resource X is named "Drew".

Overview of RDF

RDF is a specification that defines a model for representing the world, and a syntax for serializing and exchanging the model.

Facts are 3-tuples of (subject, property, object).

Subject has a property of object
Resource X has a name of "Drew"
ISBN 1234567890 has an author of resource X
Resource X has a type of Person
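As a rough sketch, the facts above can be written down as plain Python tuples. The resource URI and the ex: property names below are placeholders invented for illustration; only the (subject, property, object) shape comes from RDF itself.

```python
# A minimal sketch: RDF facts as (subject, property, object) tuples.
# All identifiers here are hypothetical placeholders, not real vocabularies.
RESOURCE_X = "http://example.com/resource/X"

facts = {
    (RESOURCE_X, "ex:name", "Drew"),
    ("urn:isbn:1234567890", "ex:author", RESOURCE_X),
    (RESOURCE_X, "rdf:type", "ex:Person"),
}

# Everything asserted about resource X (triples where it is the subject).
about_x = sorted((p, o) for s, p, o in facts if s == RESOURCE_X)
for prop, obj in about_x:
    print(prop, obj)
```

Note that resource X appears as a subject in two facts and as an object in a third, which is what makes the set of facts a graph rather than a flat record.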

Noteworthy RDF vocabularies

Dublin Core

FOAF (Friend-of-a-friend)

DOAP (Description of a Project)

An Example RDF Graph

[Example RDF graph]

RDF Graph: Resources

Resources are identified by URIs

[Example RDF graph highlighting resources]

RDF Graph: Literals

[Example RDF graph highlighting literals]

RDF Graph: Properties

[Example RDF graph highlighting property arcs]

RDF Graph: Property URIs

How are properties identified? They could be just names or serial numbers, but that wouldn't be very scalable.

Instead, properties have URIs just like resources.
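In practice a property URI is usually written as a short prefix plus a local name. The sketch below shows one way that expansion might work; the names PREFIXES and expand are made up for this example, while the namespace URIs are the ones used in this talk's examples.

```python
# Sketch: expanding a prefixed property name into its full URI.
# PREFIXES and expand() are illustrative names, not part of any library.
PREFIXES = {
    "rev": "http://amk.ca/xml/review/1.0#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "FOAF": "http://xmlns.com/foaf/0.1/",
}

def expand(qname):
    """Turn a name such as 'dc:title' into the full property URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local

print(expand("dc:title"))
print(expand("rev:subject"))
```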

RDF statements and triples

Graphs are usually represented as a bunch of (subject,property,object) 3-tuples.

Subject | Property | Object
http://example.com/rev1 | rev:subject → http://amk.ca/xml/review/1.0#subject | urn:isbn:1930110111
urn:isbn:1930110111 | dc:title → http://purl.org/dc/elements/1.1/title | "XSLT Quickly"
urn:isbn:1930110111 | dc:creator → http://purl.org/dc/elements/1.1/creator | http://example.com/author/0042
http://example.com/author/0042 | FOAF:surname → http://xmlns.com/foaf/0.1/surname | "DuCharme"
http://example.com/author/0042 | FOAF:homepage → http://xmlns.com/foaf/0.1/homepage | http://www.snee.com/bob/
http://example.com/author/0042 | FOAF:pastProject → http://xmlns.com/foaf/0.1/pastProject | urn:isbn:1930110111

RDF syntaxes: RDF/XML

RDF Core defines an XML-based serialization for RDF.

<rdf:RDF 
    xmlns:FOAF="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rev="http://amk.ca/xml/review/1.0#">

    <!-- Implies rdf:type property is rev:Review -->
    <rev:Review rdf:about="http://example.com/rev1">
        <rev:subject rdf:resource="urn:isbn:1930110111"/>
    </rev:Review>

    <rdf:Description rdf:about="http://example.com/author/0042">
        <FOAF:firstName>Bob</FOAF:firstName>
        <FOAF:homepage rdf:resource="http://www.snee.com/bob/"/>
        <FOAF:pastProject rdf:resource="urn:isbn:1930110111"/>
        <FOAF:surname>DuCharme</FOAF:surname>
    </rdf:Description>
</rdf:RDF>
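To make the mapping from XML to triples concrete, here is a deliberately naive sketch that reads a cut-down version of the document above using only the standard library. It assumes every node carries rdf:about and every property is either an rdf:resource reference or plain text; real RDF/XML has many more forms, so use a proper RDF parser for anything serious. The helper names uri and naive_triples are invented for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A shortened copy of the example document.
DOC = """<rdf:RDF
    xmlns:FOAF="http://xmlns.com/foaf/0.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rev="http://amk.ca/xml/review/1.0#">
    <rev:Review rdf:about="http://example.com/rev1">
        <rev:subject rdf:resource="urn:isbn:1930110111"/>
    </rev:Review>
    <rdf:Description rdf:about="http://example.com/author/0042">
        <FOAF:surname>DuCharme</FOAF:surname>
    </rdf:Description>
</rdf:RDF>"""

def uri(tag):
    """ElementTree spells names as '{namespace}local'; join them into a URI."""
    return tag[1:].replace("}", "", 1)

def naive_triples(xml_text):
    """Extract triples from the small RDF/XML subset described above."""
    triples = []
    for node in ET.fromstring(xml_text):
        subject = node.get("{%s}about" % RDF_NS)
        # A node element other than rdf:Description implies an rdf:type.
        if node.tag != "{%s}Description" % RDF_NS:
            triples.append((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            # rdf:resource marks a resource value; otherwise use the text.
            obj = prop.get("{%s}resource" % RDF_NS, prop.text)
            triples.append((subject, uri(prop.tag), obj))
    return triples

for triple in naive_triples(DOC):
    print(triple)
```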

RDF syntaxes: Notation-3 (or N3)

An informal syntax that's easier to read and easier to scribble.

@prefix rev: <http://amk.ca/xml/review/1.0#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix FOAF: <http://xmlns.com/foaf/0.1/> .

<http://example.com/author/0042> 
    FOAF:firstName "Bob";
    FOAF:surname "DuCharme";
    FOAF:homepage <http://www.snee.com/bob/>;
    FOAF:pastProject <urn:isbn:1930110111> .

<http://example.com/rev1> rev:subject [
   = <urn:isbn:1930110111>;
   dc:title "XSLT Quickly";
   dc:creator <http://example.com/author/0042>;
   dc:publisher "Manning" ] .

RDF's Sins and Virtues

Virtues:

Sins:

Available RDF Software

The most basic form of RDF software is simply an RDF parser. Parsers are available for most of the languages you might need:

Example code: Initializing an RDF database

Here's a Python example using rdflib 2.0.4 (www.rdflib.net).

#
# Initial setup -- create a TripleStore to hold RDF data
#

from rdflib.TripleStore import TripleStore
store = TripleStore()

You can add the contents of several URLs, parsing the data as RDF/XML:

store.load('http://www.amk.ca/amk.rdf')
store.load('http://www.python.org/pypi/?project=Twisted?format=doap')
store.load(...)

You can output the contents of a store:

print store.serialize(format='xml')

Example code: Modifying the database

You can add triples to a store:

from rdflib.URIRef import URIRef
from rdflib.Literal import Literal
from rdflib.Namespace import Namespace

REVIEW_NS = Namespace('http://amk.ca/xml/review/1.0#')
REVIEW_SUBJECT = REVIEW_NS['subject']
# Equivalent to:
##REVIEW_SUBJECT = URIRef('http://amk.ca/xml/review/1.0#subject')

book_uri = URIRef('urn:isbn:0609602330')

t = (URIRef('http://www.amk.ca/books/h/Isaacs_Storm.html'), 
     REVIEW_SUBJECT, book_uri)
store.add(t)

You can also remove a triple:

store.remove(t)

Example code: Querying the database

The most general query method is triples(), which takes a (subject, property, object) 3-tuple, returning an iterator over the matching triples.

For example, to list all things which have a dc:title property:

>>> DC_NS = Namespace('http://purl.org/dc/elements/1.1/')
>>> DC_TITLE = DC_NS['title']
>>> for s,p,o in store.triples((None, DC_TITLE, None)): 
...     print s,p,o
...
urn:isbn:0609602330 http://purl.org/dc/elements/1.1/title \
   Isaac's Storm
urn:isbn:1930110111 http://purl.org/dc/elements/1.1/title \
   XSLT Quickly
>>>
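The wildcard matching behind a triples()-style query is simple enough to sketch in a few lines. The TinyStore class below is a hypothetical stand-in written for this example, not rdflib's implementation; it only illustrates how None can act as "match anything" in each position.

```python
# Sketch of wildcard matching over an in-memory set of triples.
# TinyStore is a hypothetical stand-in, not rdflib's TripleStore.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_PUBLISHER = "http://purl.org/dc/elements/1.1/publisher"

class TinyStore:
    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(triple)

    def remove(self, triple):
        self._triples.discard(triple)

    def triples(self, pattern):
        """Yield stored triples matching the pattern; None matches anything."""
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple

store = TinyStore()
store.add(("urn:isbn:0609602330", DC_TITLE, "Isaac's Storm"))
store.add(("urn:isbn:1930110111", DC_TITLE, "XSLT Quickly"))
store.add(("urn:isbn:1930110111", DC_PUBLISHER, "Manning"))

for s, p, o in store.triples((None, DC_TITLE, None)):
    print(s, p, o)
```

A linear scan like this is fine for a demo; a real store would index triples by subject, property, and object so that queries do not have to visit every triple.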

Someday, there will be a query language (SPARQL example):

SELECT ?title
WHERE (<urn:isbn:1930110111> dc:title ?title)

RDF Schema

Lets us define vocabularies (sets of classes and/or properties).

Example vocabulary:

RDF Schema: Using rdf:type

[Graph showing rdf:type property values]

RDF Schema: Describing a resource

First, define a prefix for the schema's namespace URI:

@prefix rev: <http://amk.ca/xml/review/1.0#> .

To declare that a particular resource is a rev:Review, assert that the resource's rdf:type property is the class:

# Declare a resource    
<http://example.com/review1> rdf:type rev:Review .

Describe what this resource is reviewing; what's the subject?

# Supply subject
<http://example.com/review1> rev:subject 
      <http://www.music.com/album/6542>.
<http://example.com/review1> rev:subject <urn:isbn:1930110111>.

RDF Schema: Classes

So how do we define the class?

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rev: <http://amk.ca/xml/review/1.0#> .

# Declare a Review class

rev:Review  # class URI: http://amk.ca/xml/review/1.0#Review
   rdf:type rdfs:Class ;
   rdf:ID "Review" ; 
   rdfs:comment """Reviews are resources that express an opinion 
about some other resource.""" ;
.

# Declare a subclass of Review.
rev:ComparativeReview
   rdf:type rdfs:Class ;
   rdfs:subClassOf rev:Review ;
   rdfs:comment """Comparative reviews examine multiple resources,
comparing their relative merits and usually offering an opinion
about which one is the best.""" .
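One practical consequence of rdfs:subClassOf is that software can conclude a ComparativeReview is also a Review. The sketch below walks the subclass links by hand; the SUBCLASS_OF table and the all_classes helper are invented for this example and simply mirror the two class declarations above.

```python
# Sketch: following rdfs:subClassOf links to find every class that
# an instance of a given class also belongs to.
SUBCLASS_OF = {
    "rev:ComparativeReview": ["rev:Review"],
    "rev:Review": [],
}

def all_classes(cls):
    """Return cls followed by all of its ancestors via rdfs:subClassOf."""
    seen = []
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen

print(all_classes("rev:ComparativeReview"))
```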

RDF Schema: Properties

You can also specify properties in a vocabulary. The following fragment defines the rev:subject property:

rev:subject 
    rdf:type rdf:Property;
    rdfs:label "Subject property" ;
    # Resources which can have this property
    rdfs:domain rev:Review ;    
    # Values this property can take
    rdfs:range rdfs:Resource ;  
    rdfs:comment "Value is the resource being reviewed." ;
    .

rev:title
    rdf:type rdf:Property;
    # This property only takes literal values
    rdfs:range rdfs:Literal;   
    .
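In RDFS semantics, rdfs:domain works as an inference rule rather than a validation check: if a resource has a rev:subject property, a reasoner may conclude the resource is a rev:Review. A minimal sketch of that rule, with the DOMAIN table and infer_types helper made up for this example:

```python
# Sketch: applying rdfs:domain the way an RDFS reasoner would.
# DOMAIN mirrors the "rdfs:domain rev:Review" declaration for rev:subject.
DOMAIN = {"rev:subject": "rev:Review"}

data = [
    ("http://example.com/review1", "rev:subject", "urn:isbn:1930110111"),
]

def infer_types(triples):
    """For each triple whose property has a domain, infer the subject's type."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))
    return inferred

for triple in sorted(infer_types(data)):
    print(triple)
```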

OWL: Connecting vocabularies

With RDF Schema, we know:

  • Which classes exist,
  • What their properties are.

We don't know:

OWL: Web Ontology Language

OWL is a W3C language for defining this sort of relationship. Possible relationships:

OWL: Defining a class

Here's an OWL declaration of a class representing persons:

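@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix gen: <http://genealogy.example.com/schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .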

gen:Person 
    rdf:type owl:Class;
    rdf:ID "person" ;
    rdfs:comment "Resource representing a person." ;
    owl:equivalentClass foaf:Person;
    .

OWL: Property declaration

Define a property:

gen:ancestor 
    # Declare as a transitive property:
    # X -> Y, Y -> Z implies X -> Z
    rdf:type owl:TransitiveProperty;  
    rdfs:domain gen:Person;
    rdfs:range gen:Person;

    # Declare as inverse of some other property
    owl:inverseOf gen:descendant;
    .
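To see what these two declarations buy you, here is a small sketch of the conclusions a reasoner might draw. The people and the pairs are invented, and each pair (x, y) is read as "x gen:ancestor y"; this is a hand-rolled illustration, not how any particular OWL engine works.

```python
# Sketch: consequences of owl:TransitiveProperty and owl:inverseOf.
# Each pair (x, y) stands for the statement "x gen:ancestor y".
ancestor = {("carol", "bob"), ("bob", "alice")}

def transitive_closure(pairs):
    """Keep adding (x, z) whenever (x, y) and (y, z) are both present."""
    closure = set(pairs)
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new

# Transitivity adds the carol -> alice link.
all_ancestor = transitive_closure(ancestor)

# inverseOf: every "x gen:ancestor y" implies "y gen:descendant x".
descendant = {(y, x) for x, y in all_ancestor}

print(sorted(all_ancestor))
print(sorted(descendant))
```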

Web ontology: What's the point?

OWL adds the ability to indicate when two classes or properties are identical.

OWL declarations provide additional information to let rule-checking and theorem-proving systems work with RDF data.
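As a small illustration of why class equivalence is useful, the sketch below merges data typed with two different vocabularies. The resource URIs, the EQUIVALENT_CLASS table, and the canonical helper are all assumptions made for this example; the gen:Person and foaf:Person pairing follows the class declaration shown earlier.

```python
# Sketch: treating two classes as one when an equivalence is declared.
# EQUIVALENT_CLASS stands in for "gen:Person owl:equivalentClass foaf:Person".
EQUIVALENT_CLASS = {"gen:Person": "foaf:Person"}

triples = [
    ("http://example.com/people/1", "rdf:type", "gen:Person"),
    ("http://example.com/people/2", "rdf:type", "foaf:Person"),
]

def canonical(cls):
    """Map a class to its declared equivalent, if there is one."""
    return EQUIVALENT_CLASS.get(cls, cls)

# Both resources count as persons once the equivalence is applied.
persons = sorted(
    s for s, p, o in triples
    if p == "rdf:type" and canonical(o) == "foaf:Person"
)
print(persons)
```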

What should you care about?

So how much of this stuff do you need to learn about and use?

Why hasn't RDF caught on yet?

Starting Small

But we don't need to aim for the stars. Simple things can be done without much effort, and can still be useful:

There are signs of life: FOAF has caught on, DOAP is rising, and many small projects are using RDF internally.

Demos

Questions, comments?

These slides: www.amk.ca/talks/2004-12-02

For further information:

What Python library to use?

PyCon reminder

PyCon will be March 23-25 at GWU's Cafritz Center.

Deadline for proposals: Dec. 31st.

Call for papers:
http://www.python.org/pycon/2005/cfp.html

Proposal submissions: http://submit.pycon.org