Q: A model for topic maps

Unifying RDF and topic maps

Abstract

The paper describes a formal model for topic maps called Q, which can represent both topic maps and RDF. This is shown, and it is also shown how Q can be used to define a common core query language for both RDF and topic maps.

Introduction

This paper proposes a model for topic maps called the Q model that is simple and formal, can be used to efficiently implement topic map engines, and also supports RDF. The Q model can informally be thought of as RDF triples extended with a fourth element representing the identity of the triple. This enables the model to efficiently represent statements about other statements, which is the key to being able to represent topic maps compactly.

[Make it clear that Q doesn't model topic maps, but that it can be used to do so.]

[This is an extremely rough draft. I expected to submit only an abstract, and was dismayed to discover that full papers were wanted. This was written up very quickly to show what the paper is intended to look like. The model still needs more work, and the presentation of it even more. (Note: the PDF version of the Extreme stylesheets loses some of the math. The HTML version is fine.)]

Related work

TMRM. The Tau model of topic maps (Barta) and Dmitry Bogachev's BAM. Graham Moore's unpublished work on quads.
http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-tmgraph/ Also RDF community work on quads.

The Q model

Let A be the set of all atoms, where an atom is an object that has no other properties than being distinct from all other atoms. Let L be the set of all literals, where a literal is a string, like "hello" or "Topic Maps". Let I be the set of all identifiers, where I = A ∪ L. A model is a subset of the set (A x A x A x I).

The meaning of a quad can be thought of as

(subject, predicate, identity, object)

which is highly similar to RDF triples. The difference is that the third element can be used in the first position to efficiently make statements about another statement. This, as will be seen, is the key to being able to efficiently represent topic maps.

A valid model M further meets the following constraints:

  • No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that (a ≠ w or b ≠ x or d ≠ z) and c = y. (Or, more informally, the third element in each quad must be unique within the model.)
  • No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that a = w and b = x and d = z and c ≠ y. (Or, more informally, the same quad cannot appear twice with different identifiers.)
  • No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that b = y. (Or, more informally, the identity of a quad cannot be used as a predicate.)

Operations on Q

In order for Q to really be useful we need to define a set of operations on it that allow us to describe Q transformations, as well as querying on Q.

The subscript operator, written as a [n] postfix, extracts the nth element in a quad. This allows us to define the following functions:

subj(q) = q[1]
pred(q) = q[2]
id(q) = q[3]
val(q) = q[4]

There is a function ɸ defined as follows:

ɸ(M, w, x, y, z) =
  {q ∈ M | subj(q)=3Dw and pred(q)=3Dx and id(q)=3Dy and val(q)=3Dz}

The parameter * is a wildcard that matches any identifier. [Define the wildcard properly.]

There is a function σ defined as:

σ(M, n) = {z ∈ A | ∃q ∈ M | q[n]=3Dz}

The function φ is defined as:

φ(M, s, P) = {∀q ∈ M | subj(q) = s and pred(q) ∈ P}

Merging

For Q to be a full model of topic maps it is necessary for it to also define merging. The function

r(v, i1, i2) = iff v:i1: i2 else v
can be used to define another function
t(q, i1, i2) = (r(q[1], i1, i2),
                r(q[2], i1, i2),
                r(q[3], i1, i2),
                r(q[4], i1, i2))
This enables us to define a merging function which takes a model and two identifiers and returns a new model in which the two identifiers have been merged, as follows:
m(M, i1, i2) = {∀z | ∃q ∈ M | t(q, i1, i2) = z}

Q and topic maps

Q can represent topic maps if and only if it is possible to roundtrip information between TMDM instance and Q without loss of information. This section will show how this is done [and may also contain notes on efficient implementation]. Throughout this section identifiers written in UPPER_CASE are used to denote atoms from a Q vocabulary used to represent TMDM. The symbol * is used to denote a quad identifier which will not be referenced further.

Transforming TMDM to Q

To transform a TMDM instance to Q, start with the model:

{(TOPIC_NAME,
META_TYPE, *, TOPIC_NAME),
(OCCURRENCE, META_TYPE, *, OCURRENCE)}
This will simplify returning from Q to TMDM, as will be seen later.

Then let the topic map item be tm, and

  • if there is a value l in tm.[base locator], add the quad (tm, BASE_LOCATOR, *, l),
  • for each value l in tm.[source locators] add the quad (tm, SOURCE_LOCATOR, *, l),
  • for each value t in tm.[topics], follow the procedure for topic items below, and
  • for each value a in tm.[associations], follow the procedure for association items below.

The procedure for any topic item t is:

  • add the quad (tm, TOPIC, *, t),
  • for each value n in tm.[topic names] add the quad (t, p, n, n.[value]), where p is the value n.[type] if n has a type and TOPIC_NAME if not, then follow the procedure for topic name items below,
  • for each value o in tm.[occurrences] add the quad (t, p, o, o.[value]), where p is the value o.[type] if o has a type and OCCURRENCE if not, then follow the procedure for occurrence items below,
  • for each value l in tm.[subject identifiers] add the quad (t, SUBJECT_IDENTIFIER, *, l),
  • for each value l in tm.[subject locators] add the quad (t, SUBJECT_LOCATOR, *, l), and
  • for each value l in tm.[source locators] add the quad (t, SOURCE_LOCATOR, *, l).

The procedure for any topic name item n is:

  • if n.[type] has a value t, add the quad (t, META_TYPE, *, TOPIC_NAME),
  • for each value t in n.[scope] add the quad (n, SCOPE, *, t),
  • for each value v in n.[variants] add the quad (n, VARIANT, v, v.[value]), then follow the procedure for variant items below, and
  • for each value l in n.[source locators] add the quad (n, SOURCE_LOCATOR, *, l).

The procedure for any variant item v is:

  • for each value t in v.[scope] add the quad (v, SCOPE, *, t),
  • for each value l in v.[source locators] add the quad (v, SOURCE_LOCATOR, *, l).

The procedure for any occurrence item o is:

  • if o.[type] has a value t, add the quad (t, META_TYPE, *, OCCURRENCE),
  • for each value t in o.[scope] add the quad (o, SCOPE, *, t),
  • for each value l in o.[source locators] add the quad (o, SOURCE_LOCATOR, *, l).

The procedure for any association item a is:

  • add the quad (tm, ASSOCIATION, *, a),
  • if there is a value t in a.[type] add the quad (a, TYPE, *, t),
  • for each value t in a.[scope] add the quad (a, SCOPE, *, t),
  • for each value r in a.[roles] add the quad (a, p, r, t), where p is the value of r.[type] if there is one or ASSOCIATION_ROLE otherwise, and t is the value of r.[player], then follow the procedure for association roles below, and
  • for each value l in a.[source locators] add the quad (a, SOURCE_LOCATOR, *, l).

The procedure for any association role item r is:

  • for each value r in r.[source locators] add the quad (r, SOURCE_LOCATOR, *, l).

[Issues: handling reification and datatyping. Both can be done easily and non-elegantly. Holding out for the elegant solution on both.]

At this stage we have reproduced the TMDM instance in Q, but where topic items reified other items in the TMDM instance the connection is only indirect in Q. [This can be solved by finding reification pairs, then merging them. However, this may cause difficulties on the return trip. It certainly causes difficulties for the reifies() predicate in tolog.]

Transforming Q to TMDM

For transforming a Q model to TMDM to be possible the Q model needs to use the Q vocabulary for representing TMDM defined in the previous section. In the following we will assume this without further comment; it is noteworthy, however, that the transformation is robust in the presence of additional non-TMDM information.

To transform a Q model M to TMDM, the first step is to do a little analysis on the model in order to know which predicates are name predicates, and which are occurrence predicates. This is done as follows:

N = σ(ɸ(M, *, META_TYPE, *, TOPIC_NAME), 1)
O = σ(ɸ(M, *, META_TYPE, *, OCCURRENCE), 1)

Topic map item

Now we can find the identity of the topic map. This is done with

tm = σ(ɸ(M, *, TOPIC, *, *), 1)
We can then construct the topic map item as follows:


Figure 1
tm.[base locator] = σ(ɸ(M, tm, BASE_LOCATOR, *, *), 4)
tm.[source locators] = σ(ɸ(M, tm, SOURCE_LOCATOR, *, *), 4)
tm.[topics] = σ(ɸ(M, tm, TOPIC, *, *), 4)
tm.[associations] = σ(ɸ(M, tm, ASSOCIATION, *, *), 4)

Topic items

For each element t in

σ(ɸ(M, *, TOPIC, *, *), 4)
create a topic item as follows:


Figure 2
t.[topic names] = σ(φ(M, t, N), 4)
t.[occurrences] = σ(φ(M, t, O), 4)
t.[subject identifiers] = σ(ɸ(M, t, SUBJECT_IDENTIFIER, *,
*), 4)
t.[subject locators] = σ(ɸ(M, t, SUBJECT_LOCATOR, *, *), 4)
t.[source locators] = σ(ɸ(M, t, SOURCE_LOCATOR, *, *), 4)
t.[parent] = tm

Topic name items

For each element n in

σ(φ(M, t, N), 4)
create a topic name item as follows:


Figure 3
n.[value] = σ(ɸ(M, t, *, n, *), 4)
n.[type] = σ(ɸ(M, t, *, n, *), 2)
n.[scope] = σ(ɸ(M, n, SCOPE, *, *), 4)
n.[variants] = σ(ɸ(M, n, VARIANT, *, *), 4)
n.[source locators] = σ(ɸ(M, n, SOURCE_LOCATOR, *, *), 4)
n.[parent] = t

Variant items

For each element v in

σ(ɸ(M, n, VARIANT, *, *), 4)
create a variant item as follows:


Figure 4
v.[value] = σ(ɸ(M, n, *, v, *), 4)
v.[scope] = σ(ɸ(M, v, SCOPE, *, *), 4)
v.[source locators] = σ(ɸ(M, v, SOURCE_LOCATOR, *, *), 4)
v.[parent] = n

Occurrence items

For each element o in

σ(φ(M, t, O), 4)
create an occurrence item as follows:


Figure 5
o.[value] = σ(ɸ(M, t, *, o, *), 4)
o.[type] = σ(ɸ(M, t, *, o, *), 2)
o.[scope] = σ(ɸ(M, o, SCOPE, *, *), 4)
o.[source locators] = σ(ɸ(M, o, SOURCE_LOCATOR, *, *), 4)
o.[parent] = t

Association items

For each element a in

σ(ɸ(M, *, ASSOCIATION, *, *), 4)
create an association item as follows:


Figure 6
M' = ɸ(M, a, *, *, *) - φ(M, a, {SOURCE_LOCATOR, TYPE, SCOPE})

a.[type] = σ(ɸ(M, a, TYPE, *, *), 4)
a.[scope] = σ(ɸ(M, a, SCOPE, *, *), 4)
a.[roles] = σ(M', 3)
a.[source locators] = σ(ɸ(M, a, SOURCE_LOCATOR, *, *), 4)
a.[parent] = tm

Assocation role items

For each (a, p, r, t) in M' create an association role item as follows:


Figure 7
r.[player] = t
r.[type] = p (unless p is ASSOCIATION_ROLE, in which case it is null)
r.[source locators] = σ(ɸ(M, r, SOURCE_LOCATOR, *, *), 4)
r.[parent] = a

Implementing Q efficiently

[Start by observing how many quads go into representing some example topic maps. Then show how it is possible to lose the TOPIC and ASSOCIATION quads, which considerably reduces the quad count. Then show how to compress binary associations, after which we have a really efficient representation. Use RDFTM test cases to show that efficience is same as for quad-based RDF implementation.]

[Discuss implementation strategies, both in-memory and with custom persistent backend. Key issue: keeping identifiers stable in the implementation. Show how registering event listeners is easy with this model. Show how user-defined indexes can be used to tune performance.]

Q and RDF

One might think that representing RDF in Q is straightforward, but there is actually a difficulty: URIs are identifiers in RDF, but literals in Q. This means that the most obvious representation is not available. However, representation is still quite straightforward.

To construct a Q model from an RDF model, add one quad as follows for each URI in the RDF model which appears in one of the first two positions of a triple: (*, RDF_URI, *, uri). We then define the following function:

i(M, u) = σ(ɸ(M, *, RDF_URI, *, u), 1)

Then, for each triple (s, p, o) in the RDF model where s is a URI, add the following quad (i(s), s(p), *, z) where z = o if o is a literal or a URI about which nothing is asserted and otherwise z=3Di(o).

[Transform M so that reification of RDF statements is represented directly.]

[Well, this wasn't exactly elegant. We need to adjust the Q representation of TMDM so that only a slight annotation of the RDF representation is necessary in order for an RDF representation to also become a topic map. Also need to ensure that Q representations of TMDM can be seen as reasonable RDF.]

[Finally describe an implementation approach that avoids having all the RDF_URI quads hanging around.]

[Relate this to the discussion of quads in RDF, and show which camp this approach belongs to. Also show how named graphs and context are representable using this approach.]

Q and query languages

The Q model can be used to define the semantics of the main query language for topic maps (tolog<span style="background-color:white">1) and also that for RDF (SPARQL). Q also makes it possible to create a single implementation of both languages.

Defining tolog on Q

tolog[Garshol04a] is a topic maps query language inspired by Prolog, which has much in common with Datalog. In one sense, tolog could be described as consisting of two parts: a core language very similar to Datalog with SQL parts added on, and a predicate library for accessing the fine structure of the topic map being queried[Garshol04b].

Core language

[Define result set as tabular structure where each column is a variable and each row is a match. Define predicate application (which is the same as predicate chaining). OR is simply two predicate chains whose results are unioned. NOT is defined as a filter. Optional clauses can be represented with OR and NOT. Inference rules are just an example of predicate application. SELECT and ORDER each require an operation. LIMIT and OFFSET, too. And that's it.]

The predicate library

The predicate library can easily be defined using tolog inference rules given the predicate _q($W, $X, $Y, $Z), which accesses the Q model. The predicate _q is of course not available in the syntax itself.


Figure 8: Defining the built-in predicates

association($A) :- _q($TM, ASSOCIATION, $ID, $A).

association-role($A, $R) :-
  association($A),
  _q($A, $P, $ID, $R),
  $P /= SOURCE_LOCATOR, $P /= TYPE, $P /= SCOPE .

base-locator($LOC) :- _q($TM, BASE_LOCATOR, $ID, $LOC).

direct-instance-of($I, $T) :- uses associations, and so is easy.

instance-of($I, $T) :- uses associations, and so is easy.

occurrence($T, $O) :-
  _q($P, META_TYPE, $ID, OCCURRENCE),
  _q($T, $P, $ID2, $O).

reifies($RR, $RD) :- /* don't need Q for this */
  subject-identifier($RR, $LOC),
  source-locator($RD, $LOC).

resource($O, $L) :- /* requires datatyping to get right */
  { _q($N, VARIANT, $O, $L) |
    _q($P, META_TYPE, $ID, OCCURRENCE),
    _q($T, $P, $O, $L) }.

role-player($R, $T) :-
  association-role($A, $R), _q($A, $P, $R, $T).

scope($O, $T) :- _q($O, SCOPE, $ID, $T).

source-locator($O, $L) :- _q($O, SOURCE_LOCATOR, $ID, $L).

subject-identifier($T, $L) :- _q($T, SUBJECT_IDENTIFIER, $ID, $L).

subject-locator($T, $L) :- _q($T, SUBJECT_LOCATOR, $ID, $L).

topic($T) :- _q($M, TOPIC, $ID, $T).

topic-name($T, $N) :-
  _q($P, META_TYPE, $ID, TOPIC_NAME),
  _q($T, $P, $ID2, $N).

topicmap($TM) :- _q($TM, TOPIC, $ID, $T).

type($O, $T) :- {
  /* association */
  _q($O, TYPE, $ID, $T) |

  /* association role */
  association-role($A, $O), _q($A, $T, $O, $TOPIC) |

  /* occurrence */
  _q($P, META_TYPE, $ID2, OCCURRENCE), _q($TOPIC, $T, $ID2, $O) |

  /* topic name */
  _q($P, META_TYPE, $ID, TOPIC_NAME),  _q($TOPIC, $T, $ID2, $O)
}

value($O, $V) :- ...

value-like($O, $V) :- requires fulltext predicate.

variant($N, $V) :- _q($N, VARIANT, $V, $L).

Defining SPARQL on Q

[tolog and SPARQL result sets are the same. SPARQL triple patterns become applications of the predicate _q. OPTIONAL is the same as optional clauses in tolog. UNION is the same as OR in tolog. GRAPH requires an extension to the Q representation of RDF, but one that is discussed in the implementation section. SELECT, SELECT DISTINCT, SELECT LIMIT. Also look into core operators in SPARQL to see if something needs to be said about those.]

Q and constraint languages

[This section will be rather short, but will basically point out that as far as TMCL is concerned, Q is just a convenient way to define its semantics. RDFS and OWL get into trouble with the reification, however. Discuss implications of this.]

Conclusion

[Good stuff goes here.]

Notes

<table width="100%" cellpadding="8" cellspacing="0" border="0"> 1.

This is expected to also be true for TMQL, but the specification of that language has not yet progressed to the point where this can be shown. This paper therefore uses tolog as its example instead.


Acknowledgments

The author is indebted to Robert Barta and Dmitry Bogachev for constructive criticism of the model presented herein. Geir Ove Grønmo reviewed an early draft of the paper and caught many errors.


Bibliography

[Garshol04a] tolog—Language tutorial, Lars Marius Garshol, Ontopia AS, Norway, 2004-12-08.
http://www.ontopia.net/omnigator/docs/query/tutorial.html

[Garshol04b] The Built-in tolog Predicates—Reference Documentation, Lars Marius Garshol, Ontopia AS, Norway, 2004-09-08.
http://www.ontopia.net/topicmaps/materials/tolog-predicate-reference.html

Revision: r1.2 - 19 May 2005 - 13:17 - StevePepper
RDFTM > PaperOnQ
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Fabio's Wiki? Send feedback