Start of topic | Skip to actions

The paper describes a formal model for topic maps called Q, which can represent both topic maps and RDF. This is shown, and it is also shown how Q can be used to define a common core query language for both RDF and topic maps.

This paper proposes a model for topic maps called the Q model that is simple and formal, can be used to efficiently implement topic map engines, and also supports RDF. The Q model can informally be thought of as RDF triples extended with a fourth element representing the identity of the triple. This enables the model to efficiently represent statements about other statements, which is the key to being able to represent topic maps compactly.

*[Make it clear that Q doesn't model topic
maps, but that it can be used to do so.]*

*[This is an extremely rough draft. I expected
to submit only an abstract, and was dismayed to discover that full
papers were wanted. This was written up very quickly to show what the
paper is intended to look like. The model still needs more work, and
the presentation of it even more. (Note: the PDF version of the
Extreme stylesheets loses some of the math. The HTML version is
fine.)]*

*TMRM. The Tau model of topic maps (Barta) and
Dmitry Bogachev's BAM. Graham Moore's unpublished work on quads.
http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-tmgraph/
Also RDF community work on quads.*

Let A be the set of all atoms, where an atom is an object that has no other properties than being distinct from all other atoms. Let L be the set of all literals, where a literal is a string, like "hello" or "Topic Maps". Let I be the set of all identifiers, where I = A ∪ L. A model is a subset of the set (A x A x A x I).

The meaning of a quad can be thought of as

(subject, predicate, identity, object)

which is highly similar to RDF triples. The difference is that the third element can be used in the first position to efficiently make statements about another statement. This, as will be seen, is the key to being able to efficiently represent topic maps.

A valid model M further meets the following constraints:

- No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that (a ≠ w or b ≠ x or d ≠ z) and c = y. (Or, more informally, the third element in each quad must be unique within the model.)
- No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that a = w and b = x and d = z and c ≠ y. (Or, more informally, the same quad cannot appear twice with different identifiers.)
- No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that b = y. (Or, more informally, the identity of a quad cannot be used as a predicate.)

In order for Q to really be useful we need to define a set of operations on it that allow us to describe Q transformations, as well as querying on Q.

The subscript operator, written as a [n] postfix, extracts the nth
element in a quad. This allows us to define the following functions:

subj(q) = q[1] pred(q) = q[2] id(q) = q[3] val(q) = q[4]

There is a function ɸ defined as follows:

ɸ(M, w, x, y, z) = {q ∈ M | subj(q)=3Dw and pred(q)=3Dx and id(q)=3Dy and val(q)=3Dz}

The parameter * is a wildcard that matches any identifier.
*[Define the wildcard properly.]*

There is a function σ defined as:

σ(M, n) = {z ∈ A | ∃q ∈ M | q[n]=3Dz}

The function φ is defined as:

φ(M, s, P) = {∀q ∈ M | subj(q) = s and pred(q) ∈ P}

For Q to be a full model of topic maps it is necessary for it to also
define merging. The function

r(v, i1, i2) = iff v:i1: i2 else vcan be used to define another function

t(q, i1, i2) = (r(q[1], i1, i2), r(q[2], i1, i2), r(q[3], i1, i2), r(q[4], i1, i2))This enables us to define a merging function which takes a model and two identifiers and returns a new model in which the two identifiers have been merged, as follows:

m(M, i1, i2) = {∀z | ∃q ∈ M | t(q, i1, i2) = z}

Q can represent topic maps if and only if it is possible to roundtrip
information between TMDM instance and Q without loss of information.
This section will show how this is done *[and
may also contain notes on efficient implementation]*.
Throughout this section identifiers written in UPPER_CASE are used to
denote atoms from a Q vocabulary used to represent TMDM. The symbol *
is used to denote a quad identifier which will not be referenced
further.

To transform a TMDM instance to Q, start with the model:

{(TOPIC_NAME, META_TYPE, *, TOPIC_NAME), (OCCURRENCE, META_TYPE, *, OCURRENCE)}This will simplify returning from Q to TMDM, as will be seen later.

Then let the topic map item be tm, and

- if there is a value l in tm.[base locator], add the quad (tm, BASE_LOCATOR, *, l),
- for each value l in tm.[source locators] add the quad (tm, SOURCE_LOCATOR, *, l),
- for each value t in tm.[topics], follow the procedure for topic items below, and
- for each value a in tm.[associations], follow the procedure for association items below.

The procedure for any topic item t is:

- add the quad (tm, TOPIC, *, t),
- for each value n in tm.[topic names] add the quad (t, p, n, n.[value]), where p is the value n.[type] if n has a type and TOPIC_NAME if not, then follow the procedure for topic name items below,
- for each value o in tm.[occurrences] add the quad (t, p, o, o.[value]), where p is the value o.[type] if o has a type and OCCURRENCE if not, then follow the procedure for occurrence items below,
- for each value l in tm.[subject identifiers] add the quad (t, SUBJECT_IDENTIFIER, *, l),
- for each value l in tm.[subject locators] add the quad (t, SUBJECT_LOCATOR, *, l), and
- for each value l in tm.[source locators] add the quad (t, SOURCE_LOCATOR, *, l).

The procedure for any topic name item n is:

- if n.[type] has a value t, add the quad (t, META_TYPE, *, TOPIC_NAME),
- for each value t in n.[scope] add the quad (n, SCOPE, *, t),
- for each value v in n.[variants] add the quad (n, VARIANT, v, v.[value]), then follow the procedure for variant items below, and
- for each value l in n.[source locators] add the quad (n, SOURCE_LOCATOR, *, l).

The procedure for any variant item v is:

- for each value t in v.[scope] add the quad (v, SCOPE, *, t),
- for each value l in v.[source locators] add the quad (v, SOURCE_LOCATOR, *, l).

The procedure for any occurrence item o is:

- if o.[type] has a value t, add the quad (t, META_TYPE, *, OCCURRENCE),
- for each value t in o.[scope] add the quad (o, SCOPE, *, t),
- for each value l in o.[source locators] add the quad (o, SOURCE_LOCATOR, *, l).

The procedure for any association item a is:

- add the quad (tm, ASSOCIATION, *, a),
- if there is a value t in a.[type] add the quad (a, TYPE, *, t),
- for each value t in a.[scope] add the quad (a, SCOPE, *, t),
- for each value r in a.[roles] add the quad (a, p, r, t), where p is the value of r.[type] if there is one or ASSOCIATION_ROLE otherwise, and t is the value of r.[player], then follow the procedure for association roles below, and
- for each value l in a.[source locators] add the quad (a, SOURCE_LOCATOR, *, l).

The procedure for any association role item r is:

- for each value r in r.[source locators] add the quad (r, SOURCE_LOCATOR, *, l).

*[Issues: handling reification and datatyping.
Both can be done easily and non-elegantly. Holding out for the elegant
solution on both.]*

At this stage we have reproduced the TMDM instance in Q, but where
topic items reified other items in the TMDM instance the connection is
only indirect in Q. *[This can be solved by
finding reification pairs, then merging them. However, this may cause
difficulties on the return trip. It certainly causes difficulties for
the reifies() predicate in tolog.]*

For transforming a Q model to TMDM to be possible the Q model needs to use the Q vocabulary for representing TMDM defined in the previous section. In the following we will assume this without further comment; it is noteworthy, however, that the transformation is robust in the presence of additional non-TMDM information.

To transform a Q model M to TMDM, the first step is to do a little
analysis on the model in order to know which predicates are name
predicates, and which are occurrence predicates. This is done as
follows:

N = σ(ɸ(M, *, META_TYPE, *, TOPIC_NAME), 1) O = σ(ɸ(M, *, META_TYPE, *, OCCURRENCE), 1)

Now we can find the identity of the topic map. This is done with

tm = σ(ɸ(M, *, TOPIC, *, *), 1)We can then construct the topic map item as follows:

tm.[base locator] = σ(ɸ(M, tm, BASE_LOCATOR, *, *), 4) tm.[source locators] = σ(ɸ(M, tm, SOURCE_LOCATOR, *, *), 4) tm.[topics] = σ(ɸ(M, tm, TOPIC, *, *), 4) tm.[associations] = σ(ɸ(M, tm, ASSOCIATION, *, *), 4)

For each element t in

σ(ɸ(M, *, TOPIC, *, *), 4)create a topic item as follows:

t.[topic names] = σ(φ(M, t, N), 4) t.[occurrences] = σ(φ(M, t, O), 4) t.[subject identifiers] = σ(ɸ(M, t, SUBJECT_IDENTIFIER, *, *), 4) t.[subject locators] = σ(ɸ(M, t, SUBJECT_LOCATOR, *, *), 4) t.[source locators] = σ(ɸ(M, t, SOURCE_LOCATOR, *, *), 4) t.[parent] = tm

For each element n in

σ(φ(M, t, N), 4)create a topic name item as follows:

n.[value] = σ(ɸ(M, t, *, n, *), 4) n.[type] = σ(ɸ(M, t, *, n, *), 2) n.[scope] = σ(ɸ(M, n, SCOPE, *, *), 4) n.[variants] = σ(ɸ(M, n, VARIANT, *, *), 4) n.[source locators] = σ(ɸ(M, n, SOURCE_LOCATOR, *, *), 4) n.[parent] = t

For each element v in

σ(ɸ(M, n, VARIANT, *, *), 4)create a variant item as follows:

v.[value] = σ(ɸ(M, n, *, v, *), 4) v.[scope] = σ(ɸ(M, v, SCOPE, *, *), 4) v.[source locators] = σ(ɸ(M, v, SOURCE_LOCATOR, *, *), 4) v.[parent] = n

For each element o in

σ(φ(M, t, O), 4)create an occurrence item as follows:

o.[value] = σ(ɸ(M, t, *, o, *), 4) o.[type] = σ(ɸ(M, t, *, o, *), 2) o.[scope] = σ(ɸ(M, o, SCOPE, *, *), 4) o.[source locators] = σ(ɸ(M, o, SOURCE_LOCATOR, *, *), 4) o.[parent] = t

For each element a in

σ(ɸ(M, *, ASSOCIATION, *, *), 4)create an association item as follows:

M' = ɸ(M, a, *, *, *) - φ(M, a, {SOURCE_LOCATOR, TYPE, SCOPE}) a.[type] = σ(ɸ(M, a, TYPE, *, *), 4) a.[scope] = σ(ɸ(M, a, SCOPE, *, *), 4) a.[roles] = σ(M', 3) a.[source locators] = σ(ɸ(M, a, SOURCE_LOCATOR, *, *), 4) a.[parent] = tm

*[Start by observing how many quads go into
representing some example topic maps. Then show how it is possible to
lose the TOPIC and ASSOCIATION quads, which considerably reduces the
quad count. Then show how to compress binary associations, after which
we have a really efficient representation. Use RDFTM test cases to
show that efficience is same as for quad-based RDF
implementation.]*

*[Discuss implementation strategies, both
in-memory and with custom persistent backend. Key issue: keeping
identifiers stable in the implementation. Show how registering event
listeners is easy with this model. Show how user-defined indexes can
be used to tune performance.]*

One might think that representing RDF in Q is straightforward, but there is actually a difficulty: URIs are identifiers in RDF, but literals in Q. This means that the most obvious representation is not available. However, representation is still quite straightforward.

To construct a Q model from an RDF model, add one quad as follows for
each URI in the RDF model which appears in one of the first two
positions of a triple: (*, RDF_URI, *, uri). We then define the
following function:

i(M, u) = σ(ɸ(M, *, RDF_URI, *, u), 1)

Then, for each triple (s, p, o) in the RDF model where s is a URI, add the following quad (i(s), s(p), *, z) where z = o if o is a literal or a URI about which nothing is asserted and otherwise z=3Di(o).

*[Transform M so that reification of RDF
statements is represented directly.]*

*[Well, this wasn't exactly elegant. We need to
adjust the Q representation of TMDM so that only a slight annotation
of the RDF representation is necessary in order for an RDF
representation to also become a topic map. Also need to ensure that Q
representations of TMDM can be seen as reasonable RDF.]*

*[Finally describe an implementation approach
that avoids having all the RDF_URI quads hanging around.]*

*[Relate this to the discussion of quads in
RDF, and show which camp this approach belongs to. Also show how named
graphs and context are representable using this approach.]*

The Q model can be used to define the semantics of the main query
language for topic maps (tolog^{<span
style="background-color:white">1})
and also that for RDF (SPARQL). Q also makes it possible to create a
single implementation of both languages.

tolog**[Garshol04a]** is a topic maps
query language
inspired by Prolog, which has much in common with Datalog. In one
sense, tolog could be described as consisting of two parts: a core
language very similar to Datalog with SQL parts added on, and a
predicate library for accessing the fine structure of the topic map
being queried**[Garshol04b]**.

*[Define result set as tabular structure where
each column is a variable and each row is a match. Define predicate
application (which is the same as predicate chaining). OR is simply
two predicate chains whose results are unioned. NOT is defined as a
filter. Optional clauses can be represented with OR and NOT. Inference
rules are just an example of predicate application. SELECT and ORDER
each require an operation. LIMIT and OFFSET, too. And that's
it.]*

The predicate library can easily be defined using tolog inference rules given the predicate _q($W, $X, $Y, $Z), which accesses the Q model. The predicate _q is of course not available in the syntax itself.

association($A) :- _q($TM, ASSOCIATION, $ID, $A). association-role($A, $R) :- association($A), _q($A, $P, $ID, $R), $P /= SOURCE_LOCATOR, $P /= TYPE, $P /= SCOPE . base-locator($LOC) :- _q($TM, BASE_LOCATOR, $ID, $LOC). direct-instance-of($I, $T) :- uses associations, and so is easy. instance-of($I, $T) :- uses associations, and so is easy. occurrence($T, $O) :- _q($P, META_TYPE, $ID, OCCURRENCE), _q($T, $P, $ID2, $O). reifies($RR, $RD) :- /* don't need Q for this */ subject-identifier($RR, $LOC), source-locator($RD, $LOC). resource($O, $L) :- /* requires datatyping to get right */ { _q($N, VARIANT, $O, $L) | _q($P, META_TYPE, $ID, OCCURRENCE), _q($T, $P, $O, $L) }. role-player($R, $T) :- association-role($A, $R), _q($A, $P, $R, $T). scope($O, $T) :- _q($O, SCOPE, $ID, $T). source-locator($O, $L) :- _q($O, SOURCE_LOCATOR, $ID, $L). subject-identifier($T, $L) :- _q($T, SUBJECT_IDENTIFIER, $ID, $L). subject-locator($T, $L) :- _q($T, SUBJECT_LOCATOR, $ID, $L). topic($T) :- _q($M, TOPIC, $ID, $T). topic-name($T, $N) :- _q($P, META_TYPE, $ID, TOPIC_NAME), _q($T, $P, $ID2, $N). topicmap($TM) :- _q($TM, TOPIC, $ID, $T). type($O, $T) :- { /* association */ _q($O, TYPE, $ID, $T) | /* association role */ association-role($A, $O), _q($A, $T, $O, $TOPIC) | /* occurrence */ _q($P, META_TYPE, $ID2, OCCURRENCE), _q($TOPIC, $T, $ID2, $O) | /* topic name */ _q($P, META_TYPE, $ID, TOPIC_NAME), _q($TOPIC, $T, $ID2, $O) } value($O, $V) :- ... value-like($O, $V) :- requires fulltext predicate. variant($N, $V) :- _q($N, VARIANT, $V, $L).

*[tolog and SPARQL result sets are the same.
SPARQL triple patterns become applications of the predicate _q.
OPTIONAL is the same as optional clauses in tolog. UNION is the same
as OR in tolog. GRAPH requires an extension to the Q representation of
RDF, but one that is discussed in the implementation section.
SELECT, SELECT DISTINCT, SELECT LIMIT. Also look into core operators
in SPARQL to see if something needs to be said about those.]*

*[This section will be rather short, but will
basically point out that as far as TMCL is concerned, Q is just a
convenient way to define its semantics. RDFS and OWL get into trouble
with the reification, however. Discuss implications of
this.]*

This is expected to also be true for TMQL, but the specification of that language has not yet progressed to the point where this can be shown. This paper therefore uses tolog as its example instead.

The author is indebted to Robert Barta and Dmitry Bogachev for constructive criticism of the model presented herein. Geir Ove Grønmo reviewed an early draft of the paper and caught many errors.

**[Garshol04a]**
*tolog—Language
tutorial*, Lars Marius Garshol, Ontopia AS, Norway,
2004-12-08.

http://www.ontopia.net/omnigator/docs/query/tutorial.html

**[Garshol04b]**
*The Built-in tolog
Predicates—Reference Documentation*, Lars Marius
Garshol, Ontopia AS, Norway, 2004-09-08.

http://www.ontopia.net/topicmaps/materials/tolog-predicate-reference.html

to top

Revisions: | r1.2 | > | r1.1 | Total page history | Backlinks

Copyright © 1999-2019 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding Fabio's Wiki? Send feedback

Ideas, requests, problems regarding Fabio's Wiki? Send feedback