Cheló̱na Version 1.2

A Parboiled2-based parser for the RDF 1.1 Turtle, RDF 1.1 TriG, RDF 1.1 N-Triples and RDF 1.1 N-Quads languages for Scala, Java and JavaScript.

Introduction

Cheló̱na (χελώνα) is the Greek word for turtle. While working on this software I spent a beautiful holiday on the island of Rhodes, so this seemed to be an appropriate name for a Turtle parser.

Cheló̱na successfully passes the complete TriG, Turtle, N-Triples and N-Quads test suites described at W3 RDF11-Testcases. The 100% compliance with the Turtle test suite is listed at W3 RDF-Turtle Reports. The compliance with the TriG and N-Triples test suites has been reported to the W3 Consortium.

Cheló̱na is written completely in Scala. It can be used as a standalone program or it can be integrated into your Java or Scala project.

Starting with version 1.2.0, Cheló̱na is available for JavaScript, too. Cross-compilation is done with the help of Scala.js. This opens the door to using Cheló̱na in a web browser or as a server-side script.

Parsing of the supported RDF 1.1 formats is done with the help of Parboiled2.

Features

Cheló̱na as an Application

Use Cheló̱na as an application to convert RDF 1.1 TriG-, RDF 1.1 Turtle-, RDF 1.1 N-Triples- or RDF 1.1 N-Quads-files into JSON-LD or a simple N-Triples or N-Quads format.

Extend the Cheló̱na application:

Build your own output format for the TriG, Turtle, N-Triples or N-Quads languages.

Filter or transform the parsed data to your needs

Write the conversion result e.g. to a file, to a database or communicate the results via actors

Validate the syntax and semantics of a W3C RDF 1.1 TriG, Turtle, N-Quads or N-Triples file

Cheló̱na as a Library

Integrate Cheló̱na into your application to handle RDF 1.1 input formats (RDF 1.1 TriG, RDF 1.1 Turtle, RDF 1.1 N-Triples or RDF 1.1 N-Quads).

Installation

Cheló̱na Application

The Cheló̱na application is available from Maven Central. Download chelona_2.11-1.2.0.jar. The latest released version is version 1.2.0. It is available for Scala 2.11. A version for Scala 2.12 will be released soon.

Recommended Definitions

For ease of use, copy the downloaded Maven Central file to /usr/local/opt/chelona. The additional symbolic link introduced in the lines below makes it easy to upgrade to a newer release of the Cheló̱na application. No need to adapt any scripts or references on your side.

mkdir -p /usr/local/opt/chelona
cp jvm/target/scala-2.11/chelona_2.11-1.2.0.jar /usr/local/opt/chelona
ln -s /usr/local/opt/chelona/chelona_2.11-1.2.0.jar /usr/local/opt/chelona/chelona.jar

Define an alias for each of the RDF-formats (e.g. in your ~/.bashrc file):

alias chelona='scala -Yopt:_ -J-Xmx4G -cp /usr/local/opt/chelona/chelona.jar org.chelona.TurtleMain'

To keep the naming consistent, a second alias cturtle is introduced. It is up to you which alias you prefer. Use either chelona or cturtle to convert a Turtle file.

alias cturtle='chelona'

You get more than just a converter for the RDF 1.1 Turtle format. You also get converters for the RDF 1.1 TriG language, the RDF 1.1 N-Triples language and the RDF 1.1 N-Quads language.

alias ctrig='scala -Yopt:_ -J-Xmx4G -cp /usr/local/opt/chelona/chelona.jar org.chelona.TriGMain'
alias ctriples='scala -Yopt:_ -J-Xmx4G -cp /usr/local/opt/chelona/chelona.jar org.chelona.NTMain'
alias cquads='scala -Yopt:_ -J-Xmx4G -cp /usr/local/opt/chelona/chelona.jar org.chelona.NQuadMain'

The N-Triples and N-Quads converters can consume very large files without the heap growing beyond the suggested 4 gigabytes, because they process the input in fixed-size chunks.

Chunk-wise processing has not been implemented yet for the TriG and Turtle formats. For huge input data it might therefore be necessary to increase the heap size for these formats.
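The idea of chunk-wise processing can be sketched for line-based input as follows. This is only an illustration under assumptions, not Cheló̱na's implementation: the names ChunkedLines, process and countStatements are hypothetical, and a chunk is simply taken to be a fixed number of lines.

```scala
import java.io.BufferedReader

object ChunkedLines {
  // Reads line-based input in groups of `chunkSize` lines and hands every
  // group to `handle`, which returns the number of statements it processed.
  // Only one chunk is held in memory at any time.
  def process(reader: BufferedReader, chunkSize: Int)(handle: Seq[String] => Int): Long = {
    val lines = Iterator.continually(reader.readLine()).takeWhile(_ != null)
    lines.grouped(chunkSize).foldLeft(0L)((total, chunk) => total + handle(chunk))
  }

  // Stand-in for a real parser: counts lines that are neither blank nor comments.
  def countStatements(chunk: Seq[String]): Int =
    chunk.count(line => line.trim.nonEmpty && !line.trim.startsWith("#"))
}
```

Because the N-Triples and N-Quads formats are line-based, splitting the input on line boundaries never cuts a statement in half. The same approach does not carry over directly to Turtle or TriG, where a statement may span several lines.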

Cheló̱na Library

The Cheló̱na artifacts are available from Maven Central and can be integrated into your SBT-based Scala project like this:

libraryDependencies += "com.github.jupfu" % "chelona_2.11" % "1.2.0"

The latest released version is version 1.2.0. It is available for Scala 2.11. A version for Scala 2.12 will be released soon.

Programmatic Interface

This example program shows how to integrate Cheló̱na in your own program.

The program converts some Turtle data into the N-Triples format.

import java.io.{StringWriter, Writer}

import org.chelona._

import org.parboiled2.ParseError

import scala.util.{Failure, Success}

object Main extends App with RDFTurtleOutput {

  val input =
    """@base <http://example.org/> .
       @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
       @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
       @prefix foaf: <http://xmlns.com/foaf/0.1/> .
       @prefix rel: <http://www.perceive.net/schemas/relationship/> .

       <#green-goblin> rel:enemyOf    <#spiderman> 	;
          a foaf:Person ;    # in the context of the Marvel universe
          foaf:name 'Green Goblin' ;
          foaf:mail 'GreenGoblin@marvel.com' .

       <#spiderman>
          rel:enemyOf <#green-goblin> ;
          a foaf:Person ;
          foaf:name 'Spiderman', 'Человек-паук'@ru ."""

  val output = new StringWriter()

  val evalTurtle = new EvalTurtle(turtleWriter(output) _, "", "")

  val parser = ChelonaParser(input, evalTurtle.renderStatement)

  parser.turtleDoc.run() match {
    case Success(tripleCount) ⇒
      System.err.println("Input converted to " + tripleCount + " triples.")
      println(output)
    case Failure(e: ParseError) ⇒ System.err.println("Unexpected error during parsing run: " + parser.formatError(e))
    case Failure(e)             ⇒ System.err.println("Unexpected error during parsing run: " + e)
  }
}

A complete sbt build file for the example program is shown here:

name := "Chelona-Example"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "com.github.jupfu" % "chelona_2.11" % "1.2.0"

Build Cheló̱na from source

Prerequisites

Please ensure that the development tools listed below are up to date and working properly in your environment before you proceed:

Java JDK

Scala

sbt - Scala Build Tool

Download Source Files

Download the Cheló̱na sources (Cheló̱na Zip-File) from this site.

Unzip the data in a clean directory.

Scala and Java Part

Create a Cheló̱na Jar Archive

To create a jar-file for the Java virtual machine move into the Cheló̱na directory. From the command line type

sbt chelonaJVM/assembly

The sbt-assembly plugin is used to create a Cheló̱na jar file containing all dependencies.

This results in a jar archive chelona-assembly-x.x.x.jar in the jvm/target/scala-2.11 sub-directory. The placeholder x.x.x denotes the version information, e.g. chelona-assembly-1.2.0.jar. It is recommended to define the Cheló̱na aliases as described in the section Recommended Definitions.

JavaScript Part

Create a Cheló̱na Jar-Archive for Scala.js

To create a jar-file for use with Scala.js, move into the Cheló̱na directory. From the command line type

sbt
chelonaJS/clean
chelonaJS/compile
chelonaJS/package

This results in a chelona_sjs0.6_2.11-x.x.x.jar jar-archive deployed in the js/target/scala-2.11 sub-directory. The placeholder x.x.x denotes the version information, e.g. chelona_sjs0.6_2.11-1.2.0.jar.

Create a Cheló̱na JavaScript File

To make a JavaScript library type

sbt chelonaJS/fullOptJS

This results in a JavaScript file chelona-opt.js deployed in the js/target/scala-2.11 sub-directory.

To see chelona-opt.js in action, open the file turtle.html, which is located in chelonaJS/src/main/resource, in your favorite browser.

How to use Cheló̱na

Running Cheló̱na from the command line

This section shows how to convert an RDF 1.1 Turtle file into the simple S-P-O triple format (N-Triples). The file example1.ttl is the first example for the Terse RDF Triple Language, also known as the Turtle format. It describes the relationship between Green Goblin and Spiderman.

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<#green-goblin>
    rel:enemyOf <#spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" ;
    foaf:mail "GreenGoblin@marvel.com" .

<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .

Assuming you have installed Cheló̱na in /usr/local/opt/chelona as recommended in Recommended Definitions, the following command lets you convert the Turtle format into the simple N-Triples format.

scala -cp /usr/local/opt/chelona/chelona.jar org.chelona.TurtleMain --verbose ./examples/example1.ttl > example1_n3.ttl

Using the recommended alias makes it easier and shorter to achieve the same goal:

chelona -v ./examples/example1.ttl > example1_n3.ttl

The console log names the input file and reports the number of generated triples:

Convert: /Users/jp/chelona/examples/example1.ttl
Input file '/Users/jp/chelona/examples/example1.ttl' converted in 0.064sec 8 triples (triples per second = 125)

Inspecting the output file example1_n3.ttl should give this result:

<http://example.org/#green-goblin> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#spiderman> .
<http://example.org/#green-goblin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/#green-goblin> <http://xmlns.com/foaf/0.1/name> "Green Goblin" .
<http://example.org/#green-goblin> <http://xmlns.com/foaf/0.1/mail> "GreenGoblin@marvel.com" .
<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> .
<http://example.org/#spiderman> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman" .
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Человек-паук"@ru .

Validation of a Turtle File

When passing the parameter '--validate' on the command line, Cheló̱na will do a syntax check. No output file is generated.

chelona -v --validate examples/example1.ttl

The -v parameter is a shortcut for --verbose, which causes Cheló̱na to emit some status information.

Validate: /Users/jp/chelona/examples/example1.ttl
Input file '/Users/jp/chelona/examples/example1.ttl' composed of 7 statements successfully validated in 0.099sec (statements per second = 71)

Unique Blank Node Names

For the sake of convenience, the next examples assume that the chelona alias has been created.

The '--uid' command line argument instructs Cheló̱na to use a unique identifier for blank nodes.

chelona --uid examples/example23.ttl

Blank nodes are emitted with a random base, which is followed by a counter that is incremented as needed:

_:c83cecd897cf243a9a6203bad3f5f0b411 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:c83cecd897cf243a9a6203bad3f5f0b411 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:c83cecd897cf243a9a6203bad3f5f0b412 .
_:c83cecd897cf243a9a6203bad3f5f0b412 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "2.0"^^<http://www.w3.org/2001/XMLSchema#decimal> .
_:c83cecd897cf243a9a6203bad3f5f0b412 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:c83cecd897cf243a9a6203bad3f5f0b413 .
_:c83cecd897cf243a9a6203bad3f5f0b413 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "3E1"^^<http://www.w3.org/2001/XMLSchema#double> .
_:c83cecd897cf243a9a6203bad3f5f0b413 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:c83cecd897cf243a9a6203bad3f5f0b411 <http://example.org/stuff/1.0/p> "w" .
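A labelling scheme of this kind, a random base shared by all labels plus a running counter, can be sketched as follows. The class BlankNodeLabels is hypothetical and not part of Cheló̱na. The exact label layout (prefix "c", a UUID without dashes as the base) is an assumption made for the illustration.

```scala
import java.util.UUID

// Hypothetical helper, not part of Chelona: all labels share one random base
// and are told apart by an incrementing counter.
final class BlankNodeLabels(base: String = UUID.randomUUID().toString.replace("-", "")) {
  private var counter = 0

  // Returns the next label, e.g. "_:c<base>1", "_:c<base>2", ...
  def next(): String = {
    counter += 1
    "_:c" + base + counter
  }
}
```

Since the base is drawn anew for every run, labels produced by separate conversions are very unlikely to collide when their outputs are merged.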

Base Definition

Relative IRIs like

<#green-goblin>

are resolved relative to the current base IRI. When no base is defined, the value passed by the '--base' parameter is prepended to the relative IRI.

Look at file 'base.ttl', which consists of a single triple statement with two relative IRIs.

<#green-goblin> a <#comic-hero> .

The relative IRIs are appended to the "http://marvel/universe" string.

chelona --base "http://marvel/universe" base.ttl

Here is the generated result:

<http://marvel/universe/#green-goblin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://marvel/universe/#comic-hero> .

With $PWD resolving to the current working directory /Users/jp/chelona, the '--base' parameter produces the prefix file:///Users/jp/chelona/, which is prepended to the relative IRI.

chelona --base "file://"$PWD base.ttl

The same example file converted again with $PWD resolved shows the prepended prefix:

<file:///Users/jp/chelona/#green-goblin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///Users/jp/chelona/#comic-hero> .
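The prepending seen in the two listings can be mimicked with a few lines of Scala. This is a sketch derived only from the outputs shown in this section. The object BaseIri and its prepend function are hypothetical names, and the rule "terminate the base with a slash, then append the relative IRI" is an assumption, not a description of Cheló̱na's code.

```scala
object BaseIri {
  // Assumption inferred from the listings: the base is terminated with '/'
  // if necessary and the relative IRI is appended unchanged.
  def prepend(base: String, relative: String): String = {
    val prefix = if (base.isEmpty || base.endsWith("/")) base else base + "/"
    "<" + prefix + relative + ">"
  }
}
```

Note that this is plain string concatenation. Strict RFC 3986 reference resolution of a fragment-only reference would keep the base path as it is, so the sketch mirrors the listings rather than the standard algorithm.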

Error Reporting

In case of an error Cheló̱na will display an error message and give a hint where the problem occurred.

chelona --validate --verbose TurtleTests/turtle-syntax-bad-struct-02.ttl

Error report

Validate: /Users/jp/chelona/TurtleTests/turtle-syntax-bad-struct-02.ttl
File '/Users/jp/chelona/TurtleTests/turtle-syntax-bad-struct-02.ttl': Invalid input '=', expected predicate or http://www.w3.org/1999/02/22-rdf-syntax-ns#type (line 2, column 40):
<http://www.w3.org/2013/TurtleTests/s> = <http://www.w3.org/2013/TurtleTests/o> .

Some internal parsing information will be emitted in case of an error when "--trace" is used as command line argument.

chelona --validate --verbose --trace TurtleTests/turtle-syntax-bad-struct-02.ttl

Extended error report

Validate: /Users/jp/chelona/TurtleTests/turtle-syntax-bad-struct-02.ttl
File '/Users/jp/chelona/TurtleTests/turtle-syntax-bad-struct-02.ttl': Invalid input '=', expected IRIREF or prefixedName (line 2, column 40):
<http://www.w3.org/2013/TurtleTests/s> = <http://www.w3.org/2013/TurtleTests/o> .
                                       ^

5 rules mismatched at error location:
...oc/ *:-58 /statement/ |:-39 /triples/ |:-39 /predicateObjectList/ + /po/ /verb/ | /predicate/ /iri/ | /IRIREF/ atomic / '<'
...| /predicate/ /iri/ | /prefixedName/ | /PNAME_LN/ /PNAME_NS/ ? /PN_PREFIX/ atomic / capture / PN_CHARS_BASE:<CharPredicate>
... |:-39 /triples/ |:-39 /predicateObjectList/ + /po/ /verb/ | /predicate/ /iri/ | /prefixedName/ | /PNAME_LN/ /PNAME_NS/ ':'
...po/ /verb/ | /predicate/ /iri/ | /prefixedName/ | /PNAME_NS/ ? /PN_PREFIX/ atomic / capture / PN_CHARS_BASE:<CharPredicate>
.../statement/ |:-39 /triples/ |:-39 /predicateObjectList/ + /po/ /verb/ | /predicate/ /iri/ | /prefixedName/ | /PNAME_NS/ ':'

RDF 1.1 Language Formats

How are the RDF 1.1 language formats linked together?

The relationship among the various RDF 1.1 formats is shown in this image

RDF Hierarchy

N-Triples Format

Quoting the W3C definition of N-Triples: The N-Triples format is a line-based, plain text format for encoding an RDF graph. N-Triples triples are a sequence of RDF terms representing the subject, predicate and object of an RDF Triple. These may be separated by white space. This sequence is terminated by a '.' and a new line (optional at the end of a document).

Example 1 contains most of the syntax features for N-Triples

<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> . # comments here
# or on a line by themselves
_:subject1 <http://an.example/predicate1> "object1" .
_:subject2 <http://an.example/predicate2> "object2" .

N-Quads Format

Quoting the W3C definition of N-Quads: The N-Quads format is a line-based, plain text format for encoding an RDF dataset. N-Quads statements are a sequence of RDF terms representing the subject, predicate, object and graph label of an RDF Triple and the graph it is part of in a dataset. These may be separated by white space. This sequence is terminated by a '.' and a new line (optional at the end of a document).

Example 1 contains most of the syntax features for N-Quads

<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> <http://example.org/graph3> . # comments here
# or on a line by themselves
_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .
_:subject2 <http://an.example/predicate2> "object2" <http://example.org/graph5> .

It is important to note that each N-Triples file is also a valid N-Quads file. This is also reflected in that the N-Quads grammar extends the N-Triples grammar with just two rules.

Therefore, each N-Triples file can also be used as input for the N-Quads converter.

Turtle Format

Quoting the W3C definition of Turtle: A Turtle document is a textual representation of an RDF graph.

N-Triples triples are also Turtle simple triples, but Turtle includes other representations of RDF terms and abbreviations of RDF Triples. When parsed by a Turtle parser, data in the N-Triples format will produce exactly the same triples as a parser for the N-Triples language.

TriG Format

Quoting the W3C definition of TriG: TriG is an extension of Turtle, extended to support representing a complete RDF Dataset.

Therefore, Turtle is a valid input format for the TriG parser.

N-Triples triples are also Turtle simple triples. As TriG is an extension of Turtle, N-Triples triples are also TriG triples. When parsed by a TriG parser, data in the N-Triples format will produce exactly the same triples as a parser for the N-Triples language.

It is important to note that each N-Quads file is also a valid TriG file. This is also reflected in that the TriG grammar extends the N-Quads grammar with just a few rules.

Summary

This image sums up the last section by showing which RDF 1.1 formats can be proffered as input to which RDF 1.1 parser.

The simple N-Triples format is understood by any of the depicted parsers

The Turtle format can be digested by the Turtle and TriG parsers

The N-Quads format is accepted by the N-Quads and TriG parsers

TriG data can only be processed by the TriG parser
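The same summary can be written down as a small lookup table. The object FormatCompatibility is purely illustrative and not part of Cheló̱na. Its content is taken from the list above.

```scala
object FormatCompatibility {
  // Parser name -> input formats it accepts, as listed in the summary.
  val accepted: Map[String, Set[String]] = Map(
    "N-Triples" -> Set("N-Triples"),
    "N-Quads"   -> Set("N-Triples", "N-Quads"),
    "Turtle"    -> Set("N-Triples", "Turtle"),
    "TriG"      -> Set("N-Triples", "N-Quads", "Turtle", "TriG")
  )

  // All parsers that can consume the given format.
  def parsersFor(format: String): Set[String] =
    accepted.collect { case (parser, formats) if formats(format) => parser }.toSet
}
```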

Turtle

First Example

Here is a simple Turtle file, listed as Example 14 in the Blank Nodes chapter of the W3C RDF 1.1 Turtle definition document:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:alice foaf:knows _:bob .
_:bob foaf:knows _:alice .

This Turtle file is available in the examples sub-directory. Applying Cheló̱na to convert the Turtle file into the N-Triples format:

chelona examples/example14.ttl

Cheló̱na dissolves the statements into the canonical form, which is represented by pure subject, predicate and object components:

_:b1 <http://xmlns.com/foaf/0.1/knows> _:b2 .
_:b2 <http://xmlns.com/foaf/0.1/knows> _:b1 .

Second Example

The next Turtle file is listed as Example 15 in the W3C RDF 1.1 Turtle definition document:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Someone knows someone else, who has the name "Bob".
[] foaf:knows [ foaf:name "Bob" ] .

This Turtle file is available in the examples sub-directory. Applying Cheló̱na to convert the Turtle file into the N-Triples format:

chelona examples/example15.ttl

Cheló̱na transforms the statements into the simple subject-predicate-object (s-p-o) format:

_:a1 <http://xmlns.com/foaf/0.1/knows> _:b1 .
_:b1 <http://xmlns.com/foaf/0.1/name> "Bob" .

Third Example

This is Example 25 from the W3C RDF 1.1 Terse RDF Triple Language definition. It serves as an example for nested collections involving other syntactic forms.

PREFIX : <http://example.org/stuff/1.0/>
(1 [:p :q] ( 2 ) ) :p2 :q2 .

This Turtle file is available in the examples sub-directory. Applying Cheló̱na to convert the Turtle file into the N-Triples format:

chelona examples/example25.ttl

Cheló̱na resolves the collection and gives the equivalent sequence of turtle statements in the canonical form:

_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:c1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:c2 .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b1 .
_:b1 <http://example.org/stuff/1.0/p> <http://example.org/stuff/1.0/q> .
_:c2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:c3 .
_:c3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:c4 .
_:c4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:c4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:c3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:c1 <http://example.org/stuff/1.0/p2> <http://example.org/stuff/1.0/q2> .

A detailed explanation of how to derive this N-Triples output from the input file above can be found in the quest for Missing triple in example 26.
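How a collection unfolds into rdf:first and rdf:rest triples can be sketched for the simple, flat case. The object RdfCollection is a hypothetical helper and not Cheló̱na's code. It takes already rendered terms, uses the cell labels _:c1, _:c2, ... as an assumption, and does not handle nested collections or blank node property lists as they occur in Example 25.

```scala
object RdfCollection {
  private val Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

  type Triple = (String, String, String)

  // Expands a flat collection into rdf:first / rdf:rest triples.
  // Returns the term that stands for the whole collection plus the triples.
  def expand(items: List[String]): (String, List[Triple]) =
    if (items.isEmpty) ("<" + Rdf + "nil>", Nil)
    else {
      // One list cell per item: _:c1, _:c2, ...
      val cells = items.indices.map(i => "_:c" + (i + 1)).toList
      val triples = items.zip(cells).zipWithIndex.flatMap { case ((item, cell), i) =>
        // The last cell points to rdf:nil, every other cell to its successor.
        val rest = if (i + 1 < cells.length) cells(i + 1) else "<" + Rdf + "nil>"
        List((cell, "<" + Rdf + "first>", item), (cell, "<" + Rdf + "rest>", rest))
      }
      (cells.head, triples)
    }
}
```

The first cell label is what a statement uses as its subject or object in place of the collection, just as _:c1 carries the :p2 :q2 predicate and object in the listing above.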

TriG

First Example

Example 1 is a simple TriG-file, which is part of the TriG language description.

# This document encodes one graph.
@prefix ex: <http://www.example.org/vocabulary#> .
@prefix : <http://www.example.org/exampleDocument#> .

:G1 { :Monica a ex:Person ;
              ex:name "Monica Murphy" ;
              ex:homepage <http://www.monicamurphy.org> ;
              ex:email <mailto:monica@monicamurphy.org> ;
              ex:hasSkill ex:Management ,
                          ex:Programming . }

Applying the TriG parser converts this file into the N-Quads format, either explicitly

scala -Yopt:_ -J-Xmx4G -cp /usr/local/opt/chelona/chelona.jar org.chelona.TriGMain examples/example1.trig

or shorter using the alias ctrig

ctrig examples/example1.trig

Cheló̱na dissolves the statements into the canonical form, which is represented by pure subject, predicate, object and graph components:

<http://www.example.org/exampleDocument#Monica> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.org/vocabulary#Person> <http://www.example.org/exampleDocument#G1> .
<http://www.example.org/exampleDocument#Monica> <http://www.example.org/vocabulary#name> "Monica Murphy" <http://www.example.org/exampleDocument#G1> .
<http://www.example.org/exampleDocument#Monica> <http://www.example.org/vocabulary#homepage> <http://www.monicamurphy.org> <http://www.example.org/exampleDocument#G1> .
<http://www.example.org/exampleDocument#Monica> <http://www.example.org/vocabulary#email> <mailto:monica@monicamurphy.org> <http://www.example.org/exampleDocument#G1> .
<http://www.example.org/exampleDocument#Monica> <http://www.example.org/vocabulary#hasSkill> <http://www.example.org/vocabulary#Management> <http://www.example.org/exampleDocument#G1> .
<http://www.example.org/exampleDocument#Monica> <http://www.example.org/vocabulary#hasSkill> <http://www.example.org/vocabulary#Programming> <http://www.example.org/exampleDocument#G1> .

N-Triples

N-Triples part

N-Quads

Quads part

A short Explanation of Cheló̱na

The parsing process is divided into two separate steps, which are parsing and AST (Abstract Syntax Tree) evaluation.

Parsing

The first step is parsing the input.

Parsing of any of the RDF-grammars is done with the help of Parboiled2. Driven by the input data the grammar rules are evaluated. Starting with the main entry rule, e.g. ntriplesDoc for the N-Triples parser, the parsing process branches into matching sub-rules. Rules which bear relevant semantic information produce an Abstract Syntax Tree node (AST-node).

The parsing process of the N-Triples statement below,

_:alice <http://xmlns.com/foaf/0.1/knows> _:bob . # Alice knows Bob

results in an AST listed here

ASTTriple(
   ASTSubject(ASTBlankNodeLabel(alice)),
   ASTPredicate(ASTIriRef(http://xmlns.com/foaf/0.1/knows)),
   ASTObject(ASTBlankNodeLabel(bob)),
   Some(ASTComment( Alice knows Bob))
)

The N-Triples statement has been decomposed in the following elements

a subject, which is a blank node named "alice",

a predicate, which is an IRI "http://xmlns.com/foaf/0.1/knows",

an object, which is a blank node named "bob" and

an optional comment " Alice knows Bob".
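The decomposition can be modelled with a handful of case classes. The following is a simplified, hypothetical model for illustration only. The types in AstSketch are not Cheló̱na's AST classes, and only the two node kinds that occur in the example statement are covered.

```scala
object AstSketch {
  sealed trait Node
  final case class BlankNodeLabel(label: String) extends Node
  final case class IriRef(iri: String) extends Node

  // One statement: subject, predicate, object and an optional trailing comment.
  final case class Triple(subject: Node, predicate: IriRef, obj: Node, comment: Option[String])

  // Renders a node back to its N-Triples spelling.
  def renderNode(node: Node): String = node match {
    case BlankNodeLabel(label) => "_:" + label
    case IriRef(iri)           => "<" + iri + ">"
  }

  // Serialises the triple; the comment is dropped.
  def renderTriple(t: Triple): String =
    renderNode(t.subject) + " " + renderNode(t.predicate) + " " + renderNode(t.obj) + " ."
}
```

Walking the tree with a pattern match, as renderNode does, is the kind of evaluation step that follows the parsing step.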

AST Traversal and Evaluation

The second step is the traversal and evaluation of the AST nodes.

All of the RDF parsers traverse and evaluate the AST for a single input statement at a time. The evalStatement function returns an NTripleReturnValue type.

Comments are dropped in the current implementation of the renderStatement function. The NTTriple case has no parameter for trailing comments. Less obviously, standalone comments on a line of their own are ignored as well. They could be handled by adding a case NTComment(c) to the renderStatement routine.

def renderStatement(ast: NTripleType): Int = {
  (evalStatement(ast): @unchecked) match {
    case NTTriple(s, p, o) ⇒ output(s, p, o)
    case _                 ⇒ 0
  }
}

The decoupling of parsing and AST evaluation has various advantages.

As each AST for a single statement is traversed and evaluated separately, this results in output for each statement generated at once

The building of an AST for a single RDF-statement reduces the memory requirements drastically

Spawning the AST traversal in a separate thread (for the Scala branch) allows parsing to continue immediately
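The last point can be sketched with Scala futures. This is an assumption-laden illustration, not the way Cheló̱na schedules its work: AsyncEvaluation and run are hypothetical names, and plain strings stand in for the per-statement ASTs.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AsyncEvaluation {
  // `statements` stands in for the ASTs produced by the parser, one per statement.
  // Each evaluation is scheduled on another thread, so the loop handing out
  // statements does not wait for earlier evaluations to finish.
  def run(statements: Iterator[String])(evaluate: String => Int): Int = {
    val pending = statements.map(stmt => Future(evaluate(stmt))).toList
    // Collect the per-statement counts and add them up.
    Await.result(Future.sequence(pending), 30.seconds).sum
  }
}
```

In such a setup the evaluations may complete in any order, so an output function that writes to a shared stream needs its own ordering or synchronisation.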

Do It Yourself

Cheló̱na supports building your own output format. The result can be written to a file, stored in a database or transferred via actors. The Turtle and TriG parsers expect an output function of type List[SPOReturnValue] ⇒ Int. A single Turtle or TriG input statement can produce one to n triples (Turtle) or quadruples (TriG). These are collected in the list of SPOReturnValues, which is filled anew for each input statement. The Int return value of the output function should be the number of output statements generated for that input statement. A few examples will show how to proceed.
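The shape of such an output function, a list of statements in and a count out, can be shown without any Cheló̱na types. In the sketch below the case class Triple is a hypothetical stand-in for the values passed by the parser, and filteredWriter is an invented example that keeps only triples with a given predicate.

```scala
import java.io.Writer

object PredicateFilter {
  // Hypothetical stand-in for the triple values handed to an output function.
  final case class Triple(s: String, p: String, o: String)

  // Writes only the triples whose predicate equals `predicate` and returns
  // how many were written. Fixing the first two parameter lists leaves a
  // function of shape List[Triple] => Int.
  def filteredWriter(predicate: String)(out: Writer)(triples: List[Triple]): Int = {
    val kept = triples.filter(_.p == predicate)
    kept.foreach(t => out.write(t.s + " " + t.p + " " + t.o + " .\n"))
    kept.length
  }
}
```

Currying keeps the configuration (the predicate and the writer) separate from the per-statement call, which is the same pattern the writer functions in the following sections use.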

Standard N-Triples Format

The TurtleMain object extends trait RDFTurtleOutput, which is defined in RDFOutput.scala.

The defined code block referenced for writing the subject, predicate and object elements is listed below

def turtleWriter(bo: Writer)(triple: List[RDFReturnType]): Int = {
  triple.map { case TurtleTriple(s, p, o) ⇒ { bo.write(s.text + " " + p.text + " " + o.text + " .\n") } }.length
}

With a writer stream passed to turtleWriter, the required signature for the output function is achieved.

The TurtleMain routine contains the following reference to the turtleWriter

/* turtleWriter is defined in trait RDFTurtleOutput */
val evalTurtle = new EvalTurtle(turtleWriter(output) _, base, label)
/* pass the input, render procedure and some flags to the parser */
val parser = ChelonaParser(input, evalTurtle.renderStatement, validate, base, label)
/* start turtle parser */
val res = parser.turtleDoc.run()

Simple HTML-Output for TriG Parser Input

There are a few things to consider when defining a simple HTML-table output format. The angle brackets '<' and '>' have to be substituted by &lt; and &gt;, and string literals can be followed by an IRI.

A complete example shows the usage of the HTML output definition:

def trigHTMLWriter(bo: Writer)(quad: List[RDFReturnType]): Int = {
  def formatter(token: String, `type`: Int) = {
    if (TriGBitValue.isIRIREF(`type`))
      "&lt;" + token.substring(1, token.length - 1) + "&gt;"
    else
      token
  }

  quad.map {
    case TriGTuple(s, p, o, g) ⇒ {
      val subject = formatter(s.text, s.tokenType)
      val predicate = formatter(p.text, p.tokenType)
      val `object` = formatter(o.text, o.tokenType)
      val graph = formatter(g.text, g.tokenType)

      bo.write(subject + " " + predicate + " " + `object` + (if (graph.length > 0) " " + graph + " .\n" else " .\n"))
    }
  }.length
}
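A more general escaping step than stripping and replacing the enclosing angle brackets could look like the sketch below. HtmlEscape is a hypothetical helper and not part of Cheló̱na. It replaces every '<', '>' and '&' in a token, which also covers characters inside literals.

```scala
object HtmlEscape {
  // Replaces the characters that are significant in HTML text content.
  // Every character is mapped exactly once, so the '&' of a freshly
  // produced entity is never escaped a second time.
  def escape(text: String): String =
    text.flatMap {
      case '&' => "&amp;"
      case '<' => "&lt;"
      case '>' => "&gt;"
      case c   => c.toString
    }
}
```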

Adding State to Simple HTML-Output for Cheló̱na

A counter is introduced into the output function.

object myCount { var counter = 0 }

def trigHTMLWriter(counterObject: myCount.type)(bo: Writer)(quad: List[SPOReturnValue]): Int = {
  def formatter(token: String, `type`: Int) = {
    if (TriGBitValue.isIRIREF(`type`))
      "&lt;" + token.substring(1, token.length - 1) + "&gt;"
    else
      token
  }

  quad.map {
    case TriGTuple(s, p, o, g) ⇒ {
      val subject = formatter(s.text, s.tokenType)
      val predicate = formatter(p.text, p.tokenType)
      val `object` = formatter(o.text, o.tokenType)
      val graph = formatter(g.text, g.tokenType)

      /* count every emitted statement via the counter object passed in */
      counterObject.counter = counterObject.counter + 1

      bo.write("<tr><td><code>" + counterObject.counter + "</code>" + subject + " " + predicate + " " + `object` + (if (graph.length > 0) " " + graph + " .\n" else " .\n"))
    }
  }.length
}

JSON-LD Output for N-Triples

The file RDFOutput.scala contains the output formats currently available for the Cheló̱na package.

If you are planning to make your own output format or apply some transformations or filtering to the parsed input data, then this is the right place to look at.

First, a short look at how Cheló̱na digests the input format. The way it works for N-Triples is also valid for the other input formats supported by Cheló̱na.

The N-Triples parser NTriplesParser.scala parses a complete N-Triples statement. In the parsing process an Abstract Syntax Tree (AST) is built for this statement. The AST for this single parsed N-Triples statement is passed to the render routine. This is done for each N-Triples statement contained in the input data.

Here is the relevant excerpt of the main rule, which shows the renderStatement(ast) call:

//[1]	ntriplesDoc	::=	triple? (EOL triple)* EOL?
def ntriplesDoc: Rule1[Int] = rule {
  (triple ~> ((ast: NTripleAST) ⇒
    if (!__inErrorAnalysis) {
      if (!validate) {
        renderStatement(ast)
      } else
        ast match {
          case ASTComment(s) ⇒ 0
          case _             ⇒ 1
        }
    } else {
      0
    })).*(EOL) ~ EOL.? ~ EOI ~> ((v: Seq[Int]) ⇒ {
    v.sum
  })
}

The render routine is passed as the second argument to the NTriplesParser routine:

class NTriplesParser(val input: ParserInput, val renderStatement: (NTripleAST) ⇒ Int, ...

The renderStatement function is provided by the evaluation class, here EvalNT:

val output = new StringWriter()

/* initialize output */
jsonWriterInit(output)()

/* AST evaluation procedure. Here is the point to provide your own flavour, if you like. */
val evalNT = new EvalNT(jsonWriter(output)_, base, label)

val parser = NTriplesParser(input, evalNT.renderStatement, validate, base, label)

val res = parser.ntriplesDoc.run()

/* finalize output */
jsonWriterTrailer(output)()

class EvalNT(output: (NTripleElement, NTripleElement, NTripleElement) ⇒ Int, basePath: String, label: String) extends NTReturnValue {

NTripleBitValue.scala has some definitions for testing the token type of an element, which the output trait listed here relies on:

trait JSONLDOutput extends RDFReturnType {
  /* Simple JSON output for NTriples */
  val sb: StringBuilder = new StringBuilder()

  def jsonWriterInit(bo: Writer)(s: String = "[\n"): Unit = {
    bo.write(s)
    if ( sb.nonEmpty) {
      sb.clear()
    }
  }

  def jsonWriter(bo: Writer)(s: NTripleElement, p: NTripleElement, o: NTripleElement): Int = {

    if ( sb.nonEmpty) {
      sb.append(",\n")
      bo.write(sb.toString())
      sb.clear()
    }

    //[3]	subject	::=	IRIREF | BLANK_NODE_LABEL
    if ( NTripleBitValue.isIRIREF(s.tokenType) ) {
      sb.append("  {\n" + """  "@id": """" + s.text.substring(1, s.text.length-1) + """"""" + ",\n")
    }
    else if ( NTripleBitValue.isBLANK_NODE_LABEL(s.tokenType)) {
      sb.append("  {\n" + """  "@id": """" + s.text + """"""" + ",\n")
    }

    //[4]	predicate	::=	IRIREF
    if( NTripleBitValue.isIRIREF(p.tokenType) ) {
      sb.append("""  """" + p.text.substring(1, p.text.length-1) + """"""" + ":\n")
    }

    //[5]	object	::=	IRIREF | BLANK_NODE_LABEL | STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG)?
    if ( NTripleBitValue.isIRIREF(o.tokenType) && !NTripleBitValue.isSTRING_LITERAL_QUOTE(o.tokenType)) {
      sb.append("""    { "@id": """" + o.text.substring(1, o.text.length-1) + """"""" + "}\n  }")
    }
    else if ( NTripleBitValue.isBLANK_NODE_LABEL(o.tokenType)) {
      sb.append("""    { "@id": """" + o.text + """"""" + " }\n  }")
    }
    else if ( NTripleBitValue.isSTRING_LITERAL_QUOTE(o.tokenType) ) {
       if ( NTripleBitValue.isLANGTAG(o.tokenType)) {
         val parts = o.text.split("@")
         sb.append("    {\n    " + """"@type": """" + parts(1) + """"""" + ",\n")
         sb.append("""    "@value": """ + parts(0) + "\n    }\n  }")
       } else if ( NTripleBitValue.isLITERALTAG(o.tokenType)) {
         val parts = o.text.split("\\^\\^")
         sb.append("    {\n    " + """"@type": """" + parts(1).substring(1, parts(1).length-1) + """"""" + ",\n")
         sb.append("""    "@value": """ + parts(0) + "\n    }\n  }")
       }
       else {
         sb.append("""    { "@value": """ + o.text + " }\n  }")
       }
    }
    1
  }

  def jsonWriterTrailer(bo: Writer)(s: String="\n]"): Unit = {
    if ( sb.nonEmpty) {
      bo.write(sb.toString())
      sb.clear()
    }
    bo.write(s)
  }
}
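The trait above holds each entry back in a buffer until the next one arrives, so that a separating comma is only written when another entry really follows. Reduced to its core, the technique can be sketched like this. JsonArrayWriter is a hypothetical helper for illustration and not part of Cheló̱na.

```scala
import java.io.Writer

// Hypothetical helper illustrating the buffering idea only.
final class JsonArrayWriter(out: Writer) {
  private var pending: Option[String] = None

  def begin(): Unit = out.write("[\n")

  // Holds back the newest element; the previous one is flushed with a
  // trailing comma because it is now known that another element follows.
  def add(element: String): Unit = {
    pending.foreach(previous => out.write(previous + ",\n"))
    pending = Some(element)
  }

  // The last element is written without a comma, then the array is closed.
  def end(): Unit = {
    pending.foreach(last => out.write(last))
    pending = None
    out.write("\n]")
  }
}
```

This keeps the output valid JSON without knowing the number of statements in advance, which matters when the input is processed as a stream.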

Postscript

Questions, support, cooperation or collaboration are explicitly welcome!

Licence

Cheló̱na is released under the Apache License 2.0.