Tutorial 02 - Import Triples from a File into a Simple Graph

Author: Hasan (hasan@apache.org)
Last update: November 11, 2017

In this tutorial we are going to import triples stored in a file into a graph.

Problem Definition

Given a file containing a set of triples in Turtle serialization format (text/turtle), an RDF Graph should be created and filled with the triples. Assuming the content of the file is as follows, the program should log the corresponding triples.

@prefix ex: <http://clerezza.apache.org/2017/01/example#> .
_:a ex:hasFirstName "Hasan" .
_:a ex:isA ex:ClerezzaUser .

Solution

Apache Clerezza provides a Parser that can read files containing triples in various serialization formats. The Parser relies on ParsingProvider services, each of which implements parsing for a specific data format. We are going to use a ParsingProvider based on the Jena parser.
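
The delegation described above can be pictured with a small plain-Java model. This is only a sketch under stated assumptions: ToyParser, ToyParsingProvider and ToyParserDemo are hypothetical names invented for illustration and are not Clerezza classes, the hand-written register call stands in for whatever service lookup the real Parser performs, and the toy provider treats each non-blank line as one statement instead of actually parsing Turtle.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a ParsingProvider: it understands one format.
interface ToyParsingProvider {
    List<String> parse(InputStream in);
}

// Hypothetical stand-in for the Parser: it selects a provider by format.
class ToyParser {
    private final Map<String, ToyParsingProvider> providers = new HashMap<>();

    void register(String format, ToyParsingProvider provider) {
        providers.put(format, provider);
    }

    List<String> parse(InputStream in, String format) {
        ToyParsingProvider provider = providers.get(format);
        if (provider == null) {
            // Assumption: the toy reports an unknown format with a standard JDK exception.
            throw new IllegalArgumentException("No provider for format " + format);
        }
        return provider.parse(in);
    }
}

class ToyParserDemo {
    // Toy provider: every non-blank line counts as one statement (no real Turtle parsing).
    static List<String> readStatements(InputStream in) {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ToyParser parser = new ToyParser();
        parser.register("text/turtle", ToyParserDemo::readStatements);

        byte[] data = "_:a ex:isA ex:ClerezzaUser .\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(parser.parse(new ByteArrayInputStream(data), "text/turtle"));
    }
}
```

The point of the split is that the caller only names a format; which library does the parsing is decided by the providers that happen to be available.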

The program listed below reads the file example02.ttl, parses its content, and stores the triples in a Graph. It then iterates over the newly created graph and logs each triple.

 1  package org.apache.clerezza.tutorial;
 2
 3  import org.apache.clerezza.commons.rdf.Graph;
 4  import org.apache.clerezza.commons.rdf.Triple;
 5  import org.apache.clerezza.rdf.core.serializedform.Parser;
 6  import org.apache.clerezza.rdf.core.serializedform.SupportedFormat;
 7  import org.apache.clerezza.rdf.core.serializedform.UnsupportedFormatException;
 8  import org.slf4j.Logger;
 9  import org.slf4j.LoggerFactory;
10
11  import java.io.InputStream;
12  import java.util.Iterator;
13
14  public class Example02 {
15
16      private static final Logger logger = LoggerFactory.getLogger(Example02.class);
17
18      public static void main(String[] args) {
19          InputStream inputStream = Example02.class.getResourceAsStream("example02.ttl");
20          Parser parser = Parser.getInstance();
21
22          try {
23              Graph graph = parser.parse(inputStream, SupportedFormat.TURTLE);
24
25              Iterator<Triple> iterator = graph.filter(null,null,null);
26              Triple triple;
27
28              while (iterator.hasNext()) {
29                  triple = iterator.next();
30                  logger.info(String.format("%s %s %s",
31                      triple.getSubject().toString(),
32                      triple.getPredicate().toString(),
33                      triple.getObject().toString()
34                  ));
35              }
36          } catch (UnsupportedFormatException ex) {
37              logger.warn(String.format("%s is not supported by the used parser", SupportedFormat.TURTLE));
38          }
39      }
40  }

We will use Maven to build the program. The required POM file is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.clerezza.tutorial</groupId>
    <artifactId>Example-02</artifactId>
    <packaging>jar</packaging>
    <version>1.0</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.6.0</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>java</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <mainClass>org.apache.clerezza.tutorial.Example02</mainClass>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <name>Example-02</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>org.apache.clerezza</groupId>
            <artifactId>rdf.core</artifactId>
            <version>1.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.apache.clerezza</groupId>
            <artifactId>rdf.jena.parser</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>
</project>

The directory structure is simple, as shown below:

pom.xml
src/main/java/org/apache/clerezza/tutorial/Example02.java
src/main/resources/org/apache/clerezza/tutorial/example02.ttl
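
The Turtle file sits under the same package path as Example02 because Class.getResourceAsStream resolves a name without a leading slash relative to the package of the class, and it returns null (rather than throwing) when the resource is missing. The following is a minimal sketch of a fail-fast lookup; ResourceCheck and openResource are hypothetical helper names and are not part of the tutorial project.

```java
import java.io.InputStream;

class ResourceCheck {
    // Opens a resource relative to the package of owner, failing fast if it is missing.
    static InputStream openResource(Class<?> owner, String name) {
        InputStream in = owner.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalStateException(
                    "Resource " + name + " not found relative to " + owner.getName());
        }
        return in;
    }

    public static void main(String[] args) throws Exception {
        // In the tutorial project the call would be openResource(Example02.class, "example02.ttl").
        try (InputStream in = openResource(ResourceCheck.class, "example02.ttl")) {
            System.out.println("Found example02.ttl");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this turns a misplaced example02.ttl into an immediate, readable error instead of a null stream being handed to the parser.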

To build the jar, we should invoke:

mvn package

To run the program, invoke:

mvn exec:java

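Running the program produces the expected log messages: one line per triple, with the blank node _:a printed as a JenaBNodeWrapper object reference. The SLF4J warning only reports that two logger bindings are on the class path; the simple logger is the one actually used.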

[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Example-02 1.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ Example-02 ---
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hasan/.m2/repository/org/slf4j/slf4j-simple/1.7.25/slf4j-simple-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hasan/.m2/repository/org/slf4j/slf4j-log4j12/1.7.6/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
[org.apache.clerezza.tutorial.Example02.main()] INFO org.apache.clerezza.rdf.core.serializedform.Parser - constructing Parser
[org.apache.clerezza.tutorial.Example02.main()] INFO org.apache.clerezza.tutorial.Example02 - org.apache.clerezza.rdf.jena.commons.JenaBNodeWrapper@78d47560 <http://clerezza.apache.org/2017/01/example#hasFirstName> "Hasan"^^<http://www.w3.org/2001/XMLSchema#string>
[org.apache.clerezza.tutorial.Example02.main()] INFO org.apache.clerezza.tutorial.Example02 - org.apache.clerezza.rdf.jena.commons.JenaBNodeWrapper@78d47560 <http://clerezza.apache.org/2017/01/example#isA> <http://clerezza.apache.org/2017/01/example#ClerezzaUser>
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.928 s
[INFO] Finished at: 2017-11-11T15:53:32+01:00
[INFO] Final Memory: 9M/216M
[INFO] ------------------------------------------------------------------------

Discussion

The Maven POM file lists three libraries on which the program directly depends:

  • org.apache.clerezza.rdf.core: contains the implementation of the Apache Clerezza Parser
  • org.apache.clerezza.rdf.jena.parser: contains a ParsingProvider service based on the Jena parser
  • org.slf4j.slf4j-simple: contains the implementation of the logger

The core of the program is line 20 of the Java listing (Parser instantiation) and line 23 (parsing a stream of triples into a graph).
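
Line 25 of the listing calls graph.filter(null, null, null). In this pattern a null argument acts as a wildcard, so the call iterates over every triple in the graph. The idea can be sketched in plain Java as follows; ToyTripleFilter and ToyTriple are hypothetical names that model triples as plain strings and are not Clerezza types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToyTripleFilter {
    // A toy triple of plain strings; real Clerezza triples hold RDF terms.
    static final class ToyTriple {
        final String subject;
        final String predicate;
        final String object;

        ToyTriple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // A null pattern component matches any value.
    static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Returns the triples that match all three pattern components.
    static List<ToyTriple> filter(List<ToyTriple> graph, String s, String p, String o) {
        List<ToyTriple> result = new ArrayList<>();
        for (ToyTriple t : graph) {
            if (matches(s, t.subject) && matches(p, t.predicate) && matches(o, t.object)) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ToyTriple> graph = Arrays.asList(
                new ToyTriple("_:a", "ex:hasFirstName", "\"Hasan\""),
                new ToyTriple("_:a", "ex:isA", "ex:ClerezzaUser"));

        System.out.println(filter(graph, null, null, null).size());     // all triples
        System.out.println(filter(graph, null, "ex:isA", null).size()); // only ex:isA triples
    }
}
```

Replacing one of the nulls with a concrete value narrows the result, which is how the same call can be used to look up only the triples with a given subject, predicate, or object.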

Note: Any comments and suggestions for improvements are welcome. Please send your feedback to dev@clerezza.apache.org