This project has retired. For details please refer to its Attic page.
Tutorial 03 - Convert Serialized Graph Representation

Author: Hasan (hasan@apache.org)
Last update: February 19, 2018

This tutorial shows how to use a parser and a serializer to convert a graph representation from one serialization format to another.

Problem Definition

Given a file containing a set of triples in Turtle serialization format (text/turtle), we want to create another file containing the same triples in RDF/XML serialization format. Assuming the content of the Turtle file is as follows:

@prefix ex: <http://clerezza.apache.org/2017/01/example#> .
_:a ex:hasFirstName "Hasan" .
_:a ex:isA ex:ClerezzaUser .

the program should create a new file /tmp/example03.rdf containing the triples in RDF/XML format.
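To make the conversion task concrete, here is a toy sketch in plain Java (no Clerezza, no Jena). It reads exactly the Turtle subset above and prints the triples a parser has to produce. This is not how Clerezza or Jena parse Turtle. The names TurtleSubsetSketch and Triple are hypothetical, and the sketch assumes the following:

* Each statement has the form `s p o .`.
* Terms contain no whitespace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TurtleSubsetSketch {

    // One triple; terms are kept in N-Triples spelling: <iri>, _:label or "literal".
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object + " .";
        }
    }

    // Toy parser (assumption): handles only "@prefix p: <iri> ." and "s p o ." statements
    // whose terms contain no whitespace and no " ." sequence.
    static List<Triple> parse(String turtle) {
        Map<String, String> prefixes = new HashMap<>();
        List<Triple> triples = new ArrayList<>();
        for (String statement : turtle.split("\\s+\\.(\\s+|$)")) {
            String[] terms = statement.trim().split("\\s+");
            if (terms[0].isEmpty()) {
                continue; // blank segment
            }
            if (terms[0].equals("@prefix")) {
                // e.g. "ex:" -> "http://clerezza.apache.org/2017/01/example#"
                prefixes.put(terms[1], terms[2].substring(1, terms[2].length() - 1));
            } else {
                triples.add(new Triple(expand(terms[0], prefixes),
                        expand(terms[1], prefixes), expand(terms[2], prefixes)));
            }
        }
        return triples;
    }

    // Turns a prefixed name such as ex:isA into a full <iri>; other terms pass through.
    static String expand(String term, Map<String, String> prefixes) {
        if (term.startsWith("_:") || term.startsWith("\"") || term.startsWith("<")) {
            return term;
        }
        int colon = term.indexOf(':');
        String namespace = colon < 0 ? null : prefixes.get(term.substring(0, colon + 1));
        if (namespace == null) {
            throw new IllegalArgumentException("unknown prefix in " + term);
        }
        return "<" + namespace + term.substring(colon + 1) + ">";
    }

    public static void main(String[] args) {
        String turtle = "@prefix ex: <http://clerezza.apache.org/2017/01/example#> .\n"
                + "_:a ex:hasFirstName \"Hasan\" .\n"
                + "_:a ex:isA ex:ClerezzaUser .\n";
        parse(turtle).forEach(System.out::println);
    }
}
```

Running main prints the two triples with the ex: prefix expanded, for example `_:a <http://clerezza.apache.org/2017/01/example#isA> <http://clerezza.apache.org/2017/01/example#ClerezzaUser> .`. These are the same triples the RDF/XML file must contain.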

Solution

As shown in Tutorial 02, Apache Clerezza provides a Parser that can read files containing triples in various serialization formats. In this tutorial, we again use a ParsingProvider based on the Jena parser to parse the Turtle file into a Simple Graph. Afterwards, the graph is serialized using the Apache Clerezza Serializer. The Serializer relies on SerializingProvider services, which implement the serialization of graphs into specific data formats. We use a SerializingProvider based on the Jena serializer.
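The Parser/Serializer design described above can be pictured as a facade that looks up a provider by format identifier and fails when no provider is available. That failure is what the UnsupportedFormatException handlers in Example03 guard against. The following is a minimal sketch of that idea only.

* ProviderRegistry and UnsupportedFormatSketchException are hypothetical names, not Clerezza API.
* The providers here are registered by hand, whereas Clerezza picks up its providers as services.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ProviderRegistrySketch {

    // Hypothetical stand-in for Clerezza's UnsupportedFormatException.
    static class UnsupportedFormatSketchException extends RuntimeException {
        UnsupportedFormatSketchException(String format) {
            super(format + " is not supported");
        }
    }

    // Hypothetical facade: maps a media type (e.g. "text/turtle") to the provider
    // registered for it and refuses formats nobody has registered.
    static final class ProviderRegistry<P> {
        private final Map<String, P> providers = new HashMap<>();

        void register(String mediaType, P provider) {
            providers.put(mediaType, provider);
        }

        P providerFor(String mediaType) {
            P provider = providers.get(mediaType);
            if (provider == null) {
                throw new UnsupportedFormatSketchException(mediaType);
            }
            return provider;
        }
    }

    public static void main(String[] args) {
        // A "provider" here is just a function from serialized text to a description.
        ProviderRegistry<Function<String, String>> parsers = new ProviderRegistry<>();
        parsers.register("text/turtle", text -> "graph parsed from " + text.length() + " chars");

        System.out.println(parsers.providerFor("text/turtle").apply("_:a <p> \"o\" ."));
        try {
            parsers.providerFor("application/ld+json"); // nothing registered for this format
        } catch (UnsupportedFormatSketchException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Keeping the lookup separate from the format-specific code is what lets this tutorial switch parsers or serializers by changing Maven dependencies rather than Java code.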

The programme listed below reads the classpath resource example03.ttl, parses its content into a Graph, and serializes the Graph to /tmp/example03.rdf.

 1 package org.apache.clerezza.tutorial;
 2
 3 import org.apache.clerezza.commons.rdf.Graph;
 4 import org.apache.clerezza.rdf.core.serializedform.Parser;
 5 import org.apache.clerezza.rdf.core.serializedform.Serializer;
 6 import org.apache.clerezza.rdf.core.serializedform.SupportedFormat;
 7 import org.apache.clerezza.rdf.core.serializedform.UnsupportedFormatException;
 8 import org.slf4j.Logger;
 9 import org.slf4j.LoggerFactory;
10
11 import java.io.FileOutputStream;
12 import java.io.IOException;
13 import java.io.InputStream;
14
15 public class Example03 {
16
17     private static final Logger logger = LoggerFactory.getLogger(Example03.class);
18
19     public static void main(String[] args) {
20         InputStream inputStream = Example03.class.getResourceAsStream("example03.ttl");
21         Parser parser = Parser.getInstance();
22
23         Graph graph;
24         try {
25             graph = parser.parse(inputStream, SupportedFormat.TURTLE);
26         } catch (UnsupportedFormatException ex) {
27             logger.warn(String.format("%s is not supported by the used parser", SupportedFormat.TURTLE));
28             return;
29         }
30
31         Serializer serializer = Serializer.getInstance();
32         // try-with-resources closes the output file, even if serialization fails
33         try (FileOutputStream outputStream = new FileOutputStream("/tmp/example03.rdf")) {
34             serializer.serialize(outputStream, graph, SupportedFormat.RDF_XML);
35         } catch (IOException ex) {
36             logger.warn(ex.getMessage());
37         } catch (UnsupportedFormatException ex) {
38             logger.warn(String.format("%s is not supported by the used serializer", SupportedFormat.RDF_XML));
39         }
40     }
41 }

We use Maven to build the program. The required POM file is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.clerezza.tutorial</groupId>
    <artifactId>Example-03</artifactId>
    <packaging>jar</packaging>
    <version>1.0</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.6.0</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>java</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <mainClass>org.apache.clerezza.tutorial.Example03</mainClass>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <name>Example-03</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>org.apache.clerezza</groupId>
            <artifactId>rdf.core</artifactId>
            <version>1.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.apache.clerezza</groupId>
            <artifactId>rdf.jena.parser</artifactId>
            <version>1.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.clerezza</groupId>
            <artifactId>rdf.jena.serializer</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>
</project>

The directory structure is simple, as shown below:

pom.xml
src/main/java/org/apache/clerezza/tutorial/Example03.java
src/main/resources/org/apache/clerezza/tutorial/example03.ttl
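The Turtle file sits under src/main/resources in the same package path as Example03. That placement matters because Example03 loads the file with Class.getResourceAsStream using the relative name "example03.ttl". Maven copies resources onto the classpath (target/classes), and the JDK resolves a relative resource name against the class's package.

The sketch below reproduces that documented naming rule in plain Java. absoluteResourceName is a hypothetical helper written for illustration, not a JDK method.

```java
import java.io.IOException;
import java.io.InputStream;

public class ResourceNameSketch {

    // Hypothetical helper mirroring the rule documented for Class.getResourceAsStream:
    // "/x" is absolute (leading slash dropped); "x" is prefixed with the class's
    // package name, with '.' replaced by '/'.
    static String absoluteResourceName(String packageName, String name) {
        if (name.startsWith("/")) {
            return name.substring(1);
        }
        if (packageName.isEmpty()) {
            return name; // default package: nothing to prepend
        }
        return packageName.replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) throws IOException {
        // prints org/apache/clerezza/tutorial/example03.ttl
        System.out.println(absoluteResourceName("org.apache.clerezza.tutorial", "example03.ttl"));

        // getResourceAsStream returns null (it does not throw) when the resource is missing;
        // try-with-resources skips close() for a null resource.
        try (InputStream in = ResourceNameSketch.class.getResourceAsStream("example03.ttl")) {
            System.out.println(in == null ? "example03.ttl not found on the classpath" : "example03.ttl found");
        }
    }
}
```

Because a misplaced file yields null rather than an exception, checking the stream for null before calling parser.parse gives a clearer error message.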

To build the jar, invoke:

mvn package

To run the programme, invoke:

mvn exec:java

The result of the programme execution is the file /tmp/example03.rdf, which contains the serialized graph as follows:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://clerezza.apache.org/2017/01/example#" >
  <rdf:Description rdf:nodeID="A0">
    <j.0:isA rdf:resource="http://clerezza.apache.org/2017/01/example#ClerezzaUser"/>
    <j.0:hasFirstName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Hasan</j.0:hasFirstName>
  </rdf:Description>
</rdf:RDF>
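To see how the triples map onto this XML, here is a toy RDF/XML writer in plain Java. It is a hypothetical sketch, not Jena's serializer, and it rests on these assumptions:

* Every predicate shares one namespace.
* Terms arrive in N-Triples spelling.
* Output is one rdf:Description per subject, with rdf:nodeID for blank nodes and rdf:resource for IRI objects.

Expect two differences from the Jena output above. First, Jena chose its own blank-node ID (A0) and namespace prefix (j.0). Second, Jena spells the literal's type out as rdf:datatype=".../XMLSchema#string", whereas this sketch omits it. In RDF 1.1 a literal without a datatype or language tag already has datatype xsd:string, so both forms denote the same triple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RdfXmlSketch {

    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Toy writer (assumption: all predicates live in namespace ns, written with prefix).
    // Terms: "_:label" (blank node), "<iri>" (IRI) or "\"text\"" (plain literal).
    static String toRdfXml(List<String[]> triples, String ns, String prefix) {
        // Group property elements by subject, keeping first-seen order.
        Map<String, List<String>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            String predicate = strip(t[1]);
            if (!predicate.startsWith(ns)) {
                throw new IllegalArgumentException("predicate outside namespace: " + predicate);
            }
            String qname = prefix + ":" + predicate.substring(ns.length());
            String element = t[2].startsWith("<")
                    ? "<" + qname + " rdf:resource=\"" + escape(strip(t[2])) + "\"/>"
                    : "<" + qname + ">" + escape(strip(t[2])) + "</" + qname + ">";
            bySubject.computeIfAbsent(t[0], k -> new ArrayList<>()).add(element);
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<rdf:RDF xmlns:rdf=\"").append(RDF).append("\" xmlns:")
           .append(prefix).append("=\"").append(ns).append("\">\n");
        for (Map.Entry<String, List<String>> e : bySubject.entrySet()) {
            String s = e.getKey();
            String id = s.startsWith("_:")
                    ? "rdf:nodeID=\"" + s.substring(2) + "\""
                    : "rdf:about=\"" + escape(strip(s)) + "\"";
            xml.append("  <rdf:Description ").append(id).append(">\n");
            for (String element : e.getValue()) {
                xml.append("    ").append(element).append("\n");
            }
            xml.append("  </rdf:Description>\n");
        }
        return xml.append("</rdf:RDF>\n").toString();
    }

    // Removes the surrounding <...> or "..." delimiters.
    static String strip(String term) {
        return term.substring(1, term.length() - 1);
    }

    // Escapes characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String ex = "http://clerezza.apache.org/2017/01/example#";
        List<String[]> triples = Arrays.asList(
                new String[] {"_:a", "<" + ex + "hasFirstName>", "\"Hasan\""},
                new String[] {"_:a", "<" + ex + "isA>", "<" + ex + "ClerezzaUser>"});
        System.out.print(toRdfXml(triples, ex, "ex"));
    }
}
```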

Discussion

The Maven POM file lists four libraries on which the programme directly depends:

  • org.apache.clerezza.rdf.core: contains the implementation of the Apache Clerezza Parser and Serializer
  • org.apache.clerezza.rdf.jena.parser: contains a ParsingProvider service based on the Jena parser
  • org.apache.clerezza.rdf.jena.serializer: contains a SerializingProvider service based on the Jena serializer
  • org.slf4j.slf4j-simple: contains a simple implementation of the SLF4J logger

The core of the programme lies in line 25 (parsing a stream of triples into a graph) and line 34 (serializing a graph to a file).

Note: Any comments and suggestions for improvements are welcome. Please send your feedback to dev@clerezza.apache.org