====== Resource Description Framework (RDF) in use ======

===== Before the lab =====

Reading:
  * [[http://www.w3.org/2004/Talks/17Dec-sparql/intro/all.html | Introduction To RDF Query With SPARQL ]]
  * [[http://www.w3.org/TR/rdf-sparql-query/| Documentation from w3.org ]]
  * [[pl:miw:miw2009_llvm#prezentacja|Presentation on SPARQL (ppt)]] ({{:pl:dydaktyka:semantic_web:sparql.pdf|pdf}})
  * [[#if_you_want_to_know_more|If you want to know more...]]

Software:
  * [[http://www.ldodds.com/projects/twinkle/|Twinkle]]

===== Introduction =====

Lecture:
  * {{:pl:dydaktyka:semantic_web:geist-semweb-rdf-annotated.pdf|Semantic Web: 3a - RDF}}
  * {{:pl:dydaktyka:semantic_web:geist-semweb-rdfsutils-shrinked.pdf|Semantic Web: 3b - RDF/S in use}}
  * {{:pl:dydaktyka:semantic_web:geist-semweb-rdfs.pdf|Semantic Web: 3c - RDF Schema}}

===== Lab instructions =====

In this lab you will test various scenarios of using RDF/S data. You will navigate through, query, store and manipulate it from within a Java application. The aim of this lab is to give you a broad perspective on using RDF/S data without going exhaustively into details. We hope it will encourage you to do your own research and tests.

**The lab is divided into sections. You should allow approx. 10 minutes for each section.**

==== - SPARQL - demo ====

  - Go to the [[http://hyperdata.org/sparql/demo/| sparql demo page]]
    - analyze and run the existing examples.
    - answer the question: What semantic vocabularies are used in the queries? What are they for?
  - Go to [[http://www.sparql.org/query.html|SPARQLer]]
    - check the ''Force the accept header to text/plain regardless'' option
    - test the available result formats for the example queries (JSON output / text output / CSV output / TSV output)
    - run the ''Construct'' query and analyze the result

==== - SPARQL queries - basics ====

  - In this exercise we will use [[http://www.ldodds.com/projects/twinkle/| Twinkle ]] ([[http://www.flickr.com/photos/ldodds/tags/twinkle/|screenshots]])
  - Run [[http://www.ldodds.com/projects/twinkle/| Twinkle ]] \\ //On Charon: open a terminal and type:// ''$ twinkle''
  - Test the examples for ''PeriodicTable'' and ''PlanetFeed'' (choose ''File -> Open -> examples/...'')
  - Execute queries on your FOAF file to retrieve:
    * friends who have a name and an e-mail address defined
    * friends who have a name and an e-mail address defined, and optionally a homepage
    * friends who have a name and an e-mail address defined, and optionally a homepage, sorted by name in descending order

**Hint:**
  * You can use the demos from the previous task for reference, or this [[http://hyperdata.org/sparql/demo/sparql-editor.html|SPARQL editor]].
  * Type in the location of your file in the ''Data URL'' field (web URL or local path) OR use the ''FROM'' construct to define your data source, e.g.

<code>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name
FROM <URL-of-your-FOAF-file>
WHERE {
  ?x rdf:type foaf:Person .
  ?x foaf:name ?name
}
LIMIT 10
</code>

==== - SPARQL queries - options ====

In this exercise use the [[.new-semweb-rdf1#foaf_files_of_piw_2011|FOAF files of your friends]].
Run [[http://www.ldodds.com/projects/twinkle/| Twinkle ]] and execute queries on a chosen FOAF file to retrieve:
  * people whose name starts with 'K'
  * people who are older than 18
  * people whose name starts with 'K' or who are older than 18; make the search case-insensitive
  * people with e-mail addresses on the student.agh.edu.pl server
  * names of people who have a homepage or an e-mail address on the student.agh.edu.pl server

**Hints:**
  * To view the semantic information and conveniently visualize RDF files, install the [[http://dig.csail.mit.edu/2007/tab/|Tabulator]] extension in your Web browser.

==== - Open Data Sets ====

DBPedia
  - Scan the [[http://dbpedia.org/About|DBPedia]] introduction, identifying the project goals and applications
  - Test and analyze the queries to DBPedia from the following websites:
    * [[http://wiki.dbpedia.org/Datasets|Datasets]]
    * [[http://wiki.dbpedia.org/OnlineAccess#h28-5|OnlineAccess]]
  - Test various SPARQL clients listed [[http://wiki.dbpedia.org/OnlineAccess#h28-14|on the DBPedia]] website.

MusicBrainz
  * http://dbtune.org/musicbrainz/
  * Try to construct queries using the SPARQL endpoint

Answer the question: What are the main limitations of using (querying for information) RDF datasets such as DBPedia or MusicBrainz? (Hint: one of them is not knowing which properties to use in the query.)

==== - Introduction to Jena ====

[[http://jena.sourceforge.net/|Jena]] is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine.
  * Useful link: [[http://jena.sourceforge.net/tutorial/RDF_API/index.html|Jena documentation]].

Basic information (from the [[http://jena.sourceforge.net/tutorial/RDF_API/index.html|Introduction to Jena]]):
  - Jena adopts the RDF triple model. It is a Java API which can be used to create and manipulate RDF graphs (see the example picture about ''JohnSmith'').
{{ http://jena.sourceforge.net/tutorial/RDF_API/figures/fig2.png|example graph}}
Jena has object classes to represent graphs, resources, properties and literals. The interfaces representing resources, properties and literals are called ''Resource'', ''Property'' and ''Literal'' respectively. In Jena, a graph is called a model and is represented by the ''Model'' interface.
  - **Building the model**
    * Models are created with Jena's ''ModelFactory'' class (e.g. ''Model model = ModelFactory.createDefaultModel();''). Jena contains other implementations of the ''Model'' interface, e.g. one which uses a relational database; these types of model are also available from ''ModelFactory''.
    * A model stores resources. After a resource is created (e.g. ''Resource johnSmith = model.createResource(personURI);''), statements can be made about it and added to the model (e.g. ''johnSmith.addProperty(VCARD.FN, fullName);''). In the example below the property is provided by a "constant" class ''VCARD'', which holds objects representing all the definitions in the VCARD schema. Jena provides constant classes for other well-known schemas, such as RDF and RDF Schema themselves, Dublin Core and DAML.
    * The //object// in a triple may be a literal value or a resource. Note that the ''vcard:N'' property takes a //resource// as its value. Note also that the ellipse representing the compound name has no URI. It is known as a //blank node//.
<code java>
// some definitions
String personURI  = "http://somewhere/JohnSmith";
String givenName  = "John";
String familyName = "Smith";
String fullName   = givenName + " " + familyName;

// create an empty Model
Model model = ModelFactory.createDefaultModel();

// create the resource
// and add the properties cascading style
Resource johnSmith = model.createResource(personURI)
    .addProperty(VCARD.FN, fullName)
    .addProperty(VCARD.N,
                 model.createResource()   // blank node for the compound name
                      .addProperty(VCARD.Given, givenName)
                      .addProperty(VCARD.Family, familyName));
</code>
  - **Listing the Statements in the Model**
    * An RDF model is represented as a //set// of statements. The Jena ''Model'' interface defines a ''listStatements()'' method which returns a ''StmtIterator'', a subtype of Java's ''Iterator'', over all the statements in a model. ''StmtIterator'' has a method ''nextStatement()'' which returns the next statement from the iterator. The ''Statement'' interface provides accessor methods for the //subject//, //predicate// and //object// of a statement.
<code java>
// list the statements in the Model
StmtIterator iter = model.listStatements();

// print out the subject, predicate and object of each statement
while (iter.hasNext()) {
    Statement stmt      = iter.nextStatement();  // get next statement
    Resource  subject   = stmt.getSubject();     // get the subject
    Property  predicate = stmt.getPredicate();   // get the predicate
    RDFNode   object    = stmt.getObject();      // get the object

    System.out.print(subject.toString());
    System.out.print(" " + predicate.toString() + " ");
    if (object instanceof Resource) {
        System.out.print(object.toString());
    } else {
        // object is a literal
        System.out.print(" \"" + object.toString() + "\"");
    }
    System.out.println(" .");
}
</code>
  - **Reading/writing the Model**
    * Jena has methods for reading and writing RDF as XML. These can be used to save an RDF model to a file and later read it back in again.
    * By default the model is written in RDF/XML syntax (''model.write(System.out);''). However, other syntaxes, e.g. N-Triples, are supported.
    * Reading from a file (the example below assumes that the ''inputFileName'' and ''defaultNameSpace'' variables are defined; you can pass ''null'' instead of the namespace to the ''read'' method):
<code java>
try {
    InputStream myFile = FileManager.get().open(inputFileName);
    model.read(myFile, defaultNameSpace);
    myFile.close();
} catch (IOException io) {
    System.out.println("File Error: " + io.getMessage());
}
</code>
    * Reading from a URI (here the namespace is based on the URI that we use):
<code java>
model.read("http://home.agh.edu.pl/wta/foaf.rdf", "http://home.agh.edu.pl/wta/foaf.rdf#", null);
</code>

TASK:
  - Download [[http://jena.sourceforge.net/downloads.html|Jena]] and set up a project (on Charon, ''eclipse'' and ''netbeans'' are available).
  - Compile and run a sample program (e.g. the one given above).

==== - Reading, Writing, Navigating the model in Jena ====

  - Using Jena, write a simple translator that reads an RDF file in one syntax (e.g. RDF/XML) and returns the model serialized in another (e.g. Turtle).
  - Read in your FOAF file and list all the statements present in the model.
  - Read a FOAF file from a URI and print all the information. Format it neatly. ;-)

==== - Using SPARQL with Jena - command line ====

Jena supports SPARQL via a dedicated module, ARQ. In addition to implementing SPARQL, ARQ's query engine can also parse queries expressed in RDQL or in its own internal query language. Currently ARQ is a part of the standard Jena distribution. It provides two basic ways of posing SPARQL queries to RDF graphs: from the command line and from within a Java application.
  - SPARQL queries from the command line
    * To get the ARQ engine to work, you have to set the environment variable ''$ARQROOT'' to point to the ARQ directory (the downloaded and unpacked Jena directory). Optionally, you may have to modify the access rights for the files in the ''$ARQROOT/bin/'' directory and add the ''bin'' directory to the execution path.
<code>
$ export ARQROOT=~/your_jena_dir
$ chmod +rx $ARQROOT/bin/*
$ export PATH=$PATH:$ARQROOT/bin
$ sparql
Usage: [--data URL] [exprString | --query file]
</code>
    * Create a text file (e.g. ''q1.rq'') and write a SPARQL query in it, e.g. (this is just an example, be creative ;-) )
<code>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?url
FROM <URL-of-your-FOAF-file>
WHERE {
  ?p foaf:name ?n .
  ?p foaf:homepage ?url .
}
</code>
    * Run the query from the command line, e.g. ''sparql --query q1.rq''

==== - Using SPARQL with Jena - java application ====

Using SPARQL queries within a Java application. Read the [[http://www.ibm.com/developerworks/xml/library/j-sparql/|Search RDF data with SPARQL]] tutorial.
  - Create your model by reading in [[.new-semweb-rdf1#foaf_files_of_piw_2011|several FOAF files]].
  - Modify the queries from the tutorial so that they use the information from your model
  - Create an ''ASK'' query for your model: a SPARQL query that returns either "yes" or "no", e.g. "Does Weronika T. Adrian (WTA) know Krzysztof Kluza (KKL)?":
<code java>
public void queryForExistance() {
    String queryString =
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
        "ASK { ?x foaf:name \"Weronika T. Adrian (Furmańska)\" ; " +
        "foaf:knows ?y . " +
        "?y foaf:name \"Krzysztof Kluza\" . }";

    Query query = QueryFactory.create(queryString);
    QueryExecution qexec = QueryExecutionFactory.create(query, model);

    // Run the query. ASK queries return only a boolean value.
    boolean answer = qexec.execAsk();
    qexec.close();  // release the resources held by the query execution

    // Output the result
    System.out.println();
    System.out.println("Does WTA know KKL?");
    if (answer) {
        System.out.println("yes");
    } else {
        System.out.println("no");
    }
}
</code>

===== If you want to know more =====

Tools:
  * http://docs.openlinksw.com/virtuoso/
  * http://querybuilder.dbpedia.org/

Open Data Sets:
  * http://news.ycombinator.com/item?id=1493768

DB2RDF (RDF and Relational Databases):
  * http://esw.w3.org/RdfAndSql
  * http://sourceforge.net/apps/mediawiki/bio2rdf/index.php?title=Main_Page
  * http://esw.w3.org/ConverterToRdf