Querying the Semantic Web with SPARQL

Last verification:	20180914
Tools required for this lab:	–

Before the lab

Video [minimum!]:

SPARQL in 11 minutes

Reading:

Lab instructions

During this lab we will use two services to execute SPARQL queries:

SPARQLer (a general-purpose SPARQL query processor) will be used for querying RDF files.
YASGUI (Yet Another SPARQL GUI) will be used for querying SPARQL Endpoints (it has a more powerful editor, but it cannot be used with plain RDF files).

1 Introduction [5 minutes]

What can we do with our RDF models? In this section some "magic" will happen on the Periodic Table saved as RDF!
Open SPARQLer.
Paste http://www.daml.org/2003/01/periodictable/PeriodicTable.owl into the "Target graph URI (or use FROM in the query)" field and select the text output option.
- There is also a backup (if the original URI cannot be resolved): http://krzysztof.kutt.pl/didactics/semweb/PeriodicTable.owl

Run the following two queries (paste the code into the text field and click Get Results):

select.rq

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 
SELECT ?element ?name
WHERE {
  ?element table:group ?group .
  ?group table:name "Noble gas"^^xsd:string .
  ?element table:name ?name .
}

construct.rq

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 
CONSTRUCT {
  ?element rdfs:label ?name .
}
WHERE {
  ?element table:group ?group .
  ?group table:name "Noble gas"^^xsd:string .
  ?element table:name ?name .
}

Both queries run on the same dataset.
Both queries extract the same data: a list of all elements in the Noble gas group, with their names.
Analyze the queries and the results: how do they differ?

What do SELECT queries do?
What do CONSTRUCT queries do?

2 SPARQL = Pattern matching [10 minutes]

General Idea: SPARQL is an RDF graph pattern matching system.
E.g.: there is a triple saved in RDF:
```
:Hydrogen :standardState :gas .
```
Now we can simply replace part of the triple with a question word (with a question mark at the start) and we get simple queries, e.g.:
- Query: :Hydrogen :standardState ?what .
  Answer: :gas
- Query: ?what :standardState :gas .
  Answer: :Hydrogen
- Query: :Hydrogen ?what :gas .
  Answer: :standardState
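
The matching idea above can be sketched outside SPARQL as well. The following is a minimal Python illustration, not how a real SPARQL engine works internally: the `match` function and the `TRIPLES` list are made up for this example, and the Helium and Mercury triples are toy data added only so that some patterns return more than one answer.

```python
# Toy triple store (hypothetical data): each triple is (subject, predicate, object).
TRIPLES = [
    (":Hydrogen", ":standardState", ":gas"),
    (":Helium", ":standardState", ":gas"),
    (":Mercury", ":standardState", ":liquid"),
]


def match(pattern, triples=TRIPLES):
    """Return one binding dict per triple that matches the pattern.

    Terms starting with '?' are variables; every other term must be equal
    to the corresponding term of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No term failed to match, so this triple is a solution.
            results.append(binding)
    return results


print(match((":Hydrogen", ":standardState", "?what")))
print(match(("?what", ":standardState", ":gas")))
```

Each returned dictionary corresponds to one row of a SELECT result: the variable name is the column and the bound term is the cell.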

Now, let's run some more queries against the Periodic Table. Prepare queries that return:
- elements that have a name and a symbol defined
- elements that have a name and a symbol defined and are placed in the period_7 period
- elements that have a name and a symbol defined, are placed in the period_7 period, and have an OPTIONAL color (some of them do not have a color!)
- elements that have a name and a symbol defined, are placed in the period_7 period, and have an OPTIONAL color, sorted by name in descending order
Put the constructed queries in the report.
- Hints:
  - SPARQL 1.1 documentation may be useful for specifying optional values
  - FOAF Vocabulary Specification

3 Constraints: FILTER [10 minutes]

After an RDF graph pattern has been matched, it is also possible to put constraints on the result rows, so that only rows satisfying a condition are kept. This is achieved using the FILTER construct. Let's try it now on the Periodic Table.
Prepare and execute queries to retrieve:
- elements whose name starts with 's'
- elements whose atomicNumber contains the digit '2'
- elements whose name starts with 's' or whose atomicNumber contains the digit '2'; make the search case-insensitive
Hints:
- SPARQL 1.1 Documentation parts about constraints and alternatives may be useful
Put the queries in the report.
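
Conceptually, FILTER is a yes/no test applied to every solution row produced by pattern matching. A rough Python sketch of that idea follows; the `apply_filter` helper and the `ROWS` data are hypothetical and exist only for this illustration, and the SPARQL expressions in the comments are approximate equivalents rather than answers to the tasks above.

```python
import re

# Hypothetical solution rows, as if already produced by graph pattern matching.
ROWS = [
    {"name": "hydrogen", "atomicNumber": 1},
    {"name": "helium", "atomicNumber": 2},
    {"name": "lithium", "atomicNumber": 3},
]


def apply_filter(rows, condition):
    """Keep only the rows for which the condition holds, as FILTER does."""
    return [row for row in rows if condition(row)]


# Roughly: FILTER(?atomicNumber < 3)
light = apply_filter(ROWS, lambda row: row["atomicNumber"] < 3)

# Roughly: FILTER(regex(?name, "IUM$", "i")), a case-insensitive regular expression
ium = apply_filter(ROWS, lambda row: re.search("IUM$", row["name"], re.IGNORECASE))

print([row["name"] for row in light])
print([row["name"] for row in ium])
```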

4 SPARQL Endpoint [20 minutes]

SPARQL queries may be run against an RDF file, as we did in the previous sections. But it is also possible to use a special-purpose web service called a SPARQL Endpoint. It wraps a data set and provides a service that responds to the SPARQL protocol, giving access to that data set.
Many SPARQL Endpoints are available today, providing information about a variety of subjects. In this section we will use the DBpedia SPARQL Endpoint at http://dbpedia.org/sparql.
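
To make "responds to the SPARQL protocol" a bit more concrete: in the simplest case, a query is sent as an HTTP GET request with the query text in a `query` parameter. The sketch below only builds such a URL with the Python standard library and does not send anything; the `build_request_url` helper and the sample query are assumptions made for illustration, and YASGUI does this kind of work for you during the lab.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # the endpoint used in this lab

# A deliberately generic query, chosen only for this illustration.
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# To actually run the query, one could open this URL with urllib.request and ask
# for JSON results via the header "Accept: application/sparql-results+json".
# That step is left out here because it needs network access.
```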

DBpedia is structured data extracted from Wikipedia and published as RDF. So, like Wikipedia, DBpedia should contain some information about Poland. What can we do?
We don't know what URI Poland has in DBpedia, but we know the name Poland, and from the previous lab we know the rdfs:label property. Maybe this will help us? Let's try!
Open YASGUI.
What do we know so far? There should be some URI (?country) that probably has an rdfs:label relation with the object "Polska"@pl. This can be easily translated into a SPARQL WHERE clause:
```
?country rdfs:label "Polska"@pl .
```
To execute this query properly, enter the http://dbpedia.org/sparql URI in the dropdown list at the top.

Then, specify the query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?country
WHERE { 
    ?country rdfs:label "Polska"@pl .
}

Success! There is something that has the rdfs:label "Polska"@pl!
Now expand this query to find Poland's population and put the final query in the report.
- Hint: the result should look like this:
```
--------------
| population |
==============
| 38483957   |
--------------
```
Prepare a query that returns a list of the 10 countries in Europe with the largest populations. Put the query in the report.

5 Aggregation [15 minutes]

SPARQL provides grouping and aggregation mechanisms known from SQL:
- grouping: GROUP BY
- aggregation: COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE
- filter on groups: HAVING
- See SPARQL 1.1 documentation for wider description.
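
For readers less familiar with SQL-style aggregation, the three steps listed above (group, aggregate, filter on groups) can be sketched in plain Python. The `count_per_group` function and the `ROWS` data are hypothetical toy material for this illustration only; the comments name the SPARQL keyword each line loosely corresponds to.

```python
from collections import Counter

# Hypothetical (element, group) rows; toy data, not taken from the lab dataset.
ROWS = [
    ("helium", "Noble gas"),
    ("neon", "Noble gas"),
    ("argon", "Noble gas"),
    ("lithium", "Alkali metal"),
    ("sodium", "Alkali metal"),
    ("fluorine", "Halogen"),
]


def count_per_group(rows, minimum):
    """Count rows per group, keep groups with at least `minimum` rows, largest first."""
    counts = Counter(group for _, group in rows)  # GROUP BY + COUNT
    kept = [(group, n) for group, n in counts.items() if n >= minimum]  # HAVING
    return sorted(kept, key=lambda item: item[1], reverse=True)  # ORDER BY DESC


print(count_per_group(ROWS, 2))
```

Note that HAVING works on whole groups after counting, whereas FILTER works on individual rows before grouping.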

Poland is divided into 16 voivodeships (PL: województwo), which are further divided into 380 counties (PL: powiat). In this task, we will examine this division more closely.
Prepare a query (in YASGUI, against DBpedia) that returns a list of voivodeships and the number of counties in each of them. The list should include only voivodeships with 7 or more counties and should be ordered by the number of counties.

The results should look like this:

------------------------------------------------
| voivodeship                       | counties |
================================================
| "Masovian Voivodeship"@en         | 15       |
| "Greater Poland Voivodeship"@en   | 12       |
| "Lesser Poland Voivodeship"@en    | 11       |
| "Podkarpackie Voivodeship"@en     | 10       |
| "Pomeranian Voivodeship"@en       | 9        |
| "Warmian-Masurian Voivodeship"@en | 9        |
| "West Pomeranian Voivodeship"@en  | 9        |
| "Opole Voivodeship"@en            | 8        |
------------------------------------------------

or in Polish:

--------------------------------------------------
| wojewodztwo                          | powiaty |
==================================================
| "Województwo mazowieckie"@pl         | 15      |
| "Województwo wielkopolskie"@pl       | 12      |
| "Województwo małopolskie"@pl         | 11      |
| "Województwo podkarpackie"@pl        | 10      |
| "Województwo pomorskie"@pl           | 9       |
| "Województwo warmińsko-mazurskie"@pl | 9       |
| "Województwo zachodniopomorskie"@pl  | 9       |
| "Województwo opolskie"@pl            | 8       |
--------------------------------------------------

Hint – useful URIs:
- county: http://dbpedia.org/resource/Powiat
- voivodeship: http://dbpedia.org/resource/Voivodeships_of_Poland
Put the query in the report.

6 SPARQL as rule language [10 minutes]

So far, we have seen that the answers to questions in SPARQL can take the form of a table. In this section we will take a look at CONSTRUCT queries, whose answers take the form of an RDF graph. You have already seen one such example in the Introduction.

CONSTRUCT queries provide a way to introduce "rules" into RDF datasets:
1. Let's go back to The Bold and the Beautiful/The Game of Thrones model you prepared previously. You probably had a problem deciding which relations should be placed in the RDF file: is_father_of, is_child_of, or maybe both of them?
2. CONSTRUCT queries make this simpler. You can put just one of them in the initial data set; let's assume it was is_father_of. Now you can execute a CONSTRUCT query that creates the inverse relation:
```
PREFIX bb: <http://yourname/b-and-b#>

CONSTRUCT {
  ?child bb:is_child_of ?father .
}
WHERE {
  ?father bb:is_father_of ?child
}
```
3. Or maybe is_uncle_of relation will be useful? No problem!
```
PREFIX bb: <http://yourname/b-and-b#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT {
  ?uncle bb:is_uncle_of ?child .
}
WHERE {
  ?uncle bb:is_sibling_of ?parent;
         a bb:Man.
  ?child bb:is_child_of ?parent
}
```
4. You don't have is_sibling_of, but you have is_sister_of and is_brother_of instead? Simply prepare a query (or queries) that creates is_sibling_of for you.
  - Put this query in the report.
  - Note: this query has to be executed in SPARQLer (not in YASGUI).
  - Note 2: your dataset has to be available online as a RAW file, e.g. if you have the file on Dropbox, you have to change the www.dropbox.com part of the shared link to dl.dropboxusercontent.com

OK, we have created some new RDF triples using a CONSTRUCT query. What now? Depending on your plans, you can:
- Add these triples back to the original dataset,
- Create new dataset (e.g. save results in RDF file).
And then simply execute queries against this new knowledge.
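
The "add the triples back" option can be pictured as a simple set union. Below is a minimal Python sketch of the inverse-relation rule from step 2, assuming triples are stored as tuples; the `construct_inverse` helper and the bb:Adam, bb:Ben and bb:Carl resources are hypothetical names invented for this illustration.

```python
def construct_inverse(triples, source, target):
    """For every (s, source, o) triple, produce (o, target, s).

    This mimics a CONSTRUCT query whose WHERE clause matches `source`
    and whose template emits the inverse relation `target`.
    """
    return {(o, target, s) for (s, p, o) in triples if p == source}


# Hypothetical initial dataset that stores only is_father_of.
DATA = {
    ("bb:Adam", "bb:is_father_of", "bb:Ben"),
    ("bb:Adam", "bb:is_father_of", "bb:Carl"),
}

new_triples = construct_inverse(DATA, "bb:is_father_of", "bb:is_child_of")

# Option 1 from the list above: add the new triples back to the original dataset.
merged = DATA | new_triples

print(sorted(new_triples))
print(len(merged))
```

Because the result is again a set of triples, further rules (such as is_uncle_of) can be applied to the merged data in the same way.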

Which 3 rules would you find useful in your model from the previous lab? Put 3 CONSTRUCT queries in the report.

7 ASK and DESCRIBE queries [10 minutes]

SPARQL also provides two more query types: ASK and DESCRIBE.

ASK queries simply provide a Yes/No answer, with no information about the matched triples (in the case of a "Yes" answer).
- E.g. is there anything with the name "aluminium" in this data set?
```
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ASK {
  ?element table:name "aluminium"^^xsd:string .
}
```
  If you run this query against the Periodic Table, the answer will be yes.
- Prepare a query that checks something interesting in your model :)
  - If you have no idea what to check, you can simply prepare a query that checks whether there is anything that is a MusicCD and was published by Warner Music Group (if you don't have such classes in your library, use an analogous class that you have).

DESCRIBE queries return the knowledge associated with the given subject URI(s).
- The simplest DESCRIBE query specifies only the URI that should be described:
```
DESCRIBE <http://www.daml.org/2003/01/periodictable/PeriodicTable#H>
```
  (it should be executed against the http://www.daml.org/2003/01/periodictable/PeriodicTable.owl file)
- It is also possible to select URI(s) from the data set using constraints defined in a WHERE clause. Read about it in the SPARQL 1.1 documentation.
- Prepare a query that describes all foaf:Person items from your model (if you don't have a foaf:Person class in your library, use an analogous class that you have).

8 "Negation" under Open World Assumption [5 minutes]

RDF vs SQL:
- RDF: Open World Assumption
- SQL: Closed World Assumption

Let's imagine that we are preparing a query about all the living actors who played in Star Wars Episode VI: Return of the Jedi.

Sketch of this query in SPARQL:

SELECT ?actor
WHERE {?actor :playedIn :ReturnOfTheJedi .
       FILTER NOT EXISTS {?actor :diedOn ?deathdate . }
}

Sketch of this query in SQL:

SELECT actor_name
FROM movies
WHERE title = 'Return of the Jedi'
AND NOT EXISTS (SELECT *
                FROM deaths
                WHERE movies.actor_name = deaths.name);

These queries look the same, but they are different!
- What are the Open World Assumption (OWA) and the Closed World Assumption (CWA)?
- What is the difference between these two queries (refer to what you know about OWA and CWA)?

9 Wikipedia, DBpedia, Wikidata

If you are interested in querying the huge amount of data available in Wikipedia, there are two projects you may be interested in:

DBpedia – an attempt to extract data from Wikipedia infoboxes and links (using developed parsers)
Wikidata – an attempt to create an RDF base from scratch by the community (using provided GUI)

They overlap in part, but are independent of each other and have different uses. For you, a student of the Semantic Web Technologies course, it does not matter much. They are simply large knowledge bases with which you can do a lot of things.
If you want to dive into this data, you can start with a big set of SPARQL queries against Wikidata.

Control questions

How do we create SPARQL queries?
What are the four SPARQL query types and how do they differ? What form does the result take for each of them?
What is a SPARQL Endpoint?

If you want to know more...

SPARQL:

SPARQL 1.1 Query Language
SPARQL 1.1 Overview
Learn SPARQL @Cambridge Semantics
You can combine results from many SPARQL Endpoints in one query – see SPARQL Federated Query for more information.

Sample queries in SPARQL:

Tools:

SPARQLer – general purpose tool for executing SPARQL queries
SPARQLer Query Validator
YASGUI – online visual tool for querying SPARQL Endpoints
Twinkle: A SPARQL Query Tool
Apache Jena -- SPARQL

GeoSPARQL – standard for representing and querying the geospatial data using RDF

Open Data Sets:

List of open/public databases

DB2RDF (RDF and Relational Databases):