Querying Wikidata: SELECT vs CONSTRUCT
In this blog post we’re going to build upon the newbie’s guide to querying Wikidata, and learn all about the CONSTRUCT clause.

In the newbie’s guide, we wrote the following query to find a tennis player with the name "Nick Kyrgios" and return their date of birth:
SELECT *
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth
}
where:
- wdt:P106 is occupation
- wd:Q10833314 is tennis player
- wdt:P569 is date of birth
If we run that query, we’ll see the following output:
person | dateOfBirth |
---|---|
wd:Q3720084 | 1995-04-27T00:00:00Z |
But what if we want to return the results as a list of triples instead?
CONSTRUCT WHERE
We can use the CONSTRUCT WHERE clause instead of SELECT. The SPARQL 1.1 Query Language specification describes this short form as follows:
"A short form for the CONSTRUCT query form is provided for the case where the template and the pattern are the same and the pattern is just a basic graph pattern (no FILTERs and no complex graph patterns are allowed in the short form). The keyword WHERE is required in the short form."
I found a good article explaining the CONSTRUCT clause as part of FutureLearn’s Introduction to Linked Data and the Semantic Web course.
Our updated query looks like this:
CONSTRUCT
WHERE { ?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth
}
And if we run that we’ll get the following output:
subject | predicate | object |
---|---|---|
wd:Q3720084 | rdfs:label | Nick Kyrgios |
wd:Q3720084 | wdt:P106 | wd:Q10833314 |
wd:Q3720084 | wdt:P569 | 1995-04-27T00:00:00Z |
where:
- Q3720084 is Nick Kyrgios
- P106 is occupation
- Q10833314 is tennis player
- P569 is date of birth
So if we translate the three triples returned, what we have is:
subject | predicate | object |
---|---|---|
Nick Kyrgios | occupation | tennis player |
Nick Kyrgios | label | Nick Kyrgios |
Nick Kyrgios | date of birth | 1995-04-27T00:00:00Z |
So far, so good.
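As a rough mental model of what CONSTRUCT WHERE did here, the Python sketch below treats the WHERE pattern as a template and fills in the one solution the query found. It is a toy illustration, not how the Wikidata query service is implemented; PATTERN, SOLUTION, and instantiate are names made up for this example.

```python
# Toy model of CONSTRUCT WHERE: the WHERE pattern doubles as the template.
# Terms starting with "?" are variables; everything else is a constant.
PATTERN = [
    ("?person", "wdt:P106", "wd:Q10833314"),
    ("?person", "rdfs:label", '"Nick Kyrgios"@en'),
    ("?person", "wdt:P569", "?dateOfBirth"),
]

# The single solution for this query, using the values shown in the post.
SOLUTION = {
    "?person": "wd:Q3720084",
    "?dateOfBirth": "1995-04-27T00:00:00Z",
}


def instantiate(template, solution):
    """Replace each bound variable in the template with its value."""
    return [
        tuple(solution.get(term, term) for term in triple)
        for triple in template
    ]


triples = instantiate(PATTERN, SOLUTION)
for triple in triples:
    print(" ".join(triple))
```

Each pattern line becomes one output triple, which is why three statement patterns gave us three rows.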
Let’s extend our SELECT query to also return the person’s nationality:
SELECT *
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth ;
wdt:P27 [ rdfs:label ?countryName ] .
filter(lang(?countryName) = "en")
}
person | dateOfBirth | countryName |
---|---|---|
wd:Q3720084 | 1995-04-27T00:00:00Z | Australia |
Now we want to do the same thing with our CONSTRUCT query:
CONSTRUCT
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth ;
wdt:P27 [ rdfs:label ?countryName ] .
filter(lang(?countryName) = "en")
}
If we run that query, we’ll get the following error:
SPARQL-QUERY: queryStr=CONSTRUCT
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth ;
wdt:P27 [ rdfs:label ?countryName ] .
filter(lang(?countryName) = "en")
}
java.util.concurrent.ExecutionException: org.openrdf.query.MalformedQueryException: CONSTRUCT WHERE only permits statement patterns in the WHERE clause.
As the error message indicates, we can only use statement patterns in the WHERE clause.
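One way to picture the restriction: the short form is shorthand for a full CONSTRUCT whose template is a copy of the WHERE pattern. The Python sketch below builds that long form by plain string substitution. expand_short_form is a hypothetical helper written for this post, not part of any SPARQL library, and it does no parsing or validation.

```python
def expand_short_form(pattern: str) -> str:
    """Build the long-form query that CONSTRUCT WHERE { pattern } stands for."""
    body = pattern.strip()
    # The same text is used twice: once as the template, once as the pattern.
    return f"CONSTRUCT {{ {body} }}\nWHERE {{ {body} }}"


pattern = "?person wdt:P106 wd:Q10833314 ; wdt:P569 ?dateOfBirth"
query = expand_short_form(pattern)
print(query)
```

Because the pattern is duplicated verbatim, anything that is not a plain triple pattern, such as filter(...), has no sensible place in the template half.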
The filter part of the WHERE clause is problematic, so let’s remove that:
CONSTRUCT
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth ;
wdt:P27 [ rdfs:label ?countryName ]
}
If we run that query, we’ll get the following output:
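subject | predicate | object |
---|---|---|
wd:Q3720084 | rdfs:label | Nick Kyrgios |
wd:Q3720084 | wdt:P106 | wd:Q10833314 |
wd:Q3720084 | wdt:P569 | 1995-04-27T00:00:00Z |
b0 | rdfs:label | Australia |
wd:Q3720084 | wdt:P27 | b0 |
b1 | rdfs:label | Awıstralya |
wd:Q3720084 | wdt:P27 | b1 |
… | … | … |
b5 | rdfs:label | ཨས་ཊེཡེ་ལི་ཡ |
wd:Q3720084 | wdt:P27 | b5 |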
Hmm, the output isn’t exactly what we wanted. We have two issues to try and figure out:
- what are those values that are prefixed with b all about?
- we’ve got every single version of "Australia" instead of just the English version
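The values prefixed with b are blank nodes. The square-bracket syntax matches the country without giving it a variable name, so the output refers to it by a generated label rather than by its Wikidata URI. We can fix this first problem by pulling out the country and country name separately instead of doing it all in one statement. This means that:
?person wdt:P27 [ rdfs:label ?countryName ]
becomes:
?person wdt:P27 ?country .
?country rdfs:label ?countryName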
If we do that, we’ll have the following query:
CONSTRUCT
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth ;
wdt:P27 ?country .
?country rdfs:label ?countryName
}
And now let’s run that query:
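subject | predicate | object |
---|---|---|
wd:Q3720084 | rdfs:label | Nick Kyrgios |
wd:Q3720084 | wdt:P106 | wd:Q10833314 |
wd:Q3720084 | wdt:P569 | 1995-04-27T00:00:00Z |
wd:Q3720084 | wdt:P27 | wd:Q408 |
wd:Q408 | rdfs:label | Australia |
wd:Q408 | rdfs:label | Australië |
… | … | … |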
That’s better, but we still have all versions of Australia instead of just the English version.
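For anyone wondering where the b0 to b5 values in the earlier output came from: in SPARQL, [ ... ] stands for a blank node, and a blank node in a CONSTRUCT template is given a fresh label for every solution. The Python sketch below mimics that relabelling. It is a simplified model under that assumption, with made-up names (TEMPLATE, SOLUTIONS, instantiate) and only two of the label solutions from the post.

```python
from itertools import count

# "_:country" plays the role of the [ ... ] blank node in the query.
TEMPLATE = [
    ("?person", "wdt:P27", "_:country"),
    ("_:country", "rdfs:label", "?countryName"),
]

# One solution per label of the country; two are enough to show the effect.
SOLUTIONS = [
    {"?person": "wd:Q3720084", "?countryName": "Australia"},
    {"?person": "wd:Q3720084", "?countryName": "Australië"},
]


def instantiate(template, solutions):
    """Fill in the template once per solution, relabelling blank nodes each time."""
    fresh = count()
    graph = []
    for solution in solutions:
        renamed = {}  # blank node labels are shared only within one solution
        for triple in template:
            out = []
            for term in triple:
                if term.startswith("_:"):
                    if term not in renamed:
                        renamed[term] = f"b{next(fresh)}"
                    term = renamed[term]
                else:
                    term = solution.get(term, term)
                out.append(term)
            graph.append(tuple(out))
    return graph


graph = instantiate(TEMPLATE, SOLUTIONS)
for triple in graph:
    print(" ".join(triple))
```

Naming the country with a ?country variable avoids the relabelling, because every solution then carries the same URI for it.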
Plain old CONSTRUCT
As far as I understand, to fix that we’ll need to use the normal CONSTRUCT syntax, which requires us to specify all the triples that we’d like to return.
Let’s update our query to do that:
CONSTRUCT {
?person wdt:P569 ?dateOfBirth;
rdfs:label ?playerName;
wdt:P27 ?country .
?country rdfs:label ?countryName
}
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
wdt:P569 ?dateOfBirth ;
wdt:P27 ?country .
?country rdfs:label ?countryName .
filter(lang(?countryName) = "en")
}
And if we run that query, we’ll see the following output:
subject | predicate | object |
---|---|---|
wd:Q3720084 | wdt:P569 | 1995-04-27T00:00:00Z |
wd:Q3720084 | wdt:P27 | wd:Q408 |
wd:Q408 | rdfs:label | Australia |
That’s better, but we’re missing the statement that returns the player’s name.
We do have that statement in the CONSTRUCT clause, but ?playerName is never bound in the WHERE clause, so that triple can’t be built and is left out of the result. We need to match it in the WHERE clause as well. If we do that we’ll also need to add a language filter so that we only return the English version of the name. Our query now looks like this:
CONSTRUCT {
?person wdt:P569 ?dateOfBirth;
rdfs:label ?playerName;
wdt:P27 ?country .
?country rdfs:label ?countryName
}
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
rdfs:label ?playerName;
wdt:P569 ?dateOfBirth ;
wdt:P27 ?country .
?country rdfs:label ?countryName .
filter(lang(?countryName) = "en")
filter(lang(?playerName) = "en")
}
Now let’s run that query:
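subject | predicate | object |
---|---|---|
wd:Q3720084 | wdt:P569 | 1995-04-27T00:00:00Z |
wd:Q3720084 | wdt:P27 | wd:Q408 |
wd:Q3720084 | rdfs:label | Nick Kyrgios |
wd:Q408 | rdfs:label | Australia |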
Much better.
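The missing name in the previous attempt follows from how templates are filled in: if a template triple mentions a variable that the WHERE clause never bound, that triple is skipped while the rest are still produced. Here is a small Python sketch of that rule. It is an illustration only, and TEMPLATE, instantiate, without_name, and with_name are names invented for it.

```python
TEMPLATE = [
    ("?person", "wdt:P569", "?dateOfBirth"),
    ("?person", "rdfs:label", "?playerName"),
]


def instantiate(template, solution):
    """Fill in the template, skipping any triple with an unbound variable."""
    graph = []
    for triple in template:
        terms = [
            solution.get(term) if term.startswith("?") else term
            for term in triple
        ]
        if None in terms:
            continue  # unbound variable: drop this triple only
        graph.append(tuple(terms))
    return graph


# First attempt: the WHERE clause never binds ?playerName.
without_name = {
    "?person": "wd:Q3720084",
    "?dateOfBirth": "1995-04-27T00:00:00Z",
}
# Second attempt: ?playerName is matched in the WHERE clause too.
with_name = {**without_name, "?playerName": "Nick Kyrgios"}

print(instantiate(TEMPLATE, without_name))
print(instantiate(TEMPLATE, with_name))
```

The first call yields only the date of birth triple, and the second yields both, which matches the difference between the last two query results.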
Returning a custom RDF graph
One other neat thing about the CONSTRUCT clause is that we can change the RDF graph that our query returns.
The following query uses vocabulary from schema.org in place of Wikidata predicates:
PREFIX sch: <http://schema.org/>
CONSTRUCT {
?person sch:birthDate ?dateOfBirth;
sch:name ?playerName;
sch:nationality ?country .
?country sch:name ?countryName
}
WHERE {
?person wdt:P106 wd:Q10833314 ;
rdfs:label 'Nick Kyrgios'@en ;
rdfs:label ?playerName;
wdt:P569 ?dateOfBirth ;
wdt:P27 ?country .
?country rdfs:label ?countryName .
filter(lang(?countryName) = "en")
filter(lang(?playerName) = "en")
}
If we run this query, we get the following, much friendlier looking, output:
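subject | predicate | object |
---|---|---|
wd:Q3720084 | sch:birthDate | 1995-04-27T00:00:00Z |
wd:Q3720084 | sch:nationality | wd:Q408 |
wd:Q3720084 | sch:name | Nick Kyrgios |
wd:Q408 | sch:name | Australia |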
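Seen from the outside, the effect of this query is a predicate rename: the subjects and objects stay the same and only the vocabulary changes. The Python sketch below shows the same mapping applied to the triples from the previous section. RENAMES and rename_predicates are hypothetical names for this example, and wd:Q408 is the Wikidata item for Australia, which the post itself never spells out.

```python
# Mapping from the Wikidata predicates to the schema.org ones used in the query.
RENAMES = {
    "wdt:P569": "sch:birthDate",
    "rdfs:label": "sch:name",
    "wdt:P27": "sch:nationality",
}

# The triples returned by the plain CONSTRUCT query (wd:Q408 is Australia).
WIKIDATA_TRIPLES = [
    ("wd:Q3720084", "wdt:P569", "1995-04-27T00:00:00Z"),
    ("wd:Q3720084", "rdfs:label", "Nick Kyrgios"),
    ("wd:Q3720084", "wdt:P27", "wd:Q408"),
    ("wd:Q408", "rdfs:label", "Australia"),
]


def rename_predicates(triples, renames):
    """Swap each predicate for its replacement, leaving unknown ones untouched."""
    return [(s, renames.get(p, p), o) for s, p, o in triples]


schema_triples = rename_predicates(WIKIDATA_TRIPLES, RENAMES)
for triple in schema_triples:
    print(" ".join(triple))
```

Doing the rename inside the CONSTRUCT template, as the query does, saves this extra post-processing step.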
And that’s all for now. If there’s a better way to do anything that I described, do let me know in the comments. I’m still a SPARQL newbie.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.