
26 September, 2021

Juro-dev@Jurolytics

Nothing short of a personal dev-olution

A tiny SPARQL-ing in your eyes…


or: A still incomplete map of global think tanks

So I set out to explore the SPARQL Protocol And RDF Query Language (SPARQL) to unlock the vast webs of linked open data (LOD) made accessible by various public and especially cultural institutions on the internet. My starting point was this tutorial by Matthew Lincoln on The Programming Historian.

 

Thanks to this list maintained by the World Wide Web Consortium, it is quite easy to find interesting endpoints for everyone to use. However, some of the listed sites suffer from downtime issues.

Being able to browse Wikipedia’s relational data eventually piqued my interest. For this, I quickly ended up at FactForge’s slick platform, developed by Sirma AI. They offer some rather nifty search options as part of their GraphDB product. Wikipedia (infobox) data is made accessible through the DBpedia Association and, as an alternative to FactForge, you can go sparql-ing around directly at DBpedia’s endpoint.

To keep it simple for now and avoid too many headaches while learning the ropes of SPARQL (for which familiarity with SQL is a great advantage), I wanted to produce a map of global think tanks. You can find an example query that I wrote here:

FactForge query

Fortunately, latitude/longitude values (via the W3C Basic Geo vocabulary) as well as homepage URLs (via DBpedia) can be queried. The downloadable .csv files can easily be read into RStudio for further cleaning, joining, and visualisation. I then made use of the wonderful Leaflet package (which relies on the equally wonderful OpenStreetMap service) to produce the interactive map above.
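For readers who prefer to stay outside RStudio, the query-and-clean step can be sketched in Python. This is only an illustration under stated assumptions, not the query linked above: the endpoint URL, the category name `Think_tanks`, and all function names are hypothetical choices, and the CSV column names simply mirror the variables in the SELECT clause.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: DBpedia's public endpoint; FactForge would need its own URL.
ENDPOINT = "https://dbpedia.org/sparql"

PREFIXES = """\
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
"""


def build_query(category, limit=500):
    """Return a SPARQL query for category members that have coordinates."""
    return PREFIXES + f"""
SELECT ?org ?lat ?long ?homepage WHERE {{
  ?org dct:subject dbc:{category} ;
       geo:lat ?lat ;
       geo:long ?long .
  OPTIONAL {{ ?org foaf:homepage ?homepage . }}
}}
LIMIT {limit}
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def fetch_csv(query, endpoint=ENDPOINT):
    """Ask the endpoint for CSV results; needs network access."""
    request = Request(build_url(query, endpoint), headers={"Accept": "text/csv"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_rows(csv_text):
    """Keep only rows whose lat/long parse as plausible coordinates."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            org = row["org"]
            lat, lon = float(row["lat"]), float(row["long"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed coordinates
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            rows.append({"org": org, "lat": lat, "long": lon,
                         "homepage": row.get("homepage") or None})
    return rows


if __name__ == "__main__":
    # "Think_tanks" is an assumed category name; adjust it to the endpoint's data.
    for entry in parse_rows(fetch_csv(build_query("Think_tanks"))):
        print(entry)
```

The cleaned rows could then be handed to any mapping library; the post itself does this step with Leaflet in R.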

 

So why is this map still incomplete and what did I learn?

 

1) As gigantic and overwhelming as Wikipedia is, its individual entries can differ significantly in structure, even when they seemingly belong to the same category. As an example, I relied on this list (which should by no means be seen as exhaustive) to gather the names of major non-US and non-UK think tanks. In fact, most of these think tanks do not share the same Wikipedia category, so it is not just a matter of querying by dct:subject and dbc:Think_tank (like here). Furthermore, just by comparing the pages of TAI and ASPI, you can see how different the respective infoboxes are: ASPI has an official headquarters listed, while the location of TAI can only be found within the description. This quickly put an end to my original intention to collect all cities where the think tanks are located and match them with lat/long values. That approach would have been especially useful because querying the Basic Geo (WGS84 lat/long) Vocabulary directly did not return a satisfactory number of geographic coordinates.
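The city-matching idea described above boils down to a simple join, and a small Python sketch shows where it breaks. Everything here is illustrative: the function name is hypothetical, and the city and coordinates in the usage example are placeholder values rather than query results.

```python
def match_coordinates(org_cities, city_coords):
    """Join organisations to coordinates via their headquarters city.

    org_cities:  {organisation name: city name, or None if the infobox has none}
    city_coords: {lower-cased city name: (lat, long)}
    Returns (matched, unmatched): a dict of coordinates and a list of names.
    """
    matched, unmatched = {}, []
    for name, city in org_cities.items():
        key = city.strip().lower() if city else None
        if key in city_coords:
            matched[name] = city_coords[key]
        else:
            # No headquarters field, or a city without known coordinates.
            unmatched.append(name)
    return matched, unmatched


if __name__ == "__main__":
    # Placeholder input: one entry with a headquarters city, one without.
    orgs = {"ASPI": "Canberra", "TAI": None}
    coords = {"canberra": (-35.28, 149.13)}  # approximate, for illustration
    matched, unmatched = match_coordinates(orgs, coords)
    print(matched)
    print(unmatched)
```

Every organisation whose location only appears in free text ends up in the unmatched list, which is exactly the gap the differing infoboxes create.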

2) SPARQL ≠ web scraping? With no unifying category to bind them all, the biggest issue turned out to be the “simple” act of extracting the links from wiki/List_of_think_tanks. Using the predicate dbo:wikiPageWikiLink retrieved only around 100 of the listed organisations, a fraction of the visible total. Referring to <http://dbpedia.org/page_links> is a method mentioned in this Stack Overflow discussion – any suggestions are welcome! Otherwise, I feel forced to extract the links in a rather non-SPARQL, HTML-crawling way. And thus we know what the subject of the follow-up post should be.
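Both routes can be sketched roughly in Python. The query string below is one plausible way to write the dbo:wikiPageWikiLink lookup, and the parser is a minimal stand-in for the HTML-crawling fallback using only the standard library. The class and function names are hypothetical, and the rule of keeping only /wiki/ links without a namespace colon is an assumption about how article links look in the page source.

```python
from html.parser import HTMLParser

# SPARQL route: every page linked from the list article (as recorded by DBpedia).
WIKILINK_QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?org WHERE {
  dbr:List_of_think_tanks dbo:wikiPageWikiLink ?org .
}
"""


class WikiLinkCollector(HTMLParser):
    """Collect article titles from <a href="/wiki/..."> tags, in page order."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Skip external links and namespaced pages such as Category: or File:.
        if not href.startswith("/wiki/") or ":" in href:
            return
        title = href[len("/wiki/"):].split("#")[0]
        if title and title not in self.titles:
            self.titles.append(title)


def extract_titles(html):
    """Return the de-duplicated article titles linked from an HTML string."""
    collector = WikiLinkCollector()
    collector.feed(html)
    return collector.titles
```

Run on the full page source, this would also pick up navigation and country links, so the result still needs filtering before it can be compared with the roughly 100 organisations the SPARQL route returns.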

3) Did I mention relational data? Yes, I did. And I apologize if you feel totally misled now, because a map of think tanks shows hardly anything relational, does it? Well, I did start out putting together some queries like the one below – this gives a glimpse of the true potential of SPARQL (and Wikipedia):

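PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# The prefixes above are predefined on DBpedia's endpoint; they are spelled out for portability.
SELECT * WHERE {
  ?sub dct:subject dbc:American_political_scientists .   # people in this category
  ?sub foaf:name ?SubName .
  ?sub dbo:influencedBy ?obj .                           # whoever influenced them
  ?obj foaf:name ?ObjName .
} ORDER BY ?SubName
LIMIT 100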

But I will get back to you with some deeper exploration in the future.

In the meantime, I would be happy to read about your hacks and success stories working the realms of LOD!

 
