For each paper with an author from my institution, which of that paper's authors are from my institution?
Asked on the OpenAlex Community group:
Is there a way to find out which authors on a paper are from my institution? I downloaded a list of DOI’s from the website, and thought naively that I could look up the index of my institution (by ‘author_institution_ids’ or by ‘author_institution_names’) and then match that index to the list of authors. But soon found out that those indices don’t match because any author can have list multiple affiliations. Any ideas?
— https://groups.google.com/g/openalex-community/c/T0OjBFXSIUg
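To make the problem concrete, here is a minimal Python sketch of why the positional lookup fails. The record is hypothetical: the author names, the institutions other than the questioner's, and the flattened layout are my assumptions about what such an export looks like, not actual OpenAlex output.

```python
# Hypothetical flattened record: one author list and one institution list
# per paper, as a positional lookup would assume. Not real OpenAlex output.
authors = ["Author A", "Author B", "Author C"]

# Author B has two affiliations, so the institution list has four entries.
author_institution_names = [
    "University X",                  # Author A
    "University Y",                  # Author B, first affiliation
    "Vrije Universiteit Amsterdam",  # Author B, second affiliation
    "University Z",                  # Author C
]

target = "Vrije Universiteit Amsterdam"

# Naive approach: find the institution's index and reuse it for the authors.
position = author_institution_names.index(target)
naive_match = authors[position]  # picks Author C, which is wrong

# Keeping affiliations nested under each author removes the ambiguity.
affiliations_by_author = {
    "Author A": ["University X"],
    "Author B": ["University Y", "Vrije Universiteit Amsterdam"],
    "Author C": ["University Z"],
}
correct_match = [
    author
    for author, institutions in affiliations_by_author.items()
    if target in institutions
]
```

The two lists stop lining up as soon as one author has more than one affiliation, which is why a model that links each author to their institutions directly is easier to query.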
I recently learned about https://semopenalex.org, which maps OpenAlex data to RDF and thereby facilitates graph-oriented queries using SPARQL.
To use the questioner’s institution as my working example, I searched for their name and found that they are at “Vrije Universiteit Amsterdam”.
Via the SemOpenAlex ontology explorer, I saw that <https://semopenalex.org/ontology/Institution> is the URI designating the class of institutions.
Via their SPARQL interface, I asked for any institution, so that I could see how names were expressed.
PREFIX soa: <https://semopenalex.org/ontology/>
SELECT ?inst WHERE {
  ?inst a soa:Institution .
} LIMIT 1
Got one: <https://semopenalex.org/institution/I28290843>. What triples (subject, predicate, object) is this the subject of?
SELECT ?p ?o WHERE {
  <https://semopenalex.org/institution/I28290843> ?p ?o .
}
Scrolling through a results table of 30 rows, I see “University of Surrey” as an object for the predicate <http://xmlns.com/foaf/0.1/name>, i.e. the name term from the friend-of-a-friend (FOAF) vocabulary.
Okay, so that’s the predicate SemOpenAlex uses to connect an institution to a name.
Now, let’s find “Vrije Universiteit Amsterdam”.
Because there may not be an exact match, I’ll ask for institutions with names containing “Amsterdam”:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX soa: <https://semopenalex.org/ontology/>
SELECT ?inst ?instName WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(contains(?instName, "Amsterdam"))
}
Okay, I see that inst <https://semopenalex.org/institution/I865915315> has instName “Vrije Universiteit Amsterdam”, and none of the other results appears to be a duplicate of it.
I’ve found the institution’s URI.
I’ll confirm that I get a single result from the following:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX soa: <https://semopenalex.org/ontology/>
SELECT ?inst ?instName WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(?instName = "Vrije Universiteit Amsterdam")
}
I do. Great. Going back to the ontology explorer, I can see how the model connects institutions, authors, and works:
I see that SemOpenAlex records a Work as having any number of Authors as creators (via <http://purl.org/dc/terms/creator>).
I also note that an Author is recorded as being a member of (<http://www.w3.org/ns/org#memberOf>) any number of Institutions.
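The join implied by these two predicates can be sketched over a handful of toy triples. This is only a Python illustration of the logic; every identifier in it (work:W1, author:A1, inst:VU and so on) is made up and is not a SemOpenAlex URI.

```python
# Toy triples (subject, predicate, object); all identifiers are made up.
CREATOR = "dcterms:creator"
MEMBER_OF = "org:memberOf"

triples = [
    ("work:W1", CREATOR, "author:A1"),
    ("work:W1", CREATOR, "author:A2"),
    ("work:W2", CREATOR, "author:A2"),
    ("work:W2", CREATOR, "author:A3"),
    ("author:A1", MEMBER_OF, "inst:VU"),
    ("author:A2", MEMBER_OF, "inst:Other"),
    ("author:A3", MEMBER_OF, "inst:VU"),
    ("author:A3", MEMBER_OF, "inst:Other"),  # an author with two memberships
]


def institution_authors_by_work(triples, inst):
    """Map each work to those of its creators that are members of `inst`."""
    # Step 1: authors linked to the institution via memberOf.
    members = {s for s, p, o in triples if p == MEMBER_OF and o == inst}
    # Step 2: works linked to any of those authors via creator.
    result = {}
    for s, p, o in triples:
        if p == CREATOR and o in members:
            result.setdefault(s, []).append(o)
    return result


by_work = institution_authors_by_work(triples, "inst:VU")
```

Because membership hangs off each author rather than off a position in a list, an author with several memberships (author:A3 here) needs no special handling.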
So here’s what I end up with:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX soa: <https://semopenalex.org/ontology/>
SELECT ?work (GROUP_CONCAT(?author) as ?authors) WHERE {
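  # Pick the institution by its exact name.
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(?instName = "Vrije Universiteit Amsterdam")
  # Authors who are members of that institution.
  ?author org:memberOf ?inst .
  # Works that list one of those authors as a creator.
  ?work dcterms:creator ?author .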
}
GROUP BY ?work
This query retrieves works with at least one author who is a member of the institution, and for each work lists all of its authors who are members. As of 2024-04-18, it returns 182,498 works and executes in under 5 s.
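To use the result outside the web interface, the same query could be sent from a script. Below is a minimal Python sketch using only the standard library. The endpoint URL is my assumption and should be checked against the SemOpenAlex site; the helper names (`sparql_string_literal`, `build_query`, `run_query`, `authors_by_work`) are hypothetical. The response handling follows the standard SPARQL 1.1 JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: SemOpenAlex's public SPARQL endpoint; verify the URL before use.
ENDPOINT = "https://semopenalex.org/sparql"

QUERY_TEMPLATE = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX soa: <https://semopenalex.org/ontology/>
SELECT ?work (GROUP_CONCAT(?author) as ?authors) WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(?instName = %s)
  ?author org:memberOf ?inst .
  ?work dcterms:creator ?author .
}
GROUP BY ?work
"""


def sparql_string_literal(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_query(institution_name: str) -> str:
    """Fill the institution name into the query from the text."""
    return QUERY_TEMPLATE % sparql_string_literal(institution_name)


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """POST the query (form-encoded) and return the parsed JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def authors_by_work(results: dict) -> dict:
    """Map each ?work URI to the list of author URIs in ?authors."""
    # GROUP_CONCAT joins values with a single space unless told otherwise.
    return {
        row["work"]["value"]: row["authors"]["value"].split(" ")
        for row in results["results"]["bindings"]
    }
```

With network access, `authors_by_work(run_query(build_query("Vrije Universiteit Amsterdam")))` should give a dict keyed by work URI. Splitting on a space relies on GROUP_CONCAT's default separator and on URIs never containing spaces.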