For each paper with an author from my institution, which of that paper's authors are from my institution?

Asked on the OpenAlex Community group:

Is there a way to find out which authors on a paper are from my institution? I downloaded a list of DOI’s from the website, and thought naively that I could look up the index of my institution (by ‘author_institution_ids’ or by ‘author_institution_names’) and then match that index to the list of authors. But soon found out that those indices don’t match because any author can have list multiple affiliations. Any ideas?

https://groups.google.com/g/openalex-community/c/T0OjBFXSIUg

I recently learned about https://semopenalex.org, which maps openalex data to RDF and thereby facililates graph-oriented queries using SPARQL.

Wanting to take the questioner’s institution as my constraining example, I determined via a search of their name that “Vrije Universiteit Amsterdam” was their institution.

Via the SemOpenAlex ontology explorer, I saw that <https://semopenalex.org/ontology/Institution> is the URI designating the class of institutions.

Via their SPARQL interface, I asked for any institution, so that I could see how names were expressed.

PREFIX soa: <https://semopenalex.org/ontology/>

SELECT ?inst WHERE {
  ?inst a soa:Institution .
} LIMIT 1

Got one, <https://semopenalex.org/institution/I28290843>. What triples (subject, predicate, object) is this the subject of?

SELECT ?p ?o WHERE {
  <https://semopenalex.org/institution/I28290843> ?p ?o .
}

Scrolling through a results table of 30 rows, I see “University of Surrey” as an object for the predicate <http://xmlns.com/foaf/0.1/name>, i.e. the name term from the fried-of-a-friend (FOAF) vocabulary. Okay, so that’s the predicate SemOpenAlex uses to connect an institution to a name. Now, let’s find “Vrije Universiteit Amsterdam”. Because there may not be an exact match, I’ll ask for institutions with names containing “Amsterdam”:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX soa: <https://semopenalex.org/ontology/>

SELECT ?inst ?instName WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(contains(?instName, "Amsterdam"))
}

Okay, I see that inst <https://semopenalex.org/institution/I865915315> has instName “Vrije Universiteit Amsterdam”, and none of the other results appear to be a duplicate of this. I’ve found the institution’s URI. I’ll confirm that I get a single result from the following:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX soa: <https://semopenalex.org/ontology/>

SELECT ?inst ?instName WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(?instName = "Vrije Universiteit Amsterdam")
}

I do. Great. Going back to the ontology explorer, I can see how the model connects institutions, authors, and works:

screenshot of model diagram
screenshot of model diagram

I see that SemOpenAlex records a Work as having any number of Authors as creators (via <http://purl.org/dc/terms/creator>). I also note that an Author is recorded as being a member of (<http://www.w3.org/ns/org#memberOf>) any number of Institutions.

So here’s what I end up with:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX soa: <https://semopenalex.org/ontology/>

SELECT ?work (GROUP_CONCAT(?author) as ?authors) WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(?instName = "Vrije Universiteit Amsterdam")
  ?author org:memberOf ?inst .
  ?work dcterms:creator ?author .
}
GROUP BY ?work

This query retrieves works authored by at least one author that is also a member of the institution, and lists all member-of-the-institution authors for each work. As of 2024-04-18, this is 182,498 works, and the query executes in under 5 s.