For each paper with an author from my institution, which of that paper's authors are from my institution?

Asked on the OpenAlex Community group:

Is there a way to find out which authors on a paper are from my institution? I downloaded a list of DOIs from the website, and thought naively that I could look up the index of my institution (by ‘author_institution_ids’ or by ‘author_institution_names’) and then match that index to the list of authors. But I soon found out that those indices don’t match, because any author can list multiple affiliations. Any ideas?
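To make the mismatch concrete, here is a minimal Python sketch. It assumes, hypothetically, that the export flattens every affiliation on a paper into a single list; the author labels, the affiliations, and that layout are invented for illustration and are not taken from a real export.

```python
# Hypothetical flattened export for one paper (illustrative data only).
authors = ["Author A", "Author B", "Author C"]

# Author A has two affiliations, so this list is longer than the author list.
author_institution_names = [
    "Vrije Universiteit Amsterdam",  # Author A
    "University of Surrey",          # Author A (second affiliation)
    "University of Surrey",          # Author B
    "Vrije Universiteit Amsterdam",  # Author C
]

target = "Vrije Universiteit Amsterdam"

# Naive approach: treat each position of the institution as an author index.
positions = [i for i, name in enumerate(author_institution_names) if name == target]
naive = [authors[i] for i in positions if i < len(authors)]

# What is actually needed: an explicit author -> affiliations mapping.
affiliations = {
    "Author A": ["Vrije Universiteit Amsterdam", "University of Surrey"],
    "Author B": ["University of Surrey"],
    "Author C": ["Vrije Universiteit Amsterdam"],
}
correct = [a for a in authors if target in affiliations[a]]
```

In this toy case the naive lookup misses Author C, because the extra affiliation shifts every later position. Data that links each author to their institutions explicitly avoids the problem.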

I recently learned about SemOpenAlex, which maps OpenAlex data to RDF and thereby facilitates graph-oriented queries using SPARQL.

Wanting to take the questioner’s institution as my constraining example, I determined via a search of their name that “Vrije Universiteit Amsterdam” was their institution.

Via the SemOpenAlex ontology explorer, I saw that <> is the URI designating the class of institutions.

Via their SPARQL interface, I asked for any institution, so that I could see how names were expressed.

PREFIX soa: <>

SELECT ?inst WHERE {
  ?inst a soa:Institution .
}
LIMIT 1

Got one, <>. What triples (subject, predicate, object) is this the subject of?

SELECT ?p ?o WHERE {
  <> ?p ?o .
}

Scrolling through a results table of 30 rows, I see “University of Surrey” as an object for the predicate <>, i.e. the name term from the friend-of-a-friend (FOAF) vocabulary. Okay, so that’s the predicate SemOpenAlex uses to connect an institution to a name. Now, let’s find “Vrije Universiteit Amsterdam”. Because there may not be an exact match, I’ll ask for institutions with names containing “Amsterdam”:

PREFIX foaf: <>
PREFIX soa: <>

SELECT ?inst ?instName WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(contains(?instName, "Amsterdam"))
}

Okay, I see that inst <> has instName “Vrije Universiteit Amsterdam”, and none of the other results appear to be a duplicate of this. I’ve found the institution’s URI. I’ll confirm that I get a single result from the following:

PREFIX foaf: <>
PREFIX soa: <>

SELECT ?inst ?instName WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(?instName = "Vrije Universiteit Amsterdam")
}

I do. Great. Going back to the ontology explorer, I can see how the model connects institutions, authors, and works:

screenshot of model diagram

I see that SemOpenAlex records a Work as having any number of Authors as creators (via <>). I also note that an Author is recorded as being a member of (<>) any number of Institutions.

So here’s what I end up with:

PREFIX foaf: <>
PREFIX dcterms: <>
PREFIX org: <>
PREFIX soa: <>

SELECT ?work (GROUP_CONCAT(?author) as ?authors) WHERE {
  ?inst a soa:Institution .
  ?inst foaf:name ?instName .
  FILTER(?instName = "Vrije Universiteit Amsterdam")
  ?author org:memberOf ?inst .
  ?work dcterms:creator ?author .
}
GROUP BY ?work

This query retrieves works with at least one author who is also a member of the institution, and for each work it lists all of the authors who are members of the institution. As of 2024-04-18, it returns 182,498 works and executes in under 5 s.