Indexing Identifiers

Indexing identifiers is key to disambiguating entities.

Wikipedia has disambiguation pages. For example, there are various concepts in mathematics and computing, various computing products, and various companies that identify with the term “Precision”. I made disambiguation pages for same-chemical-formula inorganic crystal structures for the Materials Project.
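A disambiguation page is essentially a lookup from one label to every entity that carries it. Here is a minimal sketch of that idea, assuming an in-memory list of records; the `ex:` identifiers and the short descriptions are made up for illustration and do not come from Wikipedia or the Materials Project.

```python
from collections import defaultdict

# Hypothetical records: (entity ID, label, description). The IDs are invented.
RECORDS = [
    ("ex:1", "Precision", "a concept in mathematics"),
    ("ex:2", "Precision", "a computing product"),
    ("ex:3", "Precision", "a company"),
    ("ex:4", "Recall", "a concept in computing"),
]


def build_label_index(records):
    """Map each case-folded label to the list of entity IDs that share it."""
    index = defaultdict(list)
    for entity_id, label, _description in records:
        index[label.casefold()].append(entity_id)
    return dict(index)


def disambiguate(index, label):
    """Return every entity ID registered under the label (empty list if none)."""
    return index.get(label.casefold(), [])


LABEL_INDEX = build_label_index(RECORDS)
candidates = disambiguate(LABEL_INDEX, "precision")
```

A label that maps to more than one identifier is exactly the case that needs a disambiguation page.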

Indexing identifiers is also key to unifying entities. It’s an open world after all,[1] with a concomitant non-unique naming assumption. OpenAlex indexes various ID types for a work. For example, http://api.openalex.org/works/https://doi.org/10.7717/peerj.4375 will funnel you to the payload for https://openalex.org/W2741809807, which has an ids field with openalex, doi, mag (Microsoft Academic Graph), pmid (PubMed), and pmcid (PubMed Central) IDs.
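Unification is the reverse lookup: many identifiers funnel to one canonical record. The sketch below assumes a payload shaped like the `ids` field described above and builds an alias index from it. It is not OpenAlex's implementation and makes no network call; only the two identifiers quoted in the text are included, and the function names are hypothetical.

```python
# A trimmed-down record modeled on the payload described above.
# Only the openalex and doi values that appear in the text are included.
WORK = {
    "id": "https://openalex.org/W2741809807",
    "ids": {
        "openalex": "https://openalex.org/W2741809807",
        "doi": "https://doi.org/10.7717/peerj.4375",
    },
}


def build_alias_index(records):
    """Map every known identifier of a record to that record's canonical ID."""
    alias_to_canonical = {}
    for record in records:
        canonical = record["id"]
        for value in record["ids"].values():
            alias_to_canonical[value.strip()] = canonical
    return alias_to_canonical


def resolve(index, identifier):
    """Return the canonical ID for any indexed alias, or None if unknown."""
    return index.get(identifier.strip())


ALIAS_INDEX = build_alias_index([WORK])
canonical_id = resolve(ALIAS_INDEX, "https://doi.org/10.7717/peerj.4375")
```

Under the non-unique naming assumption, two different identifiers landing on the same canonical ID is the expected outcome, not an error.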

Finally, indexing identifiers is key to registering and resolving metadata, i.e. relationships between identifiers. Registries include Linked Open Vocabularies (LOV), the Ontology Lookup Service (OLS), the Zazuko Prefix Server, and the OBO Foundry. Resolvers include Identifiers.org and Name-To-Thing (n2t). There is even at least one metaregistry, Bioregistry.io.
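At its core, a registry is an index from prefixes to URI patterns, and a resolver uses that index to turn a compact identifier (`prefix:local_id`) into a URL. The sketch below assumes a tiny hand-written prefix map; it does not call any of the services named above, and `PREFIX_MAP`, `expand`, and `compress` are hypothetical names.

```python
# Hand-written prefix map for illustration only; not pulled from any registry.
PREFIX_MAP = {
    "doi": "https://doi.org/",
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/",
}


def expand(curie, prefix_map=PREFIX_MAP):
    """Turn 'prefix:local_id' into a full URI using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a compact identifier: {curie!r}")
    key = prefix.lower()
    if key not in prefix_map:
        raise KeyError(f"unregistered prefix: {prefix!r}")
    return prefix_map[key] + local_id


def compress(uri, prefix_map=PREFIX_MAP):
    """Turn a full URI back into 'prefix:local_id', or None if no prefix matches."""
    for prefix, base in prefix_map.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return None


expanded = expand("doi:10.7717/peerj.4375")
compressed = compress("https://doi.org/10.7717/peerj.4375")
```

A metaregistry would then be one more level of the same thing: an index over several such prefix maps.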

Any time you encounter a web service using a “remote data access” style, i.e. exposing a query language via a single access point – SQL, SPARQL, GraphQL, MongoDB, etc. – it’s highly likely that all entity identifiers are indexed to support efficient retrieval and combination/joining.
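You can see this on a small scale with SQLite from the Python standard library. This is a rough sketch, assuming a one-table schema invented for illustration: declaring the identifier column as a primary key makes SQLite build an index on it, and `EXPLAIN QUERY PLAN` reports whether a query uses that index. The exact wording of the plan text varies between SQLite versions, so the code only looks for the word `INDEX`.

```python
import sqlite3

# Hypothetical schema: the identifier column is the primary key, so SQLite
# creates an index on it automatically. The title is a placeholder value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (id TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "INSERT INTO works VALUES (?, ?)",
    ("https://openalex.org/W2741809807", "placeholder title"),
)


def query_plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Lookup by identifier: expected to search via the primary-key index.
LOOKUP_PLAN = query_plan(
    "SELECT title FROM works WHERE id = ?",
    ("https://openalex.org/W2741809807",),
)

# Lookup by an unindexed column: expected to scan the whole table.
SCAN_PLAN = query_plan(
    "SELECT id FROM works WHERE title = ?",
    ("placeholder title",),
)

uses_index = any("INDEX" in detail for detail in LOOKUP_PLAN)
```

Printing `LOOKUP_PLAN` and `SCAN_PLAN` side by side shows the difference between an indexed search and a full scan.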


References

  1. Unless you can bask in glorious isolation in a siloed domain/organization.