Data Stacks for FAIR

I noticed a pattern at the top of each case study listed by Stemma.ai, which provides data catalog software as a service based on the open-source Amundsen code. Each case study’s so-called “Data Stack” comprises up to four distinct categories of functionality – Data Catalog, Data Warehouse, ETL, and Business Intelligence.

The “Data Stack” for each case study:

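| Case   | Data Catalog | Data Warehouse              | ETL                 | Business Intelligence |
|--------|--------------|-----------------------------|---------------------|-----------------------|
| Lyft   | Amundsen     | Presto                      | Apache Airflow      | Mode, Apache Superset |
| Convoy | Stemma       | Snowflake                   | dbt, Apache Airflow | Tableau, Metabase     |
| iRobot | Stemma       | Amazon Athena               | (blank)             | Mode                  |
| ING    | Amundsen     | Trino (formerly Presto SQL) | (blank)             | Apache Superset       |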

These categories struck me in relation to the FAIR Principles [1]:

It’s encouraging to see high-level alignment between the FAIR Principles and a conceptualization of useful enterprise data systems in the corporate world.


  1. M. D. Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Sci. Data, vol. 3, no. 1, p. 160018, Mar. 2016, doi: 10/bdd4.

  2. A term that may be more apt here than Data Orchestration, which has an imperative tone, is Data Reconciliation, which has a declarative tone. See, e.g., S. Ryza, “Introducing Software-Defined Assets,” Dagster Blog, Mar. 2022. https://dagster.io/blog/software-defined-assets (accessed May 31, 2022).