Open, Reproducible, Sustainable

Irving et al. [1] approach digital research infrastructure through three related but distinct concepts: openness, reproducibility, and sustainability.

Openness is about making digital resources generally accessible for reuse, sharing, and adaptation through an open license.

Reproducibility is about describing and documenting methods and results so that anyone with access to the data and methods can re-apply the methods to the data and obtain the same results.

Sustainability is about making it easy to maintain and extend a system rather than replace it; this depends not just on the properties of an artifact, but on the skills of potential maintainers and on the investments of users.

If you share data and code, but you don’t document how to stage the data and apply the code, the work is open but not reproducible.

If you document and automate your analysis, but the data is only accessible to people in your lab, the work is reproducible but not open.

If your work is open and reproducible when initially published, but the supporting system cannot be adequately maintained and extended, then the system will become abandonware, and the work’s openness and reproducibility will become less relevant.

For such a system to be sustainable, either it must be robust to a revolving door of inexperienced maintainers and the scattershot investments of one-off users, or the field must develop a culture that values tool-building.
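To make the gap between "open" and "reproducible" a little more concrete, here is a minimal sketch of what documenting how to stage the data and apply the code might look like in practice. It is only an illustration under stated assumptions: the file name `measurements.csv`, the function names, and the choice of a column mean as "the analysis" are all hypothetical stand-ins, not anything prescribed by Irving et al.

```python
import hashlib
import statistics
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stage_data(raw_text: str, staging_dir: Path) -> Path:
    """Write the input data to a known location (hypothetical staging step)."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    data_path = staging_dir / "measurements.csv"  # assumed file name
    # Write bytes so the staged file is identical on every platform.
    data_path.write_bytes(raw_text.encode("utf-8"))
    return data_path


def analyze(data_path: Path) -> float:
    """Apply the documented method: here, the mean of the 'value' column."""
    lines = data_path.read_text(encoding="utf-8").strip().splitlines()
    header, rows = lines[0].split(","), lines[1:]
    col = header.index("value")
    return statistics.mean(float(row.split(",")[col]) for row in rows)


def run(raw_text: str, expected_sha256: str, staging_dir: Path) -> float:
    """Stage, verify, and analyze through one documented entry point."""
    data_path = stage_data(raw_text, staging_dir)
    actual = sha256_of(data_path)
    if actual != expected_sha256:
        # Refuse to analyze data that differs from what was published.
        raise ValueError(f"staged data does not match recorded checksum: {actual}")
    return analyze(data_path)


if __name__ == "__main__":
    # Toy data standing in for a published dataset.
    RAW = "sample,value\na,1.0\nb,2.0\nc,3.0\n"
    expected = hashlib.sha256(RAW.encode("utf-8")).hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        print(run(RAW, expected, Path(tmp)))
```

Sharing only `analyze` and the data file would make the work open. The staging step, the recorded checksum, and the single `run` entry point are what let someone else re-apply the method and check that they started from the same inputs. None of this addresses sustainability, which depends on who is around to maintain the script.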

How does your team separately address the openness, reproducibility, and sustainability of its research data product infrastructure?

This post was adapted from a note sent to my email list on Scientific Data Unification.

  1. D. Irving, K. Hertweck, L. Johnston, J. Ostblom, C. Wickham, and G. Wilson, Research Software Engineering with Python: Building software that makes research possible. CRC Press, 2021. Available at https://merely-useful.tech/py-rse/