Made as simple as possible, but not simpler.
Shared datasets often have column/field names that are ambiguous in their meaning, or contain identical/related concepts with different names, hindering reuse.
When you have several different applications (e.g. to perform simulations and analyses) that each have their own data model, it’s typical for each to also maintain its own siloed data store.
I was reading about hidden costs of “packaged” software solutions – that is, using existing software to solve problems – and came across this sentence:1
Earlier this week, I wrote that In sharing scientific research data, the goal is often to provide data reductions to the extent possible without loss – the output is, in a strong sense, equivalent to the input.