Made as simple as possible, but not simpler.

Data Dictionaries for Humans and Machines

Shared datasets often have column or field names whose meaning is ambiguous, or they give different names to identical or related concepts, which hinders reuse.

When ETL Is a Symptom

When you have several different applications (e.g. to perform simulations and analyses) that each have their own data model, it’s typical for each to also maintain its own siloed data store.

"Let's Not Reinvent the Wheel"

I was reading about hidden costs of “packaged” software solutions – that is, using existing software to solve problems – and came across this sentence:1

Data Reduction for Science

Earlier this week, I wrote that in sharing scientific research data, the goal is often to reduce the data as far as possible without loss – the output is, in a strong sense, equivalent to the input.

