Data Collaboration Is Hard Like Distributed Computing Is Hard
It is hard to mediate among concrete representations, that is, among data structures with differing schemas. There are certainly valiant efforts to replicate shared data structures without conflict and to facilitate distributed schema evolution, i.e., to sync “under the hood” concerns.
Alan Kay’s original vision of “object-oriented” design is analogous to the “cell-oriented” operation of a human being, in that questions of concrete representation are postponed.
“Every cell in our bodies is a descendant of a single zygote. All the cells have exactly the same genetic endowment (about 1GByte of ROM!). However there are skin cells, neurons, muscle cells, etc. The cells organize themselves to be discrete tissues, organs, and organ systems. This is possible because the way a cell differentiates and specializes depends on its environment…context that selects particular behaviors from the possible behaviors that are available in its genetic program.”1
In distributed computing systems with a log-centric architecture, each individual service (“cell”) has the same genetic endowment: the shared log of events. Each service is a state machine that can respond to the event log differently, and thus exhibit different behavior. At any time, a service can be “re-built”, perhaps with different logic, by re-playing the same event log to reconstruct its state.
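To make the idea concrete, here is a minimal sketch of two services that share one event log but fold it into different state. Everything named here (Event, replay, the event kinds, the two step functions) is a hypothetical illustration, not taken from any particular log system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Event:
    """One simply-structured entry in the shared log (hypothetical shape)."""
    kind: str
    payload: dict


def replay(log: Iterable[Event], step: Callable[[Any, Event], Any], initial: Any) -> Any:
    """Rebuild a service's state by folding its step function over the log."""
    state = initial
    for event in log:
        state = step(state, event)
    return state


# The shared "genetic endowment": every service reads this same log.
LOG = [
    Event("sample_registered", {"id": "s1"}),
    Event("measurement_added", {"id": "s1", "value": 2.0}),
    Event("measurement_added", {"id": "s1", "value": 4.0}),
    Event("sample_registered", {"id": "s2"}),
]


def count_samples(state: int, event: Event) -> int:
    """Service A: only cares about how many samples exist."""
    return state + 1 if event.kind == "sample_registered" else state


def sum_measurements(state: dict, event: Event) -> dict:
    """Service B: keeps a running (total, count) per sample id."""
    if event.kind != "measurement_added":
        return state
    sample_id = event.payload["id"]
    total, n = state.get(sample_id, (0.0, 0))
    return {**state, sample_id: (total + event.payload["value"], n + 1)}


sample_count = replay(LOG, count_samples, 0)
measurement_totals = replay(LOG, sum_measurements, {})
```

Swapping in a new step function and calling replay again is the “re-build”: the log stays untouched, and only the interpretation of it changes.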
In a scientific data collaboration, each actor specializes in acting on and producing data in a particular way. If all input and output is cast as a log of simply-structured events, as a shared and growing sequence of nucleotides (and with a growing number of distinct “types” – not just A, T, G, and C “events”!), then each actor can draw upon a shared genetic endowment for its activities, even as those activities may change over time – as actors differentiate.
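As a rough sketch of how this might look in practice, the snippet below models an actor that reads a growing log, skips event types it has no handler for, and can later differentiate by acquiring a handler for a new type. The Actor class, its catch_up method, and the "X" event type are all assumptions made for illustration.

```python
class Actor:
    """Hypothetical actor: reads a shared log, reacts only to types it knows."""

    def __init__(self, handlers):
        self.handlers = dict(handlers)  # event type -> function of the event data
        self.cursor = 0                 # position of the next unread log entry
        self.outputs = []

    def catch_up(self, log):
        """Process entries appended since the last call; skip unknown types."""
        for kind, data in log[self.cursor:]:
            handler = self.handlers.get(kind)
            if handler is not None:
                self.outputs.append(handler(data))
        self.cursor = len(log)


# The shared log starts with familiar "nucleotides"...
log = [("A", 1), ("T", 2)]
generalist = Actor({"A": lambda d: d * 10})
generalist.catch_up(log)

# ...and later grows, including a type nobody had seen before.
log.append(("X", 3))
log.append(("A", 4))
generalist.catch_up(log)  # the unknown "X" entry is skipped, not an error

# A differentiated actor handles the new type and replays the full log.
specialist = Actor({"A": lambda d: d * 10, "X": lambda d: -d})
specialist.catch_up(log)
```

The design choice worth noting is that an unknown event type is ignored rather than rejected, so the vocabulary of the log can grow without coordinating every actor in advance.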