Data Sharing versus Data Collaboration

Sharing is a way to facilitate concurrency. Collaboration is a way to orchestrate concurrent operations on what is shared.

In this sense, data sharing and collaboration is the in-the-large counterpart of a familiar in-the-small problem within a single programming language and runtime: shared memory accessed by concurrent processes or threads. Just as the subsystems of a larger software solution should avoid local data-management parochialisms in favor of the shared language of the overall system, scientific researchers should consider how their own parochial decisions about data structure affect the potential for sharing and collaboration.

Data collaboration is operations collaboration: joining the results of operations performed on shared data to achieve desired outcomes, drawing on the combined capital resources of both participating organizations. Facilitating data sharing is not enough. You can build a filing cabinet, but it may remain empty.