CSV on the Web: Sidecars for Spreadsheets

If you share data on the web as delimiter-separated values – that is, as spreadsheets – there is a world of power-ups available to you.

The term “sidecar” is used for a functional addition. A motorcycle sidecar can carry things and people. A Kubernetes sidecar container shares the network namespace and storage volumes of its pod’s main container, and so supports auxiliary work. Unstructured documentation, e.g. a typical README file, is not a sidecar.

The W3C’s “CSV on the Web” (CSVW) working group published seven documents, including a note on 25 identified use cases and a primer on effective use of its recommendations in practice. In the simplest case, when you’re serving a CSV file like mydata.csv, you also serve a JSON sidecar whose name is the CSV file’s name with -metadata.json appended (e.g. mydata.csv-metadata.json), and you use the CSVW vocabulary in that sidecar to provide extra information about your data.
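
To make the naming convention concrete, here is a minimal Python sketch that derives the sidecar's file name and builds a small CSVW metadata document. The column names, titles, and datatypes (station, temp_c) are made up for illustration, as are the helper names sidecar_path and build_metadata; only the "@context", "url", and "tableSchema" keys come from the CSVW vocabulary.

```python
import json
from pathlib import Path


def sidecar_path(csv_path):
    """Return the sidecar location: the CSV file name plus '-metadata.json'."""
    p = Path(csv_path)
    return p.with_name(p.name + "-metadata.json")


def build_metadata(csv_name, columns):
    """Build a minimal CSVW description of one table.

    `columns` is a list of (name, title, datatype) tuples; the values used
    below are hypothetical and should be replaced with your own.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_name,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": title, "datatype": datatype}
                for name, title, datatype in columns
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical columns for mydata.csv.
    metadata = build_metadata(
        "mydata.csv",
        [
            ("station", "Station", "string"),
            ("temp_c", "Temperature (C)", "number"),
        ],
    )
    print(sidecar_path("mydata.csv"))
    print(json.dumps(metadata, indent=2))
```

Saving the printed JSON next to mydata.csv under the printed name is what lets CSVW-aware tools find the description without any extra configuration.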

There are limits to the logic you can express using the CSVW vocabulary, and this is reasonable (cf. the “rule of least power”)! The Shapes Constraint Language (SHACL) goes further: its JavaScript extensions (SHACL-JS) let you express logic as JavaScript functions (Python functions seem doable…) – this could help to transform a spreadsheet’s cell values to conform to a desired schema.
