As avid communicators, we share stories, songs, ideas, and observations – changing with time and audience, but reasonably described in terms of linear versions. Where there is a wide range of derivative works, remixes, and crossovers, archiving, versioning, and tracking creation history become difficult (still an unsolved challenge in music, performance, and literature, hidden in part by oversimplified assumptions about authorship in modern norms and law).
For larger collections and compilations, from concordances and databases to codebases and long-term studies, versioning is more interesting.
Some repositories focus on datasets and their analyses, borrowing from both version-control and file-repository tools.
Open source hosted repositories:
- Dataverse – offers a way to inspect and query raw, uncompressed datasets in a folder without downloading them elsewhere. Offers both a globally hosted solution (the Harvard Dataverse) and repository software you can run locally in a datacenter of your choice (55 dataverses worldwide). A minimal query sketch follows this list.
- Zenodo – offers a way to host and semi-permanently archive very large files and datasets, up to 10 TB. Maintained by OpenAIRE and CERN. A second sketch below shows a query against its records API.
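
To make the "query without downloading" idea concrete, here is a minimal sketch against Dataverse's public Search API. The instance URL, the search term, and the response fields used (`data.items`, `name`, `global_id`) follow the documented API examples but are assumptions here and may differ between Dataverse versions.

```python
"""Sketch: searching the Harvard Dataverse over its Search API."""
import requests

BASE = "https://dataverse.harvard.edu"   # hosted instance; any Dataverse server works
QUERY = "coral reefs"                    # hypothetical search term

resp = requests.get(
    f"{BASE}/api/search",
    params={"q": QUERY, "type": "dataset", "per_page": 5},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["data"]["items"]:
    # Each hit carries a persistent identifier (usually a DOI) that can be
    # passed to the datasets API to inspect file listings and metadata
    # without downloading the whole dataset.
    print(item.get("global_id"), "-", item.get("name"))
```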
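Zenodo exposes a similar public REST API for its records. The query string below is a placeholder, and the response layout (`hits.hits`, `metadata.title`, per-file `size`) follows the documented examples for open-access records; treat it as a sketch rather than a stable contract.

```python
"""Sketch: listing matching Zenodo records and their total file sizes."""
import requests

resp = requests.get(
    "https://zenodo.org/api/records",
    params={"q": "ocean temperature", "size": 5},   # hypothetical query
    timeout=30,
)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    meta = hit["metadata"]
    # Open-access records list their files inline; sizes are in bytes.
    total_bytes = sum(f.get("size", 0) for f in hit.get("files", []))
    print(f'{meta["title"]} ({total_bytes / 1e9:.1f} GB)')
```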
Other hosted repositories:
- GitHub itself
- dat (peer-to-peer data sharing)
Other places that gather and organize data (not self-serve):
- datahub.io
- data.world – commercial
Partial alternatives, with fewer guarantees of maintenance and fewer shared tools for collaboration, include self-hosting (made discoverable through Google Dataset Search; see the markup sketch below) and institutional data repositories (often preserving one-time uploads of data associated with research).
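
For the self-hosting route, discoverability typically comes from embedding schema.org Dataset markup in the dataset's landing page, which is what Google Dataset Search crawls. A minimal sketch follows; all names, URLs, and values are placeholders, and only the property names come from the schema.org vocabulary.

```python
"""Sketch: schema.org Dataset markup for a self-hosted dataset."""
import json

dataset_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example long-term observation dataset",        # placeholder
    "description": "Hypothetical description of the dataset.",
    "url": "https://example.org/data/observations",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [{
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/observations.csv",
    }],
}

# Embed this block in the dataset's landing page so crawlers can find it.
print('<script type="application/ld+json">')
print(json.dumps(dataset_markup, indent=2))
print("</script>")
```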