Tools for sharing + maintaining code and data: Dataverse+

As avid communicators, we share stories, songs, ideas, and observations, which change with time and audience but can reasonably be described as a series of linear versions. Where there is a wide range of derivative works, remixes, and crossovers, archiving, versioning, and tracking creation history all become difficult (still an unsolved challenge in music, performance, and literature, and one partly hidden by oversimplified assumptions about authorship in modern norms + law).

For larger collections and compilations, from concordances and databases to codebases and long-term studies, versioning becomes a more interesting problem.

Some repositories focus on datasets and their analyses, borrowing from both version-control and file-repository tools.

Open source hosted repositories:

  • Dataverse – lets you inspect and query raw, uncompressed datasets in place, without downloading them elsewhere. Offers both a globally hosted instance (the Harvard Dataverse) and repository software you can run in a datacenter of your choice (55 installations worldwide).
  • Zenodo – offers a way to host and semi-permanently archive very large files and datasets, up to 10TB. Maintained by OpenAIRE and CERN.
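As a rough illustration of querying a Dataverse installation programmatically, here is a minimal Python sketch. It assumes the Dataverse Search API at `/api/search` with its `q`, `type`, and `per_page` parameters, and a JSON response whose results sit under `data.items`, each carrying a `type` and `name`. The helper names `build_search_url`, `extract_items`, and `search` are hypothetical. Note that this searches catalog metadata (dataverses, datasets, files), not the contents of the data files themselves.

```python
import json
import urllib.parse
import urllib.request

# The public Harvard Dataverse; other installations are assumed to expose
# the same /api/search path.
HARVARD_DATAVERSE = "https://dataverse.harvard.edu"


def build_search_url(server: str, query: str, kind: str = "dataset", per_page: int = 10) -> str:
    """Build a Search API URL limited to one object type ("dataverse", "dataset" or "file")."""
    params = urllib.parse.urlencode({"q": query, "type": kind, "per_page": per_page})
    return f"{server.rstrip('/')}/api/search?{params}"


def extract_items(payload: dict) -> list[tuple[str, str]]:
    """Return (type, name) pairs from an already-parsed Search API response."""
    items = payload.get("data", {}).get("items", [])
    return [(item.get("type", ""), item.get("name", "")) for item in items]


def search(server: str, query: str, kind: str = "dataset", per_page: int = 10) -> list[tuple[str, str]]:
    """Send the query over HTTP and return (type, name) pairs; requires network access."""
    url = build_search_url(server, query, kind, per_page)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_items(json.load(resp))
```

Usage would look like `search(HARVARD_DATAVERSE, "climate")`. Keeping URL construction and response parsing separate from the HTTP call makes both easy to check offline.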

Other hosted repositories:

  • GitHub itself
  • dat

Other places that gather and organize data (not self-serve):

Partial alternatives, with few promises of maintenance or shared tooling, include self-hosting (made discoverable through Google Dataset Search) and institutional data repositories (often preserving one-time uploads of data associated with research).