Software Heritage, the universal source code archive, was announced to the public in the spring of 2016. Since then, our development has been running in the open, using only Free Software; we have made a preview API available to the public; and Inria, our umbrella institution, has signed an agreement on software preservation with UNESCO. So far, our archive contains more than 3.6 billion source files, spanning 825 million revisions across almost 65 million projects. We’re archiving Git repositories, Subversion repositories, and source packages of Debian-based distributions, and we’re doing the legwork to import Mercurial repositories, all in a uniform, VCS-agnostic archive. Archiving this amount of data, and building the largest (virtual) VCS graph in history, on the infrastructure provided by a public research institution, isn’t without challenges. This lightning talk will review the current infrastructure choices made by the project, and open a conversation about how we’ll make it evolve so that it can outlast us all.

Categories: community
year: 2017
speaker: Nicolas DANDRIMONT