
Open Sourcing the Census Project

July 9, 2015 | April 26th, 2018 | Blogs

The Census Project, developed by David Wheeler and Samir Khakimov of the Institute for Defense Analyses (IDA), goes live today! CII co-funded the Census Project to automate the analysis of a large number of open source projects and come up with a quick way to prioritize which ones to look at more closely. The Census Project calculates a “risk score” based on a number of metrics about each project, some of which are relatively static (language, website, network access) and some of which change over time (contributor count and popularity).
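To make the idea of a risk score concrete, here is a minimal sketch in Python. It is not the Census Project's actual code or weighting: the field names, point values, thresholds, and the two sample projects are all made up for illustration. It only shows the general shape of an additive score built from static and time-varying metrics.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # All field names here are hypothetical, chosen to mirror the metrics named above.
    name: str
    language: str            # relatively static
    has_website: bool        # relatively static
    network_exposed: bool    # relatively static
    contributors_12mo: int   # changes over time
    popularity: int          # changes over time (e.g. an install count)


def risk_score(p: Project) -> int:
    """Toy additive score; every weight and threshold is an assumption."""
    score = 0
    if p.language in {"C", "C++"}:
        score += 2               # assumed: memory-unsafe languages add risk
    if not p.has_website:
        score += 1               # assumed: no project website adds risk
    if p.network_exposed:
        score += 2               # assumed: network-facing code adds risk
    if p.contributors_12mo == 0:
        score += 5               # assumed: no recent contributors at all
    elif p.contributors_12mo <= 3:
        score += 4               # assumed: very few recent contributors
    if p.popularity >= 10_000:
        score += 2               # assumed: wide use raises the stakes
    return score


# Invented sample data, not real projects.
projects = [
    Project("toolexample", "Python", True, False, 25, 800),
    Project("libexample", "C", False, True, 1, 50_000),
]

# Highest score first: these are the projects to look at more closely.
ranked = sorted(projects, key=risk_score, reverse=True)
for p in ranked:
    print(p.name, risk_score(p))
```

With these invented weights, a popular, network-facing C library with a single recent contributor floats to the top, while a well-staffed niche tool scores zero. The real metrics and their weights are described in the paper.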

The results are fascinating. The Census Project is very, very good at identifying projects that are still widely popular but hardly maintained. This is the sweet spot for the Core Infrastructure Initiative: projects where we can try to identify lurking issues and help find a way to fix them before they become problems for our core infrastructure.

The development team did an amazingly comprehensive overview of prior art before settling on the metrics in the program (check it out yourself in Section 2 of the whitepaper), but it is fun to speculate and even experiment with alternative metrics. For example, Florian Weimer suggested including the Fedora ABRT crash statistics, which I think is an inspired idea because, in aggregate, the crash reports are less game-able than CVE counts, include a nod to popularity, and show whether or not potentially critical issues are actually being fixed by projects.

We hope that this is the beginning of the discussion about which (automatable) metrics are important for assessing a project’s risk. I would like to invite you to provide feedback on the project, propose new projects to assess, help clean up the input data, and experiment with different metrics.

A big thank you goes out to Black Duck’s Open Hub and the Debian project for allowing the Census Project to use data from their sites to perform the calculations.

For more information, you can visit the website, download the code, and read the paper (in short form if you are in a hurry).


Author ciilf
