The Census project is an experimental tool that parses the known data about popular open source projects to identify the ones that should be examined to see whether they need help.
The Census represents CII’s current view of the open source ecosystem and which projects are at risk. The Heartbleed vulnerability in OpenSSL highlighted that some open source software (OSS) is widely used and depended on, that vulnerabilities in it can have serious ramifications, and yet that some projects have not received the level of security analysis appropriate to their importance. Some OSS projects have many participants, perform in-depth security analyses, and produce software that is widely considered to have high quality and strong security. However, other OSS projects have small teams with limited time to do the tasks necessary for strong security. The trick is to identify quickly which critical projects fall into the second bucket.
The Census Project focuses on automatically gathering metrics, especially those that suggest less active projects (such as a low contributor count). We also provided a human estimate of the program’s exposure to attack, and developed a scoring system to heuristically combine these metrics. These heuristics identified especially plausible candidates for further consideration. For the initial set of projects to examine, we took the set of packages installed by Debian base and added a set of packages that were identified as potentially concerning. A natural outcome of the census will be a list of projects to consider funding. The decision to fund a project in need is not automated by any means.
The Census serves to prioritize projects for review and is used as input to the CII Steering Committee. Unlike the Fed’s stress tests, which are opaque, all of the census data and analysis are open source. We are eager for community involvement. We encourage developers to fork the project and experiment with different data sources, different parameters, and different algorithms to test out the concept of an automated risk assessment census. We are also eager for input to help sanitize and complete the data that was used in this first iteration of the census.
Projects receive a score from 0 to 16 based on the analysis of several aspects of the project, where 0 is the best score and 16 indicates the highest level of risk. The highest actual score that we saw during this run was 13.
The program takes the initial list of projects to review (via projects_to_examine.csv), retrieves information about the projects from Black Duck's Open Hub and from Debian, and also retrieves the CVEs for each project. The project data is cached. The program then evaluates information about each project (described in detail below) and assigns it a risk score. The complete set of data and scores is then output to results.csv.
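The flow described above (read the project list, fetch and cache project data, score, write the results) can be sketched roughly as follows. This is not the Census code itself. The function name `run_census`, the `name` column, the `risk_score` column, and the `fetch` and `score` callables are all hypothetical placeholders, and the cache is assumed to be a plain in-memory dictionary.

```python
import csv


def run_census(input_path, output_path, fetch, score, cache=None):
    """Read project names, gather (cached) data, score each project, write a CSV.

    `fetch(name)` stands in for the Open Hub / Debian / CVE lookups and must
    return a dict; `score(data)` stands in for the risk scoring. Both are
    hypothetical callables supplied by the caller.
    """
    cache = {} if cache is None else cache
    rows = []
    with open(input_path, newline="") as f:
        for record in csv.DictReader(f):
            name = record["name"]  # assumed column name
            if name not in cache:
                # Only fetch data we have not already retrieved.
                cache[name] = fetch(name)
            data = dict(cache[name])
            data["name"] = name
            data["risk_score"] = score(data)
            rows.append(data)

    # Use the union of all keys so every gathered field reaches the output.
    fieldnames = sorted({key for row in rows for key in row})
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Passing the fetch and score steps in as callables keeps the sketch independent of any particular data source, which is also what makes it easy to experiment with alternatives.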
The risk score is calculated based on the following parameters about the project:
- Website: If the project has no website, it receives a point.
- CVEs: If the project has >=4 CVEs (since 2010), it receives 3 points. 2 points for 2 or 3 CVEs. 1 point for 1 CVE. Note that the absence of CVEs does not necessarily indicate the absence of vulnerabilities; it may instead indicate that no one has looked for vulnerabilities in the project, or that no one filed CVE requests for the vulnerabilities that were found.
- Contributors: If the 12-month contributor count is 0, the project receives 5 points. 4 points for 1-3 contributors. 2 points if the number of contributors is unknown.
- Popularity: If the package is in the top 1% of installed packages tracked by the Debian popularity contest, it receives 2 points. 1 point if it is in the top 5%.
- Main Language: If the project’s main language is C or C++, add 2 points.
- Network Exposure: If the package is directly exposed to the network (whether client or server), it receives 2 points. If it is used to process data provided by a network, it receives 1 point. It receives 1 point if it typically runs as root (either via suid or directly), or controls access to such, and therefore is a risk for local privilege escalation.
- Application Data Only: The package has 3 points taken away if the Debian database reports that it is “Application Data” or “Standalone Data” rather than an application.
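As a minimal sketch, the scoring rules listed above could be combined like this. The `Project` fields and the `risk_score` function are hypothetical names chosen for illustration, not the Census implementation. Several details are assumptions where the list is silent: projects with more than 3 contributors or outside the top 5% of popularity get no points for those parameters, the root/privilege point is added on top of the network points (a reading under which the maximum is 1 + 3 + 5 + 2 + 2 + 2 + 1 = 16, matching the stated range), and the result is clamped at 0 so the deduction for data-only packages cannot go negative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    has_website: bool
    cve_count: int                    # CVEs since 2010
    contributors_12mo: Optional[int]  # None when the count is unknown
    popularity_top_percent: float     # 0.5 means "in the top 0.5%" of popcon
    main_language: str
    network_exposure: int             # 0 none, 1 processes network data, 2 directly exposed
    runs_as_root: bool                # runs as root or controls access to such
    data_only: bool                   # "Application Data" or "Standalone Data"


def risk_score(p: Project) -> int:
    score = 0

    # Website: no website adds a point.
    if not p.has_website:
        score += 1

    # CVEs: 3 points for >=4, 2 points for 2 or 3, 1 point for 1.
    if p.cve_count >= 4:
        score += 3
    elif p.cve_count >= 2:
        score += 2
    elif p.cve_count == 1:
        score += 1

    # Contributors: unknown 2, zero 5, 1-3 gives 4 (assumed 0 points above 3).
    if p.contributors_12mo is None:
        score += 2
    elif p.contributors_12mo == 0:
        score += 5
    elif p.contributors_12mo <= 3:
        score += 4

    # Popularity: top 1% gives 2, top 5% gives 1.
    if p.popularity_top_percent <= 1:
        score += 2
    elif p.popularity_top_percent <= 5:
        score += 1

    # Main language: C or C++ adds 2.
    if p.main_language in ("C", "C++"):
        score += 2

    # Network exposure (0, 1, or 2) plus 1 for root (assumed additive).
    score += p.network_exposure
    if p.runs_as_root:
        score += 1

    # Application data only: subtract 3 (clamping at 0 is an assumption).
    if p.data_only:
        score -= 3
    return max(score, 0)
```

For example, a C project with a website, 2 CVEs, 2 contributors in the last 12 months, popularity in the top 3%, network-provided input, and no root privileges would score 2 + 4 + 1 + 2 + 1 = 10 under these assumptions.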
Pretty much everyone who views the census algorithm immediately thinks up possible modifications to it. The most commonly suggested changes include:
- Dependencies: Add 2 points if the package has more than 5 other packages which are dependent on it. Add 1 point if the package is depended upon by 1-5 packages. This parameter would promote packages which are often relied upon by other packages. In doing so, it would identify core infrastructure. This parameter may have some overlap with the Popularity parameter already included.
- Patches: If the deb or rpm includes more than 5 patches which have not been accepted upstream, the package receives a point. Distros carry patches for unique packaging requirements and when the upstream project is non-responsive. The patches are often less reviewed than the original project and so may add risk to the project. This parameter may have some overlap with the Contributor Count parameter already included.
- ABRT crash statistics: If the crash statistics are increasing over time, add 2 points to the project's score. If the statistics are stable but high, add 1 point.
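For anyone who wants to experiment, the three suggestions above could be prototyped as an extra term added to the existing score. This is only a sketch: `proposed_extra_points` and its parameters are hypothetical names, and the crash trend is assumed to have already been classified into a label such as "increasing" or "stable_high", since the list does not define what counts as "high".

```python
def proposed_extra_points(dependents: int, unmerged_patches: int, crash_trend: str) -> int:
    """Points from the suggested Dependencies, Patches, and ABRT parameters."""
    points = 0

    # Dependencies: more than 5 dependent packages adds 2, 1-5 adds 1.
    if dependents > 5:
        points += 2
    elif dependents >= 1:
        points += 1

    # Patches: more than 5 patches not accepted upstream adds 1.
    if unmerged_patches > 5:
        points += 1

    # ABRT crash statistics: increasing adds 2, stable but high adds 1.
    if crash_trend == "increasing":
        points += 2
    elif crash_trend == "stable_high":
        points += 1

    return points
```

Keeping the proposed parameters in a separate function makes it easy to compare rankings with and without them, which helps when checking for the overlap with Popularity and Contributor Count noted above.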
Full List of Parameters Which were Considered:
Please read section 2 of the Census whitepaper entitled "Open Source Software Needing Security Investments" to see the full list of parameters that were considered by this and related projects before this particular set of parameters was chosen for the first implementation of the Census.
How to Interpret the Results:
A high score in the Census does not mean that we have found a vulnerability in the project. Nor does it mean that you should necessarily stop using the project. A high score does not mean that the project is "bad". A high score means that the project may not be getting the attention that it deserves and that it merits further investigation.
Call to Action:
What parameters do you think should be added? Sound off on the mailing list or fork your own version from the project's GitHub repository to try out your ideas. Send us a pull request for the ones that experimentation shows to work best.
For more detailed information on the Census Project, please download the whitepaper "Open Source Security Census: Open Source Software Projects Needing Security Investments" by the Institute for Defense Analyses and the Linux Foundation. CII has also produced a "Census Project Summary" that is a short summary of the full whitepaper.