Parse CHANGELOGs to discover new Vulnerabilities #233

@sbs2001

FYI, this came up in @pombredanne's talk at Open Source Summit 2020.

The idea: FOSS projects that don't come under any CNA may have found and fixed several bugs that belong in the security category, but which never got a CVE because of
1. the complexity of getting a CVE, or
2. an inability to classify a bug as a security issue.

Such security issues may go unnoticed. If we can find them, we make FOSS safer, and users get a concrete incentive to upgrade their software, which makes coping with changes bearable.

One way to achieve this goal is to parse the CHANGELOGs of FOSS projects and find changes that are related to security fixes. For this, the implementation of an ML classifier would look like the following (this is a repaste from Gitter):

Use our existing data: find the version of a package where a vulnerability was first fixed, and map that version to its changelog. There's https://github.com/pyupio/changelogs to fetch changelogs (it maps version -> change too). Extract such changelogs.
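The extraction step above could be sketched roughly as follows. This is only an illustration: it assumes the changelogs have already been fetched into a plain `package -> {version: text}` dict and does not call the pyupio/changelogs API. The function name `changelogs_for_fixes`, the package `examplepkg`, and all sample data are hypothetical.

```python
def changelogs_for_fixes(fixed_versions, changelogs_by_package):
    """Pick the changelog entry of each version that first fixed a vulnerability.

    fixed_versions: iterable of (package, version) pairs from existing data.
    changelogs_by_package: {package: {version: changelog text}} (assumed shape).
    """
    samples = []
    for package, version in fixed_versions:
        entry = changelogs_by_package.get(package, {}).get(version)
        if entry:  # skip versions with no changelog entry
            samples.append(
                {"package": package, "version": version,
                 "text": entry, "label": "security"}
            )
    return samples


# Hypothetical data, for illustration only.
changelogs_by_package = {
    "examplepkg": {
        "1.0.1": "Fix crash when parsing malformed headers",
        "1.1.0": "Add new configuration option",
    }
}
fixed_versions = [("examplepkg", "1.0.1"), ("examplepkg", "0.9.0")]

positive_samples = changelogs_for_fixes(fixed_versions, changelogs_by_package)
```

Versions without a changelog entry are dropped silently here; a real importer would probably want to log them.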

The ML model would be trained on something along the lines of: given the presence and absence of such-and-such words, the changelog is or is not related to security. We would also add non-security changelogs to the training data, so the model is not biased.
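A minimal sketch of such a word-presence classifier, assuming a Bernoulli naive Bayes model with add-one smoothing (one possible choice, not something settled in this issue). The class name `PresenceNaiveBayes` and the tiny training set are hypothetical; real training data would come from the extracted changelogs plus non-security entries.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase words present in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class PresenceNaiveBayes:
    """Bernoulli naive Bayes over word presence/absence (illustrative only)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_score(self, words, label):
        n = self.doc_counts[label]
        score = math.log(n / sum(self.doc_counts.values()))  # class prior
        for word in self.vocab:
            # Add-one smoothed probability that the word appears in this class.
            p = (self.word_counts[label][word] + 1) / (n + 2)
            score += math.log(p if word in words else 1 - p)
        return score

    def predict(self, text):
        words = tokenize(text)
        return max(self.doc_counts, key=lambda label: self._log_score(words, label))


# Hypothetical training data: security and non-security changelog lines.
texts = [
    "Fix XSS vulnerability in template rendering",
    "Fix buffer overflow when parsing headers",
    "Patch SQL injection in query builder",
    "Add dark mode to settings page",
    "Improve documentation for installation",
    "Refactor query builder for readability",
]
labels = ["security"] * 3 + ["other"] * 3

model = PresenceNaiveBayes().fit(texts, labels)
prediction = model.predict("Fix injection vulnerability in login form")
```

Including the "other" examples is what keeps the model from labelling everything as security, which is the bias concern mentioned above.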

The classifier won't be fully accurate, but it would definitely reduce the search space. The changes tagged as security would be fed into a manual curation queue and issued a vulnerability identifier (something like a CVE), bringing them into 'addressable existence'.

The beginning of wisdom is to call things by their proper name.
-Confucius

This needs #232 to be addressed.
