Parse CHANGELOGs to discover new Vulnerabilities #233

FYI, this came up in @pombredanne's talk at Open Source Summit 2020.

The idea is that FOSS projects which don't come under any [CNA](https://cve.mitre.org/cve/cna.html) may have discovered several bugs that belong in the security category. Such security issues may go unnoticed due to:

1. The complexity of getting a CVE, or
2. The inability to classify a bug as a security issue.

If we can find such issues, we make FOSS safer, and users gain an incentive to upgrade their software, which makes coping with changes bearable.

One way to achieve this goal is to parse the CHANGELOGs of FOSS projects and find changes that relate to security fixes. An ML classifier for this could be implemented as follows (reposted from Gitter):

> Use our existing data to find the version of a package where the vulnerability was first fixed, and map that version to its changelog. There's https://github.com/pyupio/changelogs to fetch changelogs (it maps version -> change too). Extract such changelogs.
> 
> The ML model would be trained along the lines of: given the presence and absence of such and such words, the changelog is or is not related to security. We would also add non-security-related changelogs during training, so the model is not biased.
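
The "presence and absence of words" idea maps naturally onto a Bernoulli naive Bayes classifier. The following is only a minimal sketch under stated assumptions, not the project's implementation: the training sentences are invented for illustration, the `BernoulliNB` class and `tokenize` helper are hypothetical names, and in practice the positive examples would come from changelog entries of versions known to fix a vulnerability.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase words in a changelog entry (presence only)."""
    return set(re.findall(r"[a-z]+", text.lower()))


class BernoulliNB:
    """Tiny Bernoulli naive Bayes over word presence/absence (illustrative only)."""

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.vocab = sorted(set().union(*docs))
        self.classes = sorted(set(labels))
        self.prior = {}
        self.word_prob = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.prior[c] = len(class_docs) / len(docs)
            counts = Counter(w for d in class_docs for w in d)
            # Laplace smoothing so no word probability is exactly 0 or 1.
            self.word_prob[c] = {
                w: (counts[w] + 1) / (len(class_docs) + 2) for w in self.vocab
            }
        return self

    def log_scores(self, text):
        words = tokenize(text)
        scores = {}
        for c in self.classes:
            s = math.log(self.prior[c])
            for w in self.vocab:
                p = self.word_prob[c][w]
                # Both presence and absence of each known word contribute.
                s += math.log(p if w in words else 1.0 - p)
            scores[c] = s
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented training data: True = security related, False = not security related.
TRAIN = [
    ("Fix buffer overflow in header parser", True),
    ("Prevent XSS injection in template rendering", True),
    ("Fix denial of service via crafted request", True),
    ("Add dark mode to settings page", False),
    ("Update documentation for install steps", False),
    ("Improve performance of list rendering", False),
]

model = BernoulliNB().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("Fix overflow in request parser"))
print(model.predict("Update documentation for settings page"))
```

Including the non-security entries in `TRAIN` is what the quoted note means by keeping the model unbiased: without negative examples, every changelog line would look like a security fix.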

The classifier won't be fully accurate, but it should reduce the search space considerably. The changes tagged as security-related will be fed into a manual curation queue and issued a vulnerability identifier (something like a CVE), bringing them into 'addressable existence'.
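
The flow from classifier output to curation queue to identifier could be sketched roughly as below. Everything here is an assumption made for illustration: the `Candidate` and `CurationQueue` names, the score threshold, and the `CANDIDATE-0001` identifier format are hypothetical and not part of any existing scheme.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Candidate:
    package: str
    version: str
    change: str
    score: float  # classifier confidence that the change is a security fix
    identifier: Optional[str] = None


@dataclass
class CurationQueue:
    threshold: float = 0.5
    pending: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def submit(self, candidate):
        """Queue the change for review only if the classifier flagged it."""
        if candidate.score < self.threshold:
            return False
        self.pending.append(candidate)
        # Highest-confidence candidates are reviewed first.
        self.pending.sort(key=lambda c: c.score, reverse=True)
        return True

    def review(self, is_security_issue):
        """Pop the top candidate; assign an identifier only if a human confirms it."""
        candidate = self.pending.pop(0)
        if is_security_issue:
            # Hypothetical identifier format, for illustration only.
            candidate.identifier = f"CANDIDATE-{next(self._ids):04d}"
        return candidate


queue = CurationQueue(threshold=0.6)
queue.submit(Candidate("examplepkg", "1.2.3", "Fix overflow in parser", 0.91))
queue.submit(Candidate("examplepkg", "1.2.4", "Update docs", 0.10))
queue.submit(Candidate("otherpkg", "0.9.0", "Escape user input in templates", 0.72))

first = queue.review(is_security_issue=True)
print(first.identifier, first.change)
```

Keeping the identifier assignment inside `review` reflects the point that the classifier only narrows the search: a change gets an identifier after a human confirms it, never on the classifier's word alone.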

>The beginning of wisdom is to call things by their proper name.
-Confucius

This needs https://github.com/nexB/vulnerablecode/issues/232 to be addressed.
