Python: Add support for threat models#17203
Conversation
1e35fd4 to
8747fa4
Compare
b145618 to
2117d1f
Compare
Naming in other languages: - `SourceNode` (for QL only modeling) - `ThreatModelFlowSource` (for active sources from QL or data-extensions) However, since we use `LocalSourceNode` in Python, and `SourceNode` in JS (for local source nodes), it seems a bit confusing to follow the same naming convention as other languages, and instead I came up with new names.
Without, it's impossible to write test showing what threat-models are active by default... unless I provide a hardcoded list in the test itself, which is not any fun.
I didn't want to put the configuration file in `semmle/python/frameworks/**/*.model.yml`, so created `ext/` as in other languages
e1b2ae4 to
93b5060
Compare
asgerf
left a comment
There was a problem hiding this comment.
Looks great! I mainly a comment about the class names otherwise the structure LGTM
| * Extend this class to refine existing API models. If you want to model new APIs, | ||
| * extend `ThreatModelSource::Range` instead. | ||
| */ | ||
| class ThreatModelSource extends DataFlow::Node instanceof ThreatModelSource::Range { |
There was a problem hiding this comment.
Might I suggest the name FlowSource for this class? It seems consistent with C++ and Swift at least, and it works nicely with RemoteFlowSource being a special case of it.
Then we could rename ActiveThreatModelSource to ThreatModelFlowSource to be consistent with other languages. I do agree that the "active" prefix makes sense, but given that this will be the new go-to thing to use in isSource() predicates it seems that we really do want consistency for that class name.
There was a problem hiding this comment.
I've persuaded @michaelnebel to be in favor of ActiveThreatModelSource, so I've just filed a PR to make the existing languages use that as well 👍 #17424
FlowSource
Regarding renaming to FlowSource, I've tried to do that here: RasmusWL@dec5daa
I'm slightly hesitant to accept it, I can't quite put my finger on it, but I think it's because it's so generic that name could be used to capture the set of sources for any data-flow/taint-tracking configuration, no matter the logic, whereas ThreatModelSource seem to convey a more specific meaning to me.
I realize that your suggestion probably fits pretty well with current naming in C#/Java/Go with SourceNode and C++/Swift with FlowSource 🤔 Maybe I'll see if anyone comes with a convincing argument during next round of review, otherwise it looks like I should just disagree-and-commit.
There was a problem hiding this comment.
Whatever we do we ought to treat the naming of the two classes as a single decision; not something where we try to make a decision for each class in isolation.
The proposed rename to FlowSource made sense when the other class would be called ThreatModelFlowSource; but it doesn't work so so nicely with ActiveThreatModelSource. I'd prefer the combination ThreatModelSource/ActiveThreatModelSource over FlowSource/ActiveThreatModelSource.
There was a problem hiding this comment.
Alright, I'm feeling strongly in favor of ActiveThreatModelSource, so it seems like I won't add the FlowSource commit 👍
python/ql/lib/semmle/python/security/dataflow/CodeInjectionCustomizations.qll
Outdated
Show resolved
Hide resolved
Since using `.DictionaryElementAny` doesn't actually do a store on the source, (so we can later follow any dict read-steps). I added the ensure_tainted steps to highlight that the result of the WHOLE expression ends up "tainted", and that we don't just mark `os.environ` as the source without further flow.
As part of adding support for threat-models to Python/JS (see github#17203), we ran into some trouble with name clashes. Naming in existing languages supporting threat-models: - `SourceNode` (for QL only modeling) - `ThreatModelFlowSource` (for active sources from QL or data-extensions) However, since we use `LocalSourceNode` in Python, and `SourceNode` in JS (for local source nodes), it seems a bit confusing to follow the same naming convention as other languages, and we had to come up with new names. Initially I used `ThreatModelSource` for the "QL only modeling", but that meant that we needed a new name to represent the active sources coming from either QL or data-extensions... for this I came up with `ActiveThreatModelSource`, and I really liked it. To me, it's much clearer that this class only contains the currently active threat model sources. So to align languages, I got approval from @michaelnebel to rename the existing classes.
93b5060 to
e35c2b2
Compare
tausbn
left a comment
There was a problem hiding this comment.
Just one minor comment, otherwise I think this looks really good! (I'm approving it now in case the minor comment is irrelevant.)
Co-authored-by: Taus <tausbn@github.com>
No description provided.