[AnomalyDetection] Add base classes and specifiable protocol#33845
[AnomalyDetection] Add base classes and specifiable protocol#33845damccorm merged 11 commits intoapache:masterfrom
Conversation
|
r: @damccorm |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #33845 +/- ##
============================================
+ Coverage 59.08% 59.12% +0.03%
Complexity 3237 3237
============================================
Files 1156 1158 +2
Lines 176907 177113 +206
Branches 3391 3391
============================================
+ Hits 104532 104715 +183
- Misses 69008 69031 +23
Partials 3367 3367
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
bc6fa8f to
acc6448
Compare
acc6448 to
4023c80
Compare
damccorm
left a comment
There was a problem hiding this comment.
Mostly LGTM, but had some questions. In general, docs would be really helpful here. I really like the concept
| def run_init(self): | ||
| original_init(self, **self._init_params) | ||
|
|
||
| def new_getattr(self, name): |
There was a problem hiding this comment.
Maybe I'm not understanding this well, but could we just do something a bit simpler like:
def new_getattr(self, name):
if not self._initialized:
original_init()
self._initialized = True
return orig_getattr(self, name)
if we keep track of the original getattr function in orig_getattr
There was a problem hiding this comment.
The general idea is like this, but there are some edge cases we need to avoid. Otherwise, we will end up with an infinite loop.
|
|
||
| from typing_extensions import Self | ||
|
|
||
| ACCEPTED_SPECIFIABLE_SUBSPACES = [ |
There was a problem hiding this comment.
Why do we start with these registered? Would it be cleaner to just register them on import?
There was a problem hiding this comment.
These are the accepted subspaces for the known specifiable, but not the actual specifiable classes. The purpose is to avoid using a global space for every specifiable classes.
I think they should be predefined before the true registration takes place. WDYT?
There was a problem hiding this comment.
The purpose is to avoid using a global space for every specifiable classes.
I think maybe this is the piece I'm missing - why is this helpful? Aka, why is it helpful to have a nested KNOWN_SPECIFIABLE[subspace][spec_type] dict instead of just KNOWN_SPECIFIABLE[spec_type]. I can't really think of scenarios this enables/makes easier
There was a problem hiding this comment.
There is a concern about keeping everything in a global namespace from the design doc.
https://docs.google.com/document/d/1tE8lz9U_vjlNn2H7t-GRrs3vfhQ5UuCgWiHXCRHRPns/edit?disco=AAABaZ332MQ
I think the primary goal here is to avoid naming conflict, but I am open to any suggestion.
There was a problem hiding this comment.
Ok, I can live with this I think, lets just make sure:
a) the user has an easy path to create this object without specifying a subspace (e.g. the mainline case where conflicts are not really an issue)
b) it is really clear what a user should do if there's a conflict
There was a problem hiding this comment.
(a) Yes, I think the subspace is transparent to users. Users don't need to specify subspace when they use the decorator to augment their class.
(b) When there is a conflict, users can choose to specify a different spec_type when calling the decorator. I will make sure I have added that clearly in the docstring of the decorator.
On a second thought, I think it may be helpful to turn KNOWN_SPECIFIABLE into a module private variable, so users will know they are not supposed to manage it directly.
371d465 to
0b3a7ca
Compare
0b3a7ca to
e4a32c2
Compare
|
@damccorm, I've added docstrings and refactored some code for readability. PTAL. Thanks! |
|
|
||
| from typing_extensions import Self | ||
|
|
||
| ACCEPTED_SPECIFIABLE_SUBSPACES = [ |
There was a problem hiding this comment.
The purpose is to avoid using a global space for every specifiable classes.
I think maybe this is the piece I'm missing - why is this helpful? Aka, why is it helpful to have a nested KNOWN_SPECIFIABLE[subspace][spec_type] dict instead of just KNOWN_SPECIFIABLE[spec_type]. I can't really think of scenarios this enables/makes easier
b9221f1 to
5f0debf
Compare
…c_type to resolve naming conclict.
38ae483 to
38b0a89
Compare
damccorm
left a comment
There was a problem hiding this comment.
Thanks for the updates. I just have one more thought which includes an API change - other than that we can take this and add in things as needed in the future
…33845) * Add base classes and specifiable protocol for anomaly detection. * Add subspaces to global specifiable map * Add __init__.py * Fix lints * Fix get_subspace when calling from from_spec * Refactor code, add tests and add docstrings. * Minor changes to docstrings and comments * Remove the fallback subspace '*' from accepted list. Use it in tests only. * Bring fallback subspace back to accepted list. Clarify the use of spec_type to resolve naming conclict. * Make _KNOWN_SPECIFIABLE a defaultdict. Remove error_if_exiists. * Minor adjustment on docstrings.
This is the first part of the code push for anomaly detection transform.