Add pytests for preprocessing #67
Conversation
Codecov Report
@@            Coverage Diff             @@
##           dscim-v0.4.0     #67      +/-   ##
================================================
+ Coverage       39.94%   46.40%   +6.46%
================================================
  Files              17       17
  Lines            1865     1866       +1
================================================
+ Hits              745      866     +121
+ Misses           1120     1000     -120

see 4 files with indirect coverage changes
dscim/src/dscim/preprocessing/preprocessing.py, lines 163 to 188 in cda25bc
    )

    def clip_damages(
I didn't find reference to this function anywhere except dscim-cil which only references it in this script. If this script were updated to be operational, this function would be removed entirely.
@JMGilbert can you create a new issue about removing this, then do a new PR that removes this function entirely? Then we can reference the PR in dscim-cil where your link points. Thank you!
Noting here that we decided this doesn't need an Issue. It's noted in the CHANGELOG
See #71
kemccusker left a comment
This is great, @JMGilbert thank you! I know these tests are beasts. I have a couple questions and flagged where we will probably need updates due to recent other merges.
brews left a comment
I was asked to have a look at this because testing untested legacy code can be such a headache, especially when it deals with IO and files like this. This is awesome work. Great use of the tmp_path fixture for temporary files. I left a few inline comments with some pointers to help make these cleaner to read and understand when they fail.
    xr.testing.assert_equal(ds_out_expected, ds_out_actual)

    def test_sum_AMEL(tmp_path):
Add a docstr here giving one line saying the one behavior this test is trying to check. At this stage, testing legacy code, I think it's okay if it's something like “Tests the basic functionality of [insert function name here]”. But try to have each test check one behavior or have one reason to fail.
There are a couple other missing docstrs below. I'll try to call them all out.
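As a sketch of the suggested style (the `sum_AMEL` body here is a trivial stand-in, not dscim's implementation):

```python
# Illustrative only: a one-line docstring naming the single behavior
# under test. `sum_AMEL` is a stand-in, not the real dscim function.
def sum_AMEL(sector_damages):
    # Stand-in: sum per-region damages across sectors.
    return [sum(vals) for vals in zip(*sector_damages)]

def test_sum_AMEL():
    """Tests the basic functionality of sum_AMEL: damages are summed across sectors."""
    assert sum_AMEL([[1.0, 2.0], [3.0, 4.0]]) == [4.0, 6.0]
```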
tests/test_preprocessing.py
Outdated
        ("risk_aversion", 10, 15),
    ],
)
def test_reduce_damages(tmp_path, recipe, eta, batchsize):
Add a docstr here saying what you're testing.
        ("risk_aversion", 10, "cheese", True),
    ],
)
def test_ce_from_chunk(tmp_path, recipe, eta, reduction, zero):
Add docstr briefly saying what you're testing.
tests/test_preprocessing.py
Outdated
    ],
)
def test_reduce_damages(tmp_path, recipe, eta, batchsize):
    if recipe == "adding_up" and eta is not None:
The if/elses here hint this test is testing more than one behavior. I'd consider breaking it down into multiple test functions so that each one is more clearly checking for one behavior or has one reason to fail. So, in this case, maybe make one test function that checks that this error gets raised, and then another test function checking "the happy path" - testing the functions spits out the correct output, etc. It becomes harder and harder to reason why a test failed or what a test is actually testing if a test starts checking for too many different things.
There are a couple of other places that might have this issue. Happy to discuss this more if needed.
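A hedged sketch of what that split might look like, with `reduce_damages` reduced to a stand-in (the real function's signature and error message will differ):

```python
import pytest

# Hypothetical split of one branching test into two single-purpose tests.
# `reduce_damages` is a stand-in, not the real dscim function.
def reduce_damages(recipe, eta):
    if recipe == "adding_up" and eta is not None:
        raise ValueError("adding_up does not take an eta")
    return "reduced"

def test_reduce_damages_rejects_eta_for_adding_up():
    """Tests that reduce_damages raises ValueError when adding_up is given an eta."""
    with pytest.raises(ValueError):
        reduce_damages("adding_up", eta=2.0)

def test_reduce_damages_happy_path():
    """Tests that reduce_damages succeeds for a valid recipe/eta pair."""
    assert reduce_damages("risk_aversion", eta=2.0) == "reduced"
```

Each test now has exactly one reason to fail, and a red test name tells you which behavior broke.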
tests/test_preprocessing.py
Outdated
def test_sum_AMEL(tmp_path):
    """
    Test that sum_AMEL outputs a Zarr file returns a file with damage function coefficients summed
@JMGilbert is this doc string correct? I thought it would be summing spatial damages, not df coefs.
Oops yeah this is wrong
tests/test_preprocessing.py
Outdated
        dummy_soecioeconomics_file, consolidated=True, mode="w"
    )

    if batchsize != 15:
is this block actually getting tested?
Yup, one of the pytest fixtures makes batchsize equal to 5 which triggers this conditional.
Oh jeez I actually checked the fixture and saw only 15s. Obviously I misread it. Thanks!
tests/test_preprocessing.py
Outdated
        )
    )

    if reduction not in ["cc", "no_cc"]:
@JMGilbert this is another spot where the test is testing two things. This conditional testing of the error should be another test function.
For sure -- I was somewhat reluctant to separate a few of these tests because the conditional here still uses some of the synthetic inputs that I generate above this line. I considered doing a lot of generalization (because I learned about how to make tests share inputs later on), but wasn't sure how much extra time to put into cleaning these tests up given that the functionality won't actually change. I could also just copy and paste the synthetic inputs and this conditional into a new test, but that feels wrong.
(happy to do either, just wanted to make sure that you want the time put in on this)
I see what you mean. I think going the easy route and copying the synthetic inputs to another test func is probably fine. What do you think, @brews?
@kemccusker I think sparing copy-and-paste is fine for an early test for legacy code — just don't make it too much of a habit. A nice pytest fixture to help set up input data might be a better way of doing it, but it's not worth the effort if we do it sloppily or if we're likely to turn around and refactor the code we are testing anyway. I'd let y'all be the judge of that, and of whether it's worth cleaning up the test so you can read its intentions later. I would argue the important thing for now, and for this PR, is to get some kind of test in place for this code/behavior.
ok @JMGilbert let's go w/ a new test the "easy" way for now since it should be quick, keeping in mind Brewster's advice for future refactors.
tests/test_preprocessing.py
Outdated
        ce_batch_coords=ce_batch_coords,
    )

    if not zero or reduction == "no_cc":
could also consider splitting these two cases into two tests.
@kemccusker I made the tests case by case in all of the places where it makes sense. The only remaining test that has a meaningful if/else is testing the same line of code in both conditions, so I think it makes sense to keep it like that. Should be good to go now.
kemccusker left a comment
Awesome, looks good to merge, @JMGilbert ! Thanks for all the effort!
No description provided.