-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG/MAINT: Change default of inplace to False in pd.eval #16732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5b38b03 to
bf6fbe5
Compare
|
Apparently, >200 (!) of our tests didn't take heed of the warning to pass in |
|
Also, I must say that the docstring for @jreback @jorisvandenbossche : Thoughts? |
bf6fbe5 to
b7cee03
Compare
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding an example to the docstring might clarify a lot what this function actually does
pandas/core/computation/eval.py
Outdated
| in a future version, will default to False. Use inplace=True | ||
| explicitly rather than relying on the default. | ||
| inplace : bool, default False | ||
| If the expression mutates, whether to modify the expression inplace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding an example might clarify it further eg: "If the expression mutates (e.g. creates a new column), ..."
pandas/core/computation/eval.py
Outdated
| in a future version, will default to False. Use inplace=True | ||
| explicitly rather than relying on the default. | ||
| inplace : bool, default False | ||
| If the expression mutates, whether to modify the expression inplace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You changed object to expression in "whether to modify object inplace", but I think object is actually better.
Looking at the test example:
In [87]: df = DataFrame(np.random.randn(5, 2), columns=list('ab'))
In [88]: df.eval('c = a + b', inplace=False)
Out[88]:
a b c
0 -0.742101 2.084071 1.341970
1 -0.823597 -1.686791 -2.510388
2 0.277927 -0.076424 0.201503
3 1.159483 0.188500 1.347983
4 0.010328 -0.890899 -0.880571
In [89]: df
Out[89]:
a b
0 -0.742101 2.084071
1 -0.823597 -1.686791
2 0.277927 -0.076424
3 1.159483 0.188500
4 0.010328 -0.890899
it is the df object that does or not get modified, not the "expression"
pandas/core/computation/eval.py
Outdated
| inplace : bool, default False | ||
| If the expression mutates, whether to modify the expression inplace | ||
| or return a copy of it with mutation. In other words, it will return | ||
| the result of the expression if `inplace=True` and `target` otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"and target otherwise." is not fully correct. It can also mutate the calling object.
WIth the same example:
In [89]: df
Out[89]:
a b
0 -0.742101 2.084071
1 -0.823597 -1.686791
2 0.277927 -0.076424
3 1.159483 0.188500
4 0.010328 -0.890899
In [90]: df.eval('c = a + b', inplace=True)
In [91]: df
Out[91]:
a b c
0 -0.742101 2.084071 1.341970
1 -0.823597 -1.686791 -2.510388
2 0.277927 -0.076424 0.201503
3 1.159483 0.188500 1.347983
4 0.010328 -0.890899 -0.880571
|
Whoops, sorry, all my comments were in light of me thinking it was about Anyhow, the frame method also has to be updated, and there the comments will be sensible :-) (edit: but actually the docstring there is already much clearer!) |
|
@jorisvandenbossche : Actually ignore my previous comment. That doesn't make much sense to me at all still. The documentation says "whether to return a new DataFrame or mutate the existing." Which condition means return a new If you say it's IMO, it should be that >>> import pandas as pd
>>> c = pd.eval('1 + 2', inplace=True)
>>> c
3
>>> c = pd.eval('1 + 2', inplace=False)
>>> c
>>> |
b7cee03 to
d52e95d
Compare
|
cc @chris-b1 |
|
My thought is that In fact, |
|
@chris-b1 : I think we are mostly aligned in our thought process, and I think (if nobody objects) that we should address the issue now in this PR instead of waiting. I don't think adding I agree that we should only allow What do you think? |
|
The question we should ask: what is the usecase of Possible usecases:
So actions I would take:
|
|
@jorisvandenbossche : Well that touches upon #16529, but in any case, it can be used when I agree that we should update the docstring to say that Also, the fix for that bug should be the following: <if target is None or is some primitive>:
inplace = False
...
if not inplace:
return retThere should be no condition for returning anything if How does that sound? @chris-b1 thoughts? |
|
I would suggest
|
|
@chris-b1 : I agree with |
d52e95d to
a59bee6
Compare
a59bee6 to
5bfa90b
Compare
Codecov Report
@@ Coverage Diff @@
## master #16732 +/- ##
==========================================
+ Coverage 90.94% 90.97% +0.02%
==========================================
Files 161 161
Lines 49272 49283 +11
==========================================
+ Hits 44811 44834 +23
+ Misses 4461 4449 -12
Continue to review full report at Codecov.
|
5bfa90b to
d1004eb
Compare
pandas/core/computation/eval.py
Outdated
| # "Primitives" cannot be modified inplace because | ||
| # they have no attributes that we can modify. | ||
| # | ||
| # The only exceptions are lists and dictionaries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is possible to modify lists -> __getitem__ is used, and it is a string expression, so you can never work with a list
Or do you have an example where this actually works?
(anyway, it would also be good to add some tests for those statements, also for the dictionaries)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see the diff I was looking at is already outdated :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche : This is very much WIP. I'm trying to test that the changes I make actually don't break anything, let alone > 200 tests like last time. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I have the implementation in a way that works, I will add the tests (I have them in a separate file that I'm running to test before pushing).
pandas/core/computation/eval.py
Outdated
| modifiable = hasattr(target, "__setitem__") | ||
|
|
||
| if inplace and not modifiable: | ||
| raise ValueError("Cannot modify the provided target inplace") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility is to just raise depending on parsed_expr.assigner. If there is no assignment in the expression, raise when inplace is set to True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because now you could pass a target, and then set inplace=True (then this error will not be triggered), while this makes no sense when there is no assignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because now you could pass a target, and then set inplace=True
What do you mean? How can you do those two things sequentially?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just using both as keyword arguments: pd.eval(expr, target=df, inplace=True)
that only makes sense when expr contains an assignment, and that you check with parsed_expr.assigner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I think I'll just put a try-except around the item assignment so that we don't have to do all of these checks along the way, as the check itself is not robust to things like list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the try excepts are not needed. When trying to set an object that does not support assigment with strings like a list, you already get a clear error message (and this is really a corner case, people should actually only pass dataframes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the try excepts are not needed.
I think a uniform error message is needed. Passing in np.ndarray provides an error message that is not immediately clear regarding item assignment.
people should actually only pass dataframes
Well then perhaps pd.eval should not exist 😄 because as is, there's nothing wrong with me passing in a dictionary or some other dict-like object.
ccb2b58 to
4874b13
Compare
|
@jreback @jorisvandenbossche : Could you run the failing build on Travis? It doesn't seem related to my changes at all whatsoever. |
|
Thanks to whoever reran the build! Everything is green now and ready for review. |
| def eval(expr, parser='pandas', engine=None, truediv=True, | ||
| local_dict=None, global_dict=None, resolvers=(), level=0, | ||
| target=None, inplace=None): | ||
| target=None, inplace=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defer to @jreback, @jorisvandenbossche, but I am still of the opinion that API would be cleaner if inplace was only allowed with a target and raised otherwise. pd.eval('1 + 2', inplace=True) == None is surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-b1 : True, but "with a target" is not well-defined either. As you can see by my documentation change on target in pandas.eval, you first need to ensure that item assignment with strings is supported, but how do you check that without modifying the object?
That's something I imagined would create unnecessary overhead. Also, if you go down this path, you should then check if there is item assignment in the first place if you want inplace=True to not surprise anyone.
That's why I decided to allow the user to pass in inplace=True even if there is no real target. I will in fact document that behavior just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before, I don't think we need to check the modifyability of target. Leave it to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sure, I have no problem leaving it as is.
|
I don't really recall why there is a This seems odd to have |
56ff112 to
3e6cb8c
Compare
| explicitly rather than relying on the default. | ||
| inplace : bool, default False | ||
| If the expression contains an assignment, whether to perform the | ||
| operation inplace and mutate the existing DataFrame. Otherwise, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make the "Otherwise, " -> "By default, " ? (to stress more that returning a new dataframe is the default) Or could also append something to the sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we need to emphasize this. It's pretty clear that the default is False from the declaration of inplace on line 2235.
| expression = "a = 1 + 2" | ||
|
|
||
| with tm.assert_raises_regex(ValueError, msg): | ||
| self.eval(expression, target=invalid_target, inplace=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit a pity that the copy check has to come first, because you now get a not super informative message when passing an invalid target (about not having copy, while you just can't pass it).
(which would be the same if we wouldn't catch those, I know :-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I can also pass in objects that are constructed to support item assignment but fail on copy. It seems reasonable to put a custom error message for that case.
pandas/core/computation/eval.py
Outdated
| ------ | ||
| ValueError : There are many instances where such an error can be raised: | ||
| - An invalid target is provided, and | ||
| the expression is multiline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this on one line? (I think it fits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further, why the "and the expression is multiline" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant to go for the condition of multi_line and target=None
pandas/core/computation/eval.py
Outdated
| Raises | ||
| ------ | ||
| ValueError : There are many instances where such an error can be raised: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put the "There are many instances where such an error can be raised:" also on the next line? (and then the colon : is not needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
3e6cb8c to
1c5619f
Compare
|
@jreback @jorisvandenbossche @chris-b1 : PTAL |
1c5619f to
7144144
Compare
|
@jreback @jorisvandenbossche @chris-b1 : Any updates or thoughts on this? |
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some minor comments.
doc/source/whatsnew/v0.21.0.txt
Outdated
| ^^^^^^^^^^^^^^^^^ | ||
|
|
||
| - Moved definition of ``MergeError`` to the ``pandas.errors`` module. | ||
| - :func:`eval` will now raise a ``ValueError`` in cases where we cannot operate properly on the ``target`` parameter, or when an inplace operation is specified but there is no item assignment in the expression (:issue:`16732`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a little obtuse maybe try to be slightly more verbose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to have a couple of examples in a sub-section on what is happening. Also feel free to expand the Examples section of the doc-string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to clarify the wording a bit. Done.
pandas/core/computation/eval.py
Outdated
| level : int, optional | ||
| The number of prior stack frames to traverse and add to the current | ||
| scope. Most users will **not** need to change this parameter. | ||
| target : a target object for assignment, optional, default is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doc-string is not numpy doc (IOW the expl should be on the next line), specify what the target object can be (DataFrame?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really can be anything that supports string item-assignment, per the docstring I put.
pandas/core/computation/eval.py
Outdated
| WARNING: inplace=None currently falls back to to True, but | ||
| in a future version, will default to False. Use inplace=True | ||
| explicitly rather than relying on the default. | ||
| This is essentially passed into the resolver and is used when there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm essentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes, sense. Done.
| There are many instances where such an error can be raised: | ||
| - `target=None`, but the expression is multiline. | ||
| - The expression is multiline, but not all them have item assignment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expand what this means (for 2nd item)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
| try: | ||
| target = env.target.copy() | ||
| except AttributeError: | ||
| raise ValueError("Cannot return a copy of the target") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. There is an entire test devoted to this!
7144144 to
b6bdad3
Compare
|
@jreback @jorisvandenbossche @chris-b1 : Addressed all comments. PTAL. |
doc/source/whatsnew/v0.21.0.txt
Outdated
| ^^^^^^^^^^^^^^^^^ | ||
|
|
||
| - Moved definition of ``MergeError`` to the ``pandas.errors`` module. | ||
| - :func:`eval` will now raise a ``ValueError`` when item assignment malfunctions, or inplace operations are specified, but there is no item assignment in the expression (:issue:`16732`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a small sub-section with an example of what is changing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it to the breaking section, since technically there is a breaking change, as passing in inplace=True without item assignment is disallowed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice can u make title shorter
b6bdad3 to
2eaa714
Compare
|
@jreback @jorisvandenbossche @chris-b1 : Addressed comment. PTAL. |
| change, the error message is now this: | ||
|
|
||
| .. ipython :: python | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think u need an okexception or something here
(or a code block)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used code block.
doc/source/whatsnew/v0.21.0.txt
Outdated
|
|
||
| .. ipython :: python | ||
|
|
||
| pd.eval("1 + 2", target=arr, inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
doc/source/whatsnew/v0.21.0.txt
Outdated
| ^^^^^^^^^^^^^^^^^ | ||
|
|
||
| - Moved definition of ``MergeError`` to the ``pandas.errors`` module. | ||
| - :func:`eval` will now raise a ``ValueError`` when item assignment malfunctions, or inplace operations are specified, but there is no item assignment in the expression (:issue:`16732`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice can u make title shorter
Deprecated in 0.18.0. xref pandas-devgh-11149. Also patches bug where we were improperly handling the inplace=False condition, as we were assuming that target input was non-None when that wasn't necessarily enforced.
2eaa714 to
2a09efb
Compare
|
@jreback @jorisvandenbossche @chris-b1 : Addressed comments. PTAL. |
|
@jreback @jorisvandenbossche @chris-b1 : Any updates or thoughts on this? |
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some tiny doc comments. ping when pushed.
| IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) | ||
| and integer or boolean arrays are valid indices | ||
|
|
||
| This is a very long way of saying numpy arrays don't support string-item indexing. With this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't -> doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's plural ("arrays"), so "don't" is the correct conjugation there.
doc/source/whatsnew/v0.21.0.txt
Outdated
| Out[4]: 3 | ||
|
|
||
| However, this input does not make much sense because the output is not being assigned to | ||
| the target. Now, a `ValueError` will be raised when such an input is passed in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double-backticks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
thanks @gfyoung |
Deprecated back in 0.18.0.
xref #11149.
Also patches bug where we were improperly handling the
inplace=Falsecondition, as we were assuming that target input was non-None when that wasn't necessarily enforced.