Skip to content

Add inverse boxcox transform functionality to Box Cox Transformer#471

Merged
solegalli merged 9 commits intofeature-engine:mainfrom
SangamSwadiK:add_inv_box_cox_transform
Jun 12, 2022
Merged

Add inverse boxcox transform functionality to Box Cox Transformer#471
solegalli merged 9 commits intofeature-engine:mainfrom
SangamSwadiK:add_inv_box_cox_transform

Conversation

@SangamSwadiK
Copy link
Contributor

@SangamSwadiK SangamSwadiK commented Jun 11, 2022

Reference Issues/PRs:
Issue : #449 (Add inverse Boxcox transform functionality)

What does this implement/fix?
Adds an inverse transform method to Boxcox transform

Please let me know if I missed something / did it incorrectly, ill fix it asap.

@solegalli
Copy link
Collaborator

solegalli commented Jun 12, 2022

Hi @SangamSwadiK

The code looks great. Thank you!

We would have to expand this test to ensure that the inverse_transform returns the desired functionality:

def test_automatically_finds_variables(df_vartypes):
# test case 1: automatically select variables
transformer = BoxCoxTransformer(variables=None)
X = transformer.fit_transform(df_vartypes)
# expected output
transf_df = df_vartypes.copy()
transf_df["Age"] = [9.78731, 10.1666, 9.40189, 9.0099]
transf_df["Marks"] = [-0.101687, -0.207092, -0.316843, -0.431788]
# test init params
assert transformer.variables is None
# test fit attr
assert transformer.variables_ == ["Age", "Marks"]
assert transformer.n_features_in_ == 5
# test transform output
pd.testing.assert_frame_equal(X, transf_df)

Would you be able to do that?

For inspiration you could look at how we do it for the LogTransformer here:

def test_log_base_e_plus_automatically_find_variables(df_vartypes):
# test case 1: log base e, automatically select variables
transformer = LogTransformer(base="e", variables=None)
X = transformer.fit_transform(df_vartypes)
# expected output
transf_df = df_vartypes.copy()
transf_df["Age"] = [2.99573, 3.04452, 2.94444, 2.89037]
transf_df["Marks"] = [-0.105361, -0.223144, -0.356675, -0.510826]
# test init params
assert transformer.base == "e"
assert transformer.variables is None
# test fit attr
assert transformer.variables_ == ["Age", "Marks"]
assert transformer.n_features_in_ == 5
# test transform output
pd.testing.assert_frame_equal(X, transf_df)
# test inverse_transform
Xit = transformer.inverse_transform(X)
# convert numbers to original format.
Xit["Age"] = Xit["Age"].round().astype("int64")
Xit["Marks"] = Xit["Marks"].round(1)
# test
pd.testing.assert_frame_equal(Xit, df_vartypes)

@SangamSwadiK
Copy link
Contributor Author

SangamSwadiK commented Jun 12, 2022

Hi @SangamSwadiK

The code looks great. Thank you!

We would have to expand this test to ensure that the inverse_transform returns the desired functionality:

def test_automatically_finds_variables(df_vartypes):
# test case 1: automatically select variables
transformer = BoxCoxTransformer(variables=None)
X = transformer.fit_transform(df_vartypes)
# expected output
transf_df = df_vartypes.copy()
transf_df["Age"] = [9.78731, 10.1666, 9.40189, 9.0099]
transf_df["Marks"] = [-0.101687, -0.207092, -0.316843, -0.431788]
# test init params
assert transformer.variables is None
# test fit attr
assert transformer.variables_ == ["Age", "Marks"]
assert transformer.n_features_in_ == 5
# test transform output
pd.testing.assert_frame_equal(X, transf_df)

Would you be able to do that?

For inspiration you could look at how we do it for the LogTransformer here:

def test_log_base_e_plus_automatically_find_variables(df_vartypes):
# test case 1: log base e, automatically select variables
transformer = LogTransformer(base="e", variables=None)
X = transformer.fit_transform(df_vartypes)
# expected output
transf_df = df_vartypes.copy()
transf_df["Age"] = [2.99573, 3.04452, 2.94444, 2.89037]
transf_df["Marks"] = [-0.105361, -0.223144, -0.356675, -0.510826]
# test init params
assert transformer.base == "e"
assert transformer.variables is None
# test fit attr
assert transformer.variables_ == ["Age", "Marks"]
assert transformer.n_features_in_ == 5
# test transform output
pd.testing.assert_frame_equal(X, transf_df)
# test inverse_transform
Xit = transformer.inverse_transform(X)
# convert numbers to original format.
Xit["Age"] = Xit["Age"].round().astype("int64")
Xit["Marks"] = Xit["Marks"].round(1)
# test
pd.testing.assert_frame_equal(Xit, df_vartypes)

@solegalli I've added test for inverse transform. Please let me know if changes are required. I'll do it.
Thanks.

Copy link
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SangamSwadiK

thanks a lot for the quick turnaround!

This is almost ready. I should have mentioned before that we need to add the inverse_transform to the docstings of the class.

See for example how we bring the documentation forward for the log transformer using {inverse_transform} here:

Methods
-------
{fit}
{fit_transform}
{inverse_transform}
transform:
Transform the variables using the logarithm.

which we import from our base docstrings like we do here:

from feature_engine._docstrings.methods import (
_fit_not_learn_docstring,
_fit_transform_docstring,
_inverse_transform_docstring,
)

So if you could add the 2 lines of code above and also rephrase the sentence in the inverse_transform method that I point to in a following message, then, this is ready to go!!

Thank you so much!


def inverse_transform(self, X: pd.DataFrame) -> pd.DataFrame:
"""
Apply the inverse BoxCox transformation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, for convention we write the following:

Convert the data back to the original representation.

Instead of what you wrote, that makes total sense. It's just that we want to follow scikit-learn conventions.

@SangamSwadiK
Copy link
Contributor Author

Hi @SangamSwadiK

thanks a lot for the quick turnaround!

This is almost ready. I should have mentioned before that we need to add the inverse_transform to the docstings of the class.

See for example how we bring the documentation forward for the log transformer using {inverse_transform} here:

Methods
-------
{fit}
{fit_transform}
{inverse_transform}
transform:
Transform the variables using the logarithm.

which we import from our base docstrings like we do here:

from feature_engine._docstrings.methods import (
_fit_not_learn_docstring,
_fit_transform_docstring,
_inverse_transform_docstring,
)

So if you could add the 2 lines of code above and also rephrase the sentence in the inverse_transform method that I point to in a following message, then, this is ready to go!!

Thank you so much!

@solegalli Sorry, I forgot to update the class docs, I've did the changes now.
Thanks.

@solegalli
Copy link
Collaborator

Thank you @SangamSwadiK !

Merging as we speak :)

@solegalli solegalli linked an issue Jun 12, 2022 that may be closed by this pull request
@solegalli solegalli merged commit 0be8b76 into feature-engine:main Jun 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add inverse_transform method and functionality to BoxCoxTransformer

2 participants