[MNT] Complete test for sparse dataset by omkar-334 · Pull Request #1654 · openml/openml-python

omkar-334 · 2026-02-18T18:50:44Z

Added coverage for sparse dataset get_data when row_id and ignore attributes are included.
Split behavior into two tests: test_get_sparse_dataset_excludes_rowid (checks 19998 columns) and test_get_sparse_dataset_includes_rowid (checks 20000 columns).
Updated test names to make include vs exclude behavior clear.

Copilot

Pull request overview

This PR addresses issue #1644 by improving test coverage for sparse datasets with row_id and ignore attributes. The original test only verified behavior when these attributes were excluded; this PR adds a complementary test to verify behavior when they are included.

Changes:

Renamed test_get_sparse_dataset_rowid_and_ignore_and_target to test_get_sparse_dataset_excludes_rowid to clarify that it tests exclusion behavior
Added new test test_get_sparse_dataset_includes_rowid that verifies correct behavior when row_id and ignore attributes are included in the output
Both tests verify correct shape (19998 vs 20000 columns) and categorical indicators

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-18T18:54:05Z

tests/test_datasets/test_dataset.py


-    def test_get_sparse_dataset_rowid_and_ignore_and_target(self):
-        # TODO: re-add row_id and ignore attributes
+    def test_get_sparse_dataset_excludes_rowid(self):


The test name test_get_sparse_dataset_excludes_rowid is imprecise. This test excludes both the row_id attribute AND the ignore attribute (as specified on lines 383-384 and the parameters on lines 387-388). Consider renaming to test_get_sparse_dataset_excludes_rowid_and_ignore to match the naming pattern used in the non-sparse test test_get_data_rowid_and_ignore_and_target on line 192.

Suggested change

def test_get_sparse_dataset_excludes_rowid(self):

def test_get_sparse_dataset_excludes_rowid_and_ignore(self):

Copilot · 2026-02-18T18:54:05Z

tests/test_datasets/test_dataset.py

        self.assertListEqual(categorical, [False] * 19998)
        assert y.shape == (600,)

+    def test_get_sparse_dataset_includes_rowid(self):


The test name test_get_sparse_dataset_includes_rowid is imprecise. This test includes both the row_id attribute AND the ignore attribute (as specified on lines 400-401 and the parameters on lines 404-405). Consider renaming to test_get_sparse_dataset_includes_rowid_and_ignore to be clearer about what's being tested and to maintain consistency with the exclude test naming.

Suggested change

def test_get_sparse_dataset_includes_rowid(self):

def test_get_sparse_dataset_includes_rowid_and_ignore(self):

Copilot · 2026-02-18T18:54:05Z

tests/test_datasets/test_dataset.py

+        assert len(categorical) == 20000
+        self.assertListEqual(categorical, [False] * 20000)
+        assert y.shape == (600,)
+


Consider adding test coverage for partial inclusion scenarios. Currently, the tests only cover the cases where both row_id and ignore attributes are either excluded together or included together. The API supports four combinations: (1) both excluded, (2) both included, (3) include row_id but exclude ignore, and (4) include ignore but exclude row_id. Adding tests for scenarios (3) and (4) would provide more comprehensive coverage of the get_data method's behavior with sparse datasets.

Suggested change

def test_get_sparse_dataset_includes_rowid_excludes_ignore(self):

self.sparse_dataset.ignore_attribute = ["V256"]

self.sparse_dataset.row_id_attribute = ["V512"]

X, y, categorical, _ = self.sparse_dataset.get_data(

target="class",

include_row_id=True,

include_ignore_attribute=False,

)

assert all(dtype == pd.SparseDtype(np.float32, fill_value=0.0) for dtype in X.dtypes)

assert isinstance(y.dtypes, pd.SparseDtype)

assert X.shape == (600, 19999)

assert len(categorical) == 19999

self.assertListEqual(categorical, [False] * 19999)

assert y.shape == (600,)

def test_get_sparse_dataset_excludes_rowid_includes_ignore(self):

self.sparse_dataset.ignore_attribute = ["V256"]

self.sparse_dataset.row_id_attribute = ["V512"]

X, y, categorical, _ = self.sparse_dataset.get_data(

target="class",

include_row_id=False,

include_ignore_attribute=True,

)

assert all(dtype == pd.SparseDtype(np.float32, fill_value=0.0) for dtype in X.dtypes)

assert isinstance(y.dtypes, pd.SparseDtype)

assert X.shape == (600, 19999)

assert len(categorical) == 19999

self.assertListEqual(categorical, [False] * 19999)

assert y.shape == (600,)

@geetu040 do you think we need the other 2 scenarios as well?

omkar-334 · 2026-02-18T18:57:29Z

[MNT] Complete test for sparse dataset

12be649

Copilot AI review requested due to automatic review settings February 18, 2026 18:50

Merge branch 'main' into sparseds

6a010b5

Copilot started reviewing on behalf of omkar-334 February 18, 2026 18:51 View session

Copilot AI reviewed Feb 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MNT] Complete test for sparse dataset#1654

[MNT] Complete test for sparse dataset#1654
omkar-334 wants to merge 2 commits intoopenml:mainfrom
omkar-334:sparseds

omkar-334 commented Feb 18, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

Copilot AI Feb 18, 2026

Uh oh!

omkar-334 Feb 18, 2026

Uh oh!

omkar-334 commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	def test_get_sparse_dataset_excludes_rowid(self):
	def test_get_sparse_dataset_excludes_rowid_and_ignore(self):

	def test_get_sparse_dataset_includes_rowid(self):
	def test_get_sparse_dataset_includes_rowid_and_ignore(self):

+    def test_get_sparse_dataset_includes_rowid_excludes_ignore(self):
+        self.sparse_dataset.ignore_attribute = ["V256"]
+        self.sparse_dataset.row_id_attribute = ["V512"]
+        X, y, categorical, _ = self.sparse_dataset.get_data(
+            target="class",
+            include_row_id=True,
+            include_ignore_attribute=False,
+        )
+        assert all(dtype == pd.SparseDtype(np.float32, fill_value=0.0) for dtype in X.dtypes)
+        assert isinstance(y.dtypes, pd.SparseDtype)
+        assert X.shape == (600, 19999)
+        assert len(categorical) == 19999
+        self.assertListEqual(categorical, [False] * 19999)
+        assert y.shape == (600,)
+    def test_get_sparse_dataset_excludes_rowid_includes_ignore(self):
+        self.sparse_dataset.ignore_attribute = ["V256"]
+        self.sparse_dataset.row_id_attribute = ["V512"]
+        X, y, categorical, _ = self.sparse_dataset.get_data(
+            target="class",
+            include_row_id=False,
+            include_ignore_attribute=True,
+        )
+        assert all(dtype == pd.SparseDtype(np.float32, fill_value=0.0) for dtype in X.dtypes)
+        assert isinstance(y.dtypes, pd.SparseDtype)
+        assert X.shape == (600, 19999)
+        assert len(categorical) == 19999
+        self.assertListEqual(categorical, [False] * 19999)
+        assert y.shape == (600,)

Uh oh!

Conversation

omkar-334 commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

omkar-334 Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

omkar-334 commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

omkar-334 commented Feb 18, 2026 •

edited

Loading