[ENH] V1 → V2 API Migration - datasets#1608

Open
JATAYU000 wants to merge 240 commits into openml:main from JATAYU000:dataset_resource

Conversation

@JATAYU000
Contributor

Metadata

@codecov-commenter

codecov-commenter commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 69.17586% with 475 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.56%. Comparing base (da993f7) to head (ea80785).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
openml/_api/resources/dataset.py 59.41% 194 Missing ⚠️
openml/_config.py 73.86% 75 Missing ⚠️
openml/_api/clients/http.py 71.68% 62 Missing ⚠️
openml/_api/clients/minio.py 57.14% 30 Missing ⚠️
openml/datasets/dataset.py 30.76% 18 Missing ⚠️
openml/datasets/functions.py 31.57% 13 Missing ⚠️
openml/_api/resources/base/versions.py 84.41% 12 Missing ⚠️
openml/_api/setup/backend.py 80.70% 11 Missing ⚠️
openml/cli.py 0.00% 11 Missing ⚠️
openml/_api/resources/base/base.py 71.42% 10 Missing ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1608      +/-   ##
==========================================
- Coverage   53.09%   51.56%   -1.53%     
==========================================
  Files          37       61      +24     
  Lines        4362     5360     +998     
==========================================
+ Hits         2316     2764     +448     
- Misses       2046     2596     +550     

☔ View full report in Codecov by Sentry.

@JATAYU000
Contributor Author

JATAYU000 commented Jan 9, 2026

FYI @geetu040: currently the get_dataset() function has three download requirements:

  • download_data : uses api_calls._download_minio_bucket() to download all the files in the bucket when the download_all_files parameter is True, and api_calls._download_minio_file() to download the dataset.pq file when it is not found in the cache. When the parquet download fails, it falls back to downloading the dataset.arff file via a GET request.
  • download_features : if feature_file is passed via init, the features are extracted during initialization; otherwise a GET request is made and the XML is cached.
  • download_qualities : if qualities_file is passed via init, the qualities are extracted during initialization; otherwise a GET request is made and the XML is cached.
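For illustration, here is a self-contained sketch of the parquet-then-ARFF fallback in the first bullet. The helper names below are stand-ins, not the real api_calls functions:

```python
from pathlib import Path

# Hypothetical stand-ins for api_calls._download_minio_file and the
# ARFF GET request; names and signatures are illustrative only.
def _download_parquet(cache_dir: Path) -> Path:
    raise IOError("minio unavailable")  # simulate a failed parquet download

def _download_arff(cache_dir: Path) -> Path:
    path = cache_dir / "dataset.arff"
    path.touch()  # stand-in for the actual GET request + write
    return path

def download_data(cache_dir: Path) -> Path:
    """Try the cached/downloaded parquet file first; fall back to ARFF."""
    pq_path = cache_dir / "dataset.pq"
    if pq_path.exists():  # cache hit: nothing to download
        return pq_path
    try:
        return _download_parquet(cache_dir)
    except IOError:  # parquet download failed -> ARFF fallback
        return _download_arff(cache_dir)
```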

Issues:

  • The data files (.pq and .arff) are common across versions, so it does not make sense to download them multiple times.
  • Path handling for downloads needs to return the path, especially for the data files. As mentioned in the meet, I can try a download-specific class that uses the cache mixin and is inherited only by the dataset resource.
  • The current implementation in OpenMLDataset has v1-specific parsing, which in my opinion should go through the current interface (api_context) instead.
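A minimal sketch of the download-specific class idea from the second bullet, assuming a simple cache-path mixin (class names and the cache layout here are illustrative, not the real openml internals):

```python
from pathlib import Path

class CacheMixin:
    """Minimal cache-path helper; the real mixin lives elsewhere."""
    cache_root = Path("cache")

    def cache_path(self, *parts: str) -> Path:
        return self.cache_root.joinpath(*parts)

class DownloadMixin(CacheMixin):
    """Download helpers inherited only by the dataset resource, so the
    shared data files land in one place and are returned as paths."""

    def data_file_path(self, dataset_id: int, filename: str) -> Path:
        # data files (.pq/.arff) keyed by dataset id only, not by version
        return self.cache_path("datasets", str(dataset_id), filename)

class DatasetResource(DownloadMixin):
    pass
```

This keeps the return-a-path behavior local to datasets without touching the generic client methods.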

Example:

The current load_features() (ref link) calls a function that downloads the features file, returns its path, and then parses from that path.
This can be changed by changing that function's definition (ref link) to get -> parse -> return features instead of file paths:

def _get_dataset_features_file(did_cache_dir: str | Path | None, dataset_id: int) -> dict[int, OpenMLDataFeature]:
    ...
    return _features
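For context, a self-contained sketch of that get -> parse -> return shape; the XML schema is simplified and plain dicts stand in for OpenMLDataFeature:

```python
import xml.etree.ElementTree as ET

def parse_dataset_features(features_xml: str) -> dict[int, dict]:
    """Parse a features XML payload and return features keyed by index,
    rather than writing the XML to disk and returning a file path."""
    root = ET.fromstring(features_xml)
    features = {}
    for node in root.iter("feature"):
        idx = int(node.findtext("index"))
        features[idx] = {
            "name": node.findtext("name"),
            "data_type": node.findtext("data_type"),
        }
    return features
```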

Or by updating the Dataset class to use the underlying interface method from api_context directly:

def _load_features(self) -> None:
    ...
    self._features = api_context.backend.datasets.get_features(self.dataset_id)

Another option is to add a return_path parameter to client requests, but in my opinion that would be wasteful: it adds a parameter to every client method for the sake of the dataset resource alone, and the same result can be achieved without it, as described above.

Collaborator

@geetu040 geetu040 left a comment


please sync with base PR and update with these comments #1576 (comment)

Copilot AI review requested due to automatic review settings February 26, 2026 10:11

Copilot AI left a comment


Pull request overview

Copilot reviewed 51 out of 52 changed files in this pull request and generated 10 comments.



Copilot AI review requested due to automatic review settings February 26, 2026 11:26

Copilot AI left a comment


Pull request overview

Copilot reviewed 51 out of 52 changed files in this pull request and generated 3 comments.



@JATAYU000
Contributor Author

@geetu040 _dataset_file_is_downloaded uses the cache path to check for description.xml, features.xml, etc.
It is used by multiple tests via _dataset_description_is_downloaded(id), and those tests are now failing. What should be done about it?

9 participants