[ENH] Simplified Unified Get/List API by Omswastik-11 · Pull Request #1552 · openml/openml-python

Omswastik-11 · 2025-12-23T18:49:28Z

Redundant syntax for getters

In getters, syntax is repeated and redundant, mainly through
the submodule having to be imported or addressed.

import openml

# List all datasets and their properties
openml.datasets.list_datasets(output_format="dataframe")

# Get dataset by ID
dataset = openml.datasets.get_dataset(61)

# Get dataset by name
dataset = openml.datasets.get_dataset('Fashion-MNIST')

# This is similar for flows, runs, studies, such as

study = openml.studies.get_study(42)
flow = openml.flows.get_flows(42)

API implementation

import openml

# List all datasets
datasets_df = openml.list_all("dataset", output_format="dataframe")

# Get dataset by ID
dataset = openml.get("dataset", 61)

# Get dataset by name
dataset = openml.get("dataset", "Fashion-MNIST")

# Get task
task = openml.get("task", 31)

# Get flow
flow = openml.get("flow", 10)

# Get run
run = openml.get("run", 20)

# Shortcut: infer dataset from name when no type specified
dataset = openml.get("Fashion-MNIST")

Implementation Details

Added openml.list_all(object_type: str, **kwargs) -> Any, a dispatcher that forwards to:
- list_datasets
- list_tasks
- list_flows
- list_runs
Added openml.get(object_type_or_name, identifier=None, **kwargs) -> Any, a unified getter with support for:
- Type-based lookup
```
openml.get("dataset", 61)
openml.get("dataset", "dataset_name")
```
- Name-only shortcut for datasets
```
openml.get("Fashion-MNIST")
```
Exported both functions via __all__ and documented them with docstrings.
Preserved full backward compatibility:
- Existing submodule APIs (e.g., openml.datasets.get_dataset) remain unchanged.
Added unit tests to validate dispatcher behavior without requiring network access.

openml/__init__.py

codecov-commenter · 2025-12-30T22:08:13Z

Codecov Report

❌ Patch coverage is 50.00000% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.12%. Comparing base (7feb2a3) to head (59bd74f).

Files with missing lines	Patch %	Lines
openml/dispatchers.py	48.14%	14 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1552      +/-   ##
==========================================
- Coverage   52.82%   52.12%   -0.70%     
==========================================
  Files          37       38       +1     
  Lines        4371     4399      +28     
==========================================
- Hits         2309     2293      -16     
- Misses       2062     2106      +44

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…l-python into prototype-api

openml/__init__.py

for more information, see https://pre-commit.ci

Copilot

Pull request overview

This PR introduces a simplified unified API for getting and listing OpenML objects through two new dispatcher functions: openml.get() and openml.list_all(). These functions provide a more concise alternative to the existing submodule-specific APIs (e.g., openml.datasets.get_dataset()) while maintaining full backward compatibility. The implementation uses dispatch dictionaries to route calls to the appropriate underlying functions.

Changes:

Added unified get() and list_all() dispatcher functions for datasets, tasks, flows, and runs
Updated example tutorials to demonstrate the new API with legacy alternatives shown in comments
Added unit tests for the dispatcher functions using mocking

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
openml/dispatchers.py	New module containing the unified `get()` and `list_all()` dispatcher functions with type checking and error handling
openml/init.py	Exports the new dispatcher functions and module; contains duplicate entries in `__all__` list
tests/test_openml/test_openml.py	Unit tests for dispatchers covering datasets and tasks; missing coverage for flows, runs, and error conditions
examples/Basics/simple_tasks_tutorial.py	Updated to demonstrate new `openml.get()` API with legacy alternative in comments
examples/Basics/simple_flows_and_runs_tutorial.py	Updated to demonstrate both `openml.list_all()` and `openml.get()` APIs
examples/Basics/simple_datasets_tutorial.py	Updated to demonstrate both `openml.list_all()` and `openml.get()` APIs
examples/Advanced/tasks_tutorial.py	Updated multiple examples to demonstrate new `openml.list_all()` API

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-26T10:56:57Z

openml/dispatchers.py

+    Raises
+    ------
+    ValueError
+        If object_type is not one of the supported types.


The docstring for get states it raises ValueError for unsupported object types, but it also raises TypeError when object_type is not a string (line 98-99). Add TypeError to the "Raises" section of the docstring to accurately document this behavior.

Copilot · 2026-02-26T10:56:58Z

openml/__init__.py

+    "__version__",
+    "_api_calls",
    "_api_calls",


Duplicate entries in the __all__ list. Lines 113 and 114 duplicate "version" and "_api_calls" which are already present on lines 112 and 115 respectively. Remove the duplicates on lines 113 and 114.

Suggested change

"__version__",

"_api_calls",

"_api_calls",

"_api_calls",

Copilot · 2026-02-26T10:56:58Z

openml/dispatchers.py

+def get(identifier: int | str, *, object_type: str = "dataset", **kwargs: Any) -> Any:
+    """Get an OpenML object by identifier.
+
+    Parameters
+    ----------
+    identifier : int | str
+        The ID or name of the object to retrieve.
+    object_type : str, default="dataset"
+        The type of object to get. Must be one of 'dataset', 'task', 'flow', 'run'.
+    **kwargs : Any
+        Additional arguments passed to the underlying get function.
+
+    Returns
+    -------
+    Any
+        The requested OpenML object.
+
+    Raises
+    ------
+    ValueError
+        If object_type is not one of the supported types.
+
+    Examples
+    --------
+    >>> openml.get(61)  # Get dataset 61 (default object_type="dataset")
+    >>> openml.get("Fashion-MNIST")  # Get dataset by name
+    >>> openml.get(31, object_type="task")  # Get task 31
+    >>> openml.get(10, object_type="flow")  # Get flow 10
+    >>> openml.get(20, object_type="run")  # Get run 20
+    """
+    if not isinstance(object_type, str):
+        raise TypeError(f"object_type must be a string, got {type(object_type).__name__}")
+
+    func = _GET_DISPATCH.get(object_type.lower())
+    if func is None:
+        valid_types = ", ".join(repr(k) for k in _GET_DISPATCH)
+        raise ValueError(
+            f"Unsupported object_type {object_type!r}; expected one of {valid_types}.",
+        )
+
+    return func(identifier, **kwargs)


The PR description shows an API like openml.get("dataset", 61) and openml.list_all("dataset", output_format="dataframe"), but the actual implementation has get(identifier, *, object_type="dataset", **kwargs) and list_all(object_type, /, **kwargs). This means the correct usage is openml.get(61) (with default object_type="dataset") or openml.get(31, object_type="task"), not openml.get("dataset", 61). The PR description should be updated to accurately reflect the implemented API, or the implementation should be changed to match the description.

Copilot · 2026-02-26T10:56:58Z

tests/test_openml/test_openml.py

+            get_dataset_mock.assert_called_with("Fashion-MNIST")
+
+            openml.get(31, object_type="task")
+            get_task_mock.assert_called_with(31)


Test coverage is incomplete. While the dispatcher supports "flow" and "run" object types, the tests only cover "dataset" and "task". Add test cases for openml.list_all("flow"), openml.list_all("run"), openml.get(flow_id, object_type="flow"), and openml.get(run_id, object_type="run") to ensure all supported object types are tested.

Suggested change

get_task_mock.assert_called_with(31)

get_task_mock.assert_called_with(31)

def test_list_dispatch_flow_and_run(self):

flow_list_mock = mock.Mock()

run_list_mock = mock.Mock()

# Need to patch after import, so update dispatch dict for flow and run

with mock.patch.dict(

"openml.dispatchers._LIST_DISPATCH",

{

"flow": flow_list_mock,

"run": run_list_mock,

},

):

openml.list_all("flow")

flow_list_mock.assert_called_once_with()

openml.list_all("run", size=10)

run_list_mock.assert_called_once_with(size=10)

def test_get_dispatch_flow_and_run(self):

get_flow_mock = mock.Mock()

get_run_mock = mock.Mock()

# Need to patch after import, so update dispatch dict for flow and run

with mock.patch.dict(

"openml.dispatchers._GET_DISPATCH",

{

"flow": get_flow_mock,

"run": get_run_mock,

},

):

openml.get(5, object_type="flow")

get_flow_mock.assert_called_once_with(5)

openml.get(7, object_type="run")

get_run_mock.assert_called_once_with(7)

Copilot · 2026-02-26T10:56:59Z

openml/dispatchers.py

+    identifier : int | str
+        The ID or name of the object to retrieve.


The get function accepts identifier: int | str but only get_dataset supports string identifiers (dataset names). The other get functions (get_task, get_flow, get_run) only accept integer IDs. This means calling openml.get("some-name", object_type="task") would fail with a confusing error. Consider either: (1) documenting that only datasets support string identifiers, or (2) adding validation to raise a clear error when a string identifier is used with non-dataset object types, or (3) restricting the type signature based on object_type.

Copilot · 2026-02-26T10:56:59Z

tests/test_openml/test_openml.py

+            get_dataset_mock.assert_called_with("Fashion-MNIST")
+
+            openml.get(31, object_type="task")
+            get_task_mock.assert_called_with(31)


The tests don't verify error handling for invalid inputs. Add test cases to verify that list_all and get raise appropriate errors for: (1) invalid object types (e.g., openml.list_all("invalid_type")), (2) non-string object types (e.g., openml.list_all(123)), and (3) other edge cases. This ensures the validation logic in lines 55-63 and 98-106 of openml/dispatchers.py is working correctly.

Suggested change

get_task_mock.assert_called_with(31)

get_task_mock.assert_called_with(31)

def test_list_all_invalid_object_type(self):

# Invalid string object type should raise an error

with self.assertRaises((ValueError, KeyError, RuntimeError)):

openml.list_all("invalid_type")

# Non-string object type should raise a type-related error

with self.assertRaises((TypeError, ValueError)):

openml.list_all(123) # type: ignore[arg-type]

def test_get_invalid_object_type(self):

# Invalid string object type should raise an error

with self.assertRaises((ValueError, KeyError, RuntimeError)):

openml.get(1, object_type="invalid_type")

# Non-string object type should raise a type-related error

with self.assertRaises((TypeError, ValueError)):

openml.get(1, object_type=123) # type: ignore[arg-type]

Copilot · 2026-02-26T10:56:59Z

openml/dispatchers.py

+        The result from the type-specific list function (typically a DataFrame).
+
+    Raises
+    ------


The docstring for list_all states it raises ValueError for unsupported object types, but it also raises TypeError when object_type is not a string (line 55-56). Add TypeError to the "Raises" section of the docstring to accurately document this behavior.

Suggested change

------

------

TypeError

If object_type is not a string.

geetu040

Copilot left a few solid review comments above, please take a look and address them.

improved the Getter API for users

bb4554f

Omswastik-11 marked this pull request as ready for review December 24, 2025 10:22

Omswastik-11 changed the title ~~[ENH] improved the Getter API for users~~ [ENH] Simplified Unified Get/List API Dec 24, 2025