Skip to content

[ENH] Simplified Publish API with Automatic Type Recognition#1554

Open
Omswastik-11 wants to merge 14 commits intoopenml:mainfrom
Omswastik-11:prototype-publish
Open

[ENH] Simplified Publish API with Automatic Type Recognition#1554
Omswastik-11 wants to merge 14 commits intoopenml:mainfrom
Omswastik-11:prototype-publish

Conversation

@Omswastik-11
Copy link
Contributor

@Omswastik-11 Omswastik-11 commented Dec 24, 2025

initially

from openml_sklearn.extension import SklearnExtension
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
extension = SklearnExtension()# User instantiates the extension object
knn_flow = extension.model_to_flow(clf) # User manually converts the model (estimator instance) to an OpenMLFlow object
knn_flow.publish()

API

from sklearn.neighbors import KNeighborsClassifier
import openml_sklearn  # Register the extension
import openml

clf = KNeighborsClassifier(n_neighbors=3)

openml.publish(clf)

@Omswastik-11 Omswastik-11 changed the title [ENH] improve publish api for users [ENH] Simplified Publish API with Automatic Type Recognition Dec 24, 2025
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this is a draft still, some early comments.

  • works for flows only, I would recommend to try for at least two different object types to see the dispatching challenge there.
  • do the extension checking inside publish and not in the usage example

@Omswastik-11
Copy link
Contributor Author

Omswastik-11 commented Dec 25, 2025

Thanks @fkiraly !!
I checked on flow , datset , task . it is working correctly but in run it is getting some server side issues.

Task 1 failed: https://test.openml.org/api/v1/xml/data/features/1 returned code 274: No features found. Additionally, dataset processed with error - None

@Omswastik-11 Omswastik-11 requested a review from fkiraly December 25, 2025 08:07
@jgyasu
Copy link
Contributor

jgyasu commented Dec 31, 2025

The PR description is not entirely correct. This is how the interface looks currently:

from openml_sklearn.extension import SklearnExtension
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
extension = SklearnExtension()# User instantiates the extension object
knn_flow = extension.model_to_flow(clf) # User manually converts the model (estimator instance) to an OpenMLFlow object
knn_flow.publish()

But I like the idea of a unified publish. I am currently working on a design document for refactoring Extension and this design coincides mine as well, which is a good thing.

@Omswastik-11
Copy link
Contributor Author

The PR description is not entirely correct. This is how the interface looks currently:

from openml_sklearn.extension import SklearnExtension
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
extension = SklearnExtension()# User instantiates the extension object
knn_flow = extension.model_to_flow(clf) # User manually converts the model (estimator instance) to an OpenMLFlow object
knn_flow.publish()

But I like the idea of a unified publish. I am currently working on a design document for refactoring Extension and this design coincides mine as well, which is a good thing.

Thanks for the correction I used the syntax example used in example script . this unified publish was Franz's idea . https://github.com/gc-os-ai/openml-project-dev/issues/8

@Omswastik-11 Omswastik-11 marked this pull request as ready for review January 1, 2026 11:07
@codecov-commenter
Copy link

codecov-commenter commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 28.00000% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.00%. Comparing base (7feb2a3) to head (54d92e2).

Files with missing lines Patch % Lines
openml/publish.py 25.00% 18 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1554      +/-   ##
==========================================
- Coverage   52.82%   52.00%   -0.83%     
==========================================
  Files          37       38       +1     
  Lines        4371     4396      +25     
==========================================
- Hits         2309     2286      -23     
- Misses       2062     2110      +48     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jgyasu
Copy link
Contributor

jgyasu commented Jan 13, 2026

I have added some comments. I also feel we should not populate __init__.py with these functions, we can have them in a seperate file and use __init__.py only for imports.

@Omswastik-11 Omswastik-11 requested a review from jgyasu January 13, 2026 15:00
Copilot AI review requested due to automatic review settings February 26, 2026 10:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new top-level openml.publish() helper to simplify publishing by automatically recognizing whether the input is an OpenML object or an external estimator handled by an installed extension (e.g., openml-sklearn).

Changes:

  • Introduces openml/publish.py with a unified publish(obj, name=..., tags=...) entry point and tag/name merging behavior.
  • Exposes publish from openml.__init__ and updates the basic flows/runs tutorial to demonstrate the simplified publishing path.
  • Adds unit tests covering publishing an OpenMLBase instance and publishing via the extension registry.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
openml/publish.py Implements the new unified publish API with automatic routing and tag/name handling.
openml/__init__.py Exports publish at the package top-level and updates __all__.
tests/test_openml/test_openml.py Adds tests validating publish behavior for OpenML objects and extension-backed models.
examples/Basics/simple_flows_and_runs_tutorial.py Demonstrates automatic publishing vs manual flow construction in the tutorial.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +112 to 114
"__version__",
"_api_calls",
"_api_calls",
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__all__ contains duplicated entries for __version__ and _api_calls. This is likely an accidental copy/paste and can cause confusing docs / from openml import * behavior. Remove the duplicate strings so each symbol appears only once in __all__.

Suggested change
"__version__",
"_api_calls",
"_api_calls",
"_api_calls",

Copilot uses AI. Check for mistakes.
Comment on lines +86 to +94
model = object()
flow_id = openml.publish(model, name="n", tags=["x"])

get_ext_mock.assert_called_once_with(model, raise_if_no_extension=True)
ext_instance.model_to_flow.assert_called_once_with(model)
assert flow_mock.name == "n"
assert flow_mock.tags == ["x"]
flow_mock.publish.assert_called_once_with()
assert flow_id == "flow-id"
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test mocks flow.publish() to return a string and then asserts openml.publish(model, ...) returns that string (and names the variable flow_id). In the real codebase, OpenMLBase.publish() and OpenMLFlow.publish() return self, so openml.publish() should return the published flow object, not an ID. Update the test to reflect the actual return contract (e.g., have flow_mock.publish return flow_mock and assert the returned object is the flow).

Copilot uses AI. Check for mistakes.
Comment on lines 55 to 91
@@ -77,6 +86,9 @@
knn_flow.publish()
print(f"knn_flow was published with the ID {knn_flow.flow_id}")

# %% [markdown]
# Now we'll use the auto-published flow to create and upload a run.

Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this tutorial, both “Option A” and “Option B” are executed sequentially, which will upload/publish two flows and then overwrite knn_flow. The later text says it will use the auto-published flow, but the code that follows actually uses the manually created flow. Consider making only one option executable (or use different variables like knn_flow_auto/knn_flow_manual) to avoid double uploads and keep the narrative consistent.

Copilot uses AI. Check for mistakes.
geetu040

This comment was marked as duplicate.

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with most of the core logic, just have to double-check the extension work and tests.
There are some reviews by Copilot you would want to address.
I've left a comment for discussion.

from .base import OpenMLBase


def publish(obj: Any, *, name: str | None = None, tags: Sequence[str] | None = None) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for adding name and tags as a parameter to publish? I'm not necessarily against it, I would actually prefer it this way, but I am just curious about the motivation or use case behind this. Are there other parameters we can introduce like these? possibly the common attributes between resources that are used in _to_dict to create artifacts for upload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants