[ENH] Simplified Publish API with Automatic Type Recognition#1554
[ENH] Simplified Publish API with Automatic Type Recognition#1554Omswastik-11 wants to merge 14 commits intoopenml:mainfrom
Conversation
fkiraly
left a comment
There was a problem hiding this comment.
I get this is a draft still, some early comments.
- works for flows only, I would recommend to try for at least two different object types to see the dispatching challenge there.
- do the extension checking inside
publishand not in the usage example
|
Thanks @fkiraly !! |
|
The PR description is not entirely correct. This is how the interface looks currently: from openml_sklearn.extension import SklearnExtension
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
extension = SklearnExtension()# User instantiates the extension object
knn_flow = extension.model_to_flow(clf) # User manually converts the model (estimator instance) to an OpenMLFlow object
knn_flow.publish()But I like the idea of a unified |
Thanks for the correction I used the syntax example used in example script . this unified publish was Franz's idea . https://github.com/gc-os-ai/openml-project-dev/issues/8 |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1554 +/- ##
==========================================
- Coverage 52.82% 52.00% -0.83%
==========================================
Files 37 38 +1
Lines 4371 4396 +25
==========================================
- Hits 2309 2286 -23
- Misses 2062 2110 +48 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I have added some comments. I also feel we should not populate |
There was a problem hiding this comment.
Pull request overview
Adds a new top-level openml.publish() helper to simplify publishing by automatically recognizing whether the input is an OpenML object or an external estimator handled by an installed extension (e.g., openml-sklearn).
Changes:
- Introduces
openml/publish.pywith a unifiedpublish(obj, name=..., tags=...)entry point and tag/name merging behavior. - Exposes
publishfromopenml.__init__and updates the basic flows/runs tutorial to demonstrate the simplified publishing path. - Adds unit tests covering publishing an
OpenMLBaseinstance and publishing via the extension registry.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
openml/publish.py |
Implements the new unified publish API with automatic routing and tag/name handling. |
openml/__init__.py |
Exports publish at the package top-level and updates __all__. |
tests/test_openml/test_openml.py |
Adds tests validating publish behavior for OpenML objects and extension-backed models. |
examples/Basics/simple_flows_and_runs_tutorial.py |
Demonstrates automatic publishing vs manual flow construction in the tutorial. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "__version__", | ||
| "_api_calls", | ||
| "_api_calls", |
There was a problem hiding this comment.
__all__ contains duplicated entries for __version__ and _api_calls. This is likely an accidental copy/paste and can cause confusing docs / from openml import * behavior. Remove the duplicate strings so each symbol appears only once in __all__.
| "__version__", | |
| "_api_calls", | |
| "_api_calls", | |
| "_api_calls", |
| model = object() | ||
| flow_id = openml.publish(model, name="n", tags=["x"]) | ||
|
|
||
| get_ext_mock.assert_called_once_with(model, raise_if_no_extension=True) | ||
| ext_instance.model_to_flow.assert_called_once_with(model) | ||
| assert flow_mock.name == "n" | ||
| assert flow_mock.tags == ["x"] | ||
| flow_mock.publish.assert_called_once_with() | ||
| assert flow_id == "flow-id" |
There was a problem hiding this comment.
This test mocks flow.publish() to return a string and then asserts openml.publish(model, ...) returns that string (and names the variable flow_id). In the real codebase, OpenMLBase.publish() and OpenMLFlow.publish() return self, so openml.publish() should return the published flow object, not an ID. Update the test to reflect the actual return contract (e.g., have flow_mock.publish return flow_mock and assert the returned object is the flow).
| @@ -77,6 +86,9 @@ | |||
| knn_flow.publish() | |||
| print(f"knn_flow was published with the ID {knn_flow.flow_id}") | |||
|
|
|||
| # %% [markdown] | |||
| # Now we'll use the auto-published flow to create and upload a run. | |||
|
|
|||
There was a problem hiding this comment.
In this tutorial, both “Option A” and “Option B” are executed sequentially, which will upload/publish two flows and then overwrite knn_flow. The later text says it will use the auto-published flow, but the code that follows actually uses the manually created flow. Consider making only one option executable (or use different variables like knn_flow_auto/knn_flow_manual) to avoid double uploads and keep the narrative consistent.
geetu040
left a comment
There was a problem hiding this comment.
I am fine with most of the core logic, just have to double-check the extension work and tests.
There are some reviews by Copilot you would want to address.
I've left a comment for discussion.
| from .base import OpenMLBase | ||
|
|
||
|
|
||
| def publish(obj: Any, *, name: str | None = None, tags: Sequence[str] | None = None) -> Any: |
There was a problem hiding this comment.
What's the reason for adding name and tags as a parameter to publish? I'm not necessarily against it, I would actually prefer it this way, but I am just curious about the motivation or use case behind this. Are there other parameters we can introduce like these? possibly the common attributes between resources that are used in _to_dict to create artifacts for upload.
initially
API