[ENH] V1 -> V2 Migration - Flows (module)#1609
Omswastik-11 wants to merge 236 commits into openml:main from Omswastik-11:flow-migration-stacked
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1609      +/-  ##
=========================================
- Coverage   53.09%   50.75%    -2.34%
=========================================
  Files          37       61       +24
  Lines        4362     5079      +717
=========================================
+ Hits         2316     2578      +262
- Misses       2046     2501      +455

☔ View full report in Codecov by Sentry.
geetu040 left a comment:
Please sync with the base PR and update with these comments: #1576 (comment)
     9,
 ]:
-flow_xml = _perform_api_call("flow/%d" % flow_id, request_method="get")
+flow_xml = openml.config.get_backend().http_client.get(f"flow/{flow_id}").text
I have updated the PR to make this work, please sync
Suggested change:
-flow_xml = openml.config.get_backend().http_client.get(f"flow/{flow_id}").text
+flow_xml = openml._backend.http_client.get(f"flow/{flow_id}").text
Pull request overview
Copilot reviewed 53 out of 54 changed files in this pull request and generated 11 comments.
from typing import Any, Iterator
from pathlib import Path
import platform
from urllib.parse import urlparse
urlparse is imported but never used in this test module. Please remove the unused import to avoid lint failures.
 # Example script which will appear in the upcoming OpenML-Python paper
 # This test ensures that the example will keep running!
-with overwrite_config_context(
+with openml.config.overwrite_config_context(  # noqa: F823
overwrite_config_context is referenced via openml.config and should be resolvable here, so the # noqa: F823 suppression looks incorrect/unnecessary. Please remove it (or use the correct code if there is an actual linter error to suppress).
@mock.patch.object(requests.Session, "request")
def test_delete_flow_not_owned(mock_request, test_files_directory, test_api_key):
    openml.config.start_using_configuration_for_example()
    content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_not_owned.xml"
start_using_configuration_for_example() mutates global configuration state; these tests never call stop_using_configuration_for_example(), which can leak state into later tests and cause order-dependent failures. Please wrap this in a context/fixture that guarantees stop_... runs (e.g., try/finally or a dedicated pytest fixture).
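One way to guarantee the cleanup is a context manager (or a `yield`-based pytest fixture) built on try/finally. The sketch below uses a stand-in `_Config` object so it is self-contained; the real code would call the `openml.config` functions named in the comment above:

```python
import contextlib


class _Config:
    """Stand-in for openml.config (illustrative only)."""

    def __init__(self):
        self.active = False

    def start_using_configuration_for_example(self):
        self.active = True

    def stop_using_configuration_for_example(self):
        self.active = False


config = _Config()


@contextlib.contextmanager
def example_configuration():
    """Guarantee stop_... runs even if the test body raises."""
    config.start_using_configuration_for_example()
    try:
        yield
    finally:
        config.stop_using_configuration_for_example()
```

With pytest, the same try/finally shape becomes a `yield` fixture, so every test that requests it is restored automatically regardless of test outcome or ordering.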
| "apikey": "normaluser", | ||
| }, | ||
| APIVersion.V2: { | ||
| "server": "http://localhost:8002/api/v1/xml/", |
SERVERS_REGISTRY['local'][APIVersion.V2]['server'] points to a v1 XML base path (/api/v1/xml/). This looks inconsistent with the v2 clients (which call endpoints like flows/.../ under an /api/v2/ base). Please correct the local v2 base URL (and/or add a test to validate v2 server base URLs).
| "server": "http://localhost:8002/api/v1/xml/", | |
| "server": "http://localhost:8002/api/v2/", |
    if isinstance(id_value, (str, int)):
        return int(id_value)

    # Fallback: check all values for numeric/string IDs
    for v in root_value.values():
        if isinstance(v, (str, int)):
            return int(v)
In _extract_id_from_upload, the fallback loop attempts int(v) for the first string value it finds. This can raise ValueError on non-numeric strings like the XML namespace (e.g. "http://openml.org/openml"). Please restrict the fallback to numeric strings (or explicitly look for known keys) before casting to int.
Suggested change:
-    if isinstance(id_value, (str, int)):
-        return int(id_value)
-    # Fallback: check all values for numeric/string IDs
-    for v in root_value.values():
-        if isinstance(v, (str, int)):
-            return int(v)
+    if isinstance(id_value, int):
+        return id_value
+    if isinstance(id_value, str):
+        id_str = id_value.strip()
+        if id_str.isdigit():
+            return int(id_str)
+    # Fallback: check all values for numeric/string IDs
+    for v in root_value.values():
+        if isinstance(v, int):
+            return v
+        if isinstance(v, str):
+            v_str = v.strip()
+            if v_str.isdigit():
+                return int(v_str)
    self._config = replace(
        self._config,
        servers=config["servers"],
        api_version=config["api_version"],
        fallback_api_version=config["fallback_api_version"],
        show_progress=config["show_progress"],
        avoid_duplicate_runs=config["avoid_duplicate_runs"],
        retry_policy=config["retry_policy"],
        connection_n_retries=int(config["connection_n_retries"]),
    )
_setup() assigns api_version and fallback_api_version directly from the parsed config dict. If these values come from a config file/CLI they will be strings (e.g. "v2"), which will break later lookups like servers[self.api_version] (servers keys are APIVersion). Please coerce string values to APIVersion (and validate) when loading config, and consider similarly validating/normalizing servers.
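A minimal sketch of the suggested coercion, assuming `APIVersion` is a string-valued enum (the member values `"v1"`/`"v2"` and the helper name are illustrative):

```python
from enum import Enum


class APIVersion(Enum):
    """Stand-in for the enum used as keys in `servers` (values assumed)."""
    V1 = "v1"
    V2 = "v2"


def coerce_api_version(value):
    """Accept either an APIVersion member or a string like 'v2' / 'V2'."""
    if isinstance(value, APIVersion):
        return value
    try:
        # Enum lookup by value: APIVersion("v2") -> APIVersion.V2
        return APIVersion(str(value).strip().lower())
    except ValueError:
        raise ValueError(f"Unknown API version: {value!r}") from None
```

`_setup()` would then call this on `config["api_version"]` and `config["fallback_api_version"]` before the `replace(...)`, so `servers[self.api_version]` lookups keep working when values come from a config file or CLI.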
    return OpenMLFlow._from_dict(xmltodict.parse(flow_xml))
FlowV1API.get() does not detect v1-style error payloads (<oml:error>...) that are returned with HTTP 200. HTTPClient only validates by status code, so this method can end up passing an error dict into OpenMLFlow._from_dict() and failing with a confusing parsing error. Please add an <oml:error> check similar to exists() / list() and raise OpenMLServerException with the server-provided code/message.
Suggested change:
-    return OpenMLFlow._from_dict(xmltodict.parse(flow_xml))
+    result_dict = xmltodict.parse(flow_xml)
+    # Detect v1-style error payloads and raise a clear exception
+    if "oml:error" in result_dict:
+        err = result_dict["oml:error"]
+        code = int(err.get("oml:code", 0)) if "oml:code" in err else None
+        message = err.get("oml:message", "Server returned an error")
+        raise OpenMLServerException(message=message, code=code)
+    return OpenMLFlow._from_dict(result_dict)
    path_parts = parsed_url.path.strip("/").split("/")

    filtered_params = {k: v for k, v in params.items() if k != "api_key"}
    params_part = [urlencode(filtered_params)] if filtered_params else []
HTTPCache.get_key() uses urlencode(filtered_params) over the raw dict, which makes the cache key depend on the insertion order of params. This can cause avoidable cache misses for semantically identical requests. Please sort parameters (e.g., by key) before encoding to make cache keys stable.
Suggested change:
-    params_part = [urlencode(filtered_params)] if filtered_params else []
+    sorted_params = sorted(filtered_params.items())
+    params_part = [urlencode(sorted_params)] if sorted_params else []
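A quick demonstration of the order-dependence the sort fixes; `urlencode` follows dict insertion order, so two semantically identical parameter sets produce different strings:

```python
from urllib.parse import urlencode

# urlencode preserves dict insertion order, so semantically identical
# parameter sets yield different strings (and thus different cache keys):
a = urlencode({"limit": 10, "offset": 0})
b = urlencode({"offset": 0, "limit": 10})
assert a != b

# Sorting the items first makes the encoding (and the cache key) stable:
stable_a = urlencode(sorted({"limit": 10, "offset": 0}.items()))
stable_b = urlencode(sorted({"offset": 0, "limit": 10}.items()))
assert stable_a == stable_b == "limit=10&offset=0"
```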
    def push_tag(self, tag: str) -> None:
        """Annotates this flow with a tag on the server.

        Parameters
        ----------
        tag : str
            Tag to attach to the flow.
        """
        if self.flow_id is None:
            raise ValueError("Flow does not have an ID. Please publish the flow before tagging.")
        openml._backend.flow.tag(self.flow_id, tag)

    def remove_tag(self, tag: str) -> None:
        """Removes a tag from this flow on the server.

        Parameters
        ----------
        tag : str
            Tag to remove from the flow.
        """
        if self.flow_id is None:
            raise ValueError("Flow does not have an ID. Please publish the flow before untagging.")
        openml._backend.flow.untag(self.flow_id, tag)
OpenMLFlow already inherits push_tag / remove_tag from OpenMLBase. Re-defining them here creates duplicated API paths and potentially inconsistent behavior across resource types (some entities tag via openml.utils._tag_openml_base, flows via openml._backend). Consider removing these overrides and updating the shared implementation in OpenMLBase to use the backend for all resources instead.
def dummy_task_v2(http_client_v2, minio_client) -> DummyTaskV1API:
    return DummyTaskV2API(http=http_client_v2, minio=minio_client)


@pytest.fixture
def dummy_task_fallback(dummy_task_v1, dummy_task_v2) -> DummyTaskV1API:
    return FallbackProxy(dummy_task_v2, dummy_task_v1)
The fixture return type annotations in this file look incorrect: dummy_task_v2 is annotated as DummyTaskV1API but returns DummyTaskV2API, and dummy_task_fallback is annotated as DummyTaskV1API but returns FallbackProxy. Please fix the annotations to match the actual returned objects to avoid type-checking confusion.
Fixes #1601
- Added a `create` method in `FlowAPI` for publishing flows, but it is not yet refactored together with the old `publish` (needs discussion).
- Added tests using `fake_methods` so that we can test without a local V2 server. I have tested the `FlowsV2` methods (`get` and `exists`); `delete` and `list` were not implemented in the V2 server, so I skipped them.