Fix ObjectStoragePath compatible with universal-pathlib 0.3.0 to avoid maximum recursion depth exceeded#56270
Conversation
…d maximum recursion depth exceeded
| shutil.copyfileobj(f1, f2, **kwargs) | ||
|
|
||
| def copy(self, dst: str | ObjectStoragePath, recursive: bool = False, **kwargs) -> None: | ||
| def copy(self, dst: str | ObjectStoragePath, recursive: bool = False, **kwargs) -> None: # type: ignore[override] |
There was a problem hiding this comment.
we normally don't use ignores unless absolutely necessary as ignores don't really fix stuff but just hide the issue.
isn't there a way to fix without ignore?
There was a problem hiding this comment.
tbh i dont have any idea the super class signature is like this:
task-sdk/src/airflow/sdk/io/path.py:297: note: Superclass:
task-sdk/src/airflow/sdk/io/path.py:297: note: def [T: WritablePath] copy(self, target: T, **kwargs: Any) -> T
task-sdk/src/airflow/sdk/io/path.py:297: note: Subclass:
task-sdk/src/airflow/sdk/io/path.py:297: note: def copy(self, dst: str | ObjectStoragePath, recursive: bool = ..., **kwargs: Any) -> None
There was a problem hiding this comment.
okay i removed that now.
There was a problem hiding this comment.
didnt worked mypy still not happy
8f78f58 to
a0415dc
Compare
|
I think it looks okay. assuming |
yeah it looks like it has updates https://github.com/fsspec/universal_pathlib/pull/405/files |
a0415dc to
d9953f4
Compare
|
will be away for whole day please feel free to merge this, mypy still not happy and have added ignore for now. |
I couldn't find it in that pull request. https://github.com/fsspec/universal_pathlib/blob/main/upath/core.py#L1074 still has |
bolkedebruin
left a comment
There was a problem hiding this comment.
Let's remove the unrelated change from the tests or explain why it is required.
sorry what i meant is , the relative_to function implementation changed. in 0.3.0 its returning
in 0.2.6
earlier versions the implementation of relative_to function is in the cloudpath: https://github.com/fsspec/universal_pathlib/pull/405/files#diff-311a53e84d3cff9dd608444fabb9f22eace31a4f35cdd481083d1761296e41cdL63 but its removed from the cloud path instead it is now implemented in core with slight modification that returns So the test failing with the current implementation of
i could change the assertion from |
potiuk
left a comment
There was a problem hiding this comment.
For me the explanation is clear and reasonable.
| conn_id = self.storage_options.get("conn_id") | ||
| if self._protocol and conn_id: | ||
| return f"{self._protocol}://{conn_id}@{self.path}" | ||
| return f"{self._protocol}://{conn_id}@{self.parser.join(*self._raw_urlpaths)}" |
There was a problem hiding this comment.
Is this really the only options? Relying on a private method doesn't feel great.
Could we do this instead:
super(type(self), self).__str__(self.path)
It's much more verbose, and opaque, yes, but it doesn't depend on accessing a "private" method?
Or use self.parts somehow?
There was a problem hiding this comment.
As per your comment below, we should likely restrict 0.3.0 and use this library properly
There was a problem hiding this comment.
I don't know if this matters, but us overloading str like this also likely breaks other aspects of this library (either by causing a recursion error elsewhere, or by including other info that the upath library isn't expecting.
I think we should probably report this as a bug upstream and block 0.3.0 for the moment instead.
For example, self.parts is broken now (and I think this is just 0.3.0:
On 0.2.6:
(Pdb++) self.parts
('/', 'bucket', 'path')
vs 0.3.0
(Pdb++) self.parts
('/', 'bucket', 'ath')
(Pdb++) self
ObjectStoragePath('fake@bucket/path', protocol='fake')
Okay yeah, this is very much an "we are using this library really really wrong":
(Pdb++) CloudPath("s3://bucket/path")
S3Path('bucket/path', protocol='s3')
(Pdb++) CloudPath("s3://bucket/path").parts
('bucket/', 'path')
(Pdb++) upath.__version__
'0.3.0'
|
closing it for now: #56370 |
|
created issue here closing it for now: #56370 |



universal-pathlib version 0.3.0 introduced several changes. One notable update is that the path property now uses str(self) internally https://github.com/fsspec/universal_pathlib/blob/v0.3.0/upath/core.py#L218-L234. This leads to a recursion error in our case because ObjectStoragePath implements a custom str method, and accessing self.path triggers an infinite loop. To resolve this, we should switch to using _raw_urlpaths instead.
https://pypi.org/project/universal-pathlib/
https://github.com/apache/airflow/actions/runs/18141256490/job/51633607047
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.