Conversation
- Use start, stop, step terms
- Make RangeIndex.__init__ private and more flexible, add RangeIndex.arange and RangeIndex.linspace public factories
- General support of RangeIndex slicing
- RangeIndex.isel with arbitrary 1D values: convert to PandasIndex
- Add RangeIndex.to_pandas_index
... when check_default_indexes=False.
I've made further progress on this. Some design questions (thoughts welcome!):

Create a new RangeIndex:
import xarray as xr
from xarray.indexes import RangeIndex
index = RangeIndex.arange(0.0, 1.0, 0.1, coord_name="x", dim="x")
ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
Index import: Should we expose all public built-in Xarray indexes at the top level, or only in `xarray.indexes`? Currently the …
Note: this Xarray RangeIndex is designed for floating-point value ranges. For integer ranges it is probably best to use a PandasIndex wrapping a `pandas.RangeIndex`.
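For illustration, a minimal sketch of the integer case in plain pandas (wrapping it into an Xarray index is shown further down the thread):

```python
import pandas as pd

# pandas.RangeIndex stores only start/stop/step, like Python's range,
# so it is memory-efficient for integer ranges
idx = pd.RangeIndex(start=0, stop=10, step=2)

print(list(idx))                       # materialized values
print(idx.start, idx.stop, idx.step)   # 0 10 2
```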
xarray/indexes/range_index.py (outdated):

    dim : str
        Dimension name.
    start : float, optional
        Start of interval (default: 0.0). The interval includes this value.
Could consider adding a closed kwarg like pd.Interval, but in a future PR of course.
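For reference, a quick sketch of how `closed` behaves on `pd.Interval` (the kwarg the comment refers to); whether a future `RangeIndex` counterpart would adopt the same semantics is an open question:

```python
import pandas as pd

# "left" means the interval includes its left endpoint and excludes the right
iv = pd.Interval(0.0, 1.0, closed="left")

print(0.0 in iv)  # True: left endpoint included
print(1.0 in iv)  # False: right endpoint excluded
```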
        "`Coordinates.from_xindex()`"
    )

    @property
Can these all be cached_property?
Would there be much benefit of caching those simple aliases to attributes of the underlying transform?
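A quick stdlib comparison of the two options being discussed; for attributes that merely alias an underlying object, caching mostly saves one attribute lookup per access, at the cost of staleness (a hypothetical `Transform` stands in for the real one):

```python
from functools import cached_property


class Transform:
    # hypothetical stand-in for the underlying transform object
    def __init__(self, start: float):
        self.start = start


class Index:
    def __init__(self, transform: Transform):
        self.transform = transform

    @property
    def start_plain(self) -> float:
        # re-evaluated on every access
        return self.transform.start

    @cached_property
    def start_cached(self) -> float:
        # computed once, then stored in the instance __dict__;
        # goes stale if self.transform is replaced afterwards
        return self.transform.start


idx = Index(Transform(0.0))
print(idx.start_plain, idx.start_cached)  # 0.0 0.0
idx.transform = Transform(5.0)
print(idx.start_plain, idx.start_cached)  # 5.0 0.0 (cached value is stale)
```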
Make `dim` a required keyword argument and `coord_name` an optional keyword argument (defaults to `dim`).
    dtype : dtype, optional
        The dtype of the coordinate variable (default: float64).

    Examples
Suggested change:

    Note that all `start`, `stop` & `step` must be passed, which is more explicit than `np.arange` or `range`

    Examples
(optional, no strong view)
> Note that all `start`, `stop` & `step` must be passed
This isn't exactly true, but yes the API here is more explicit than np.arange and range, e.g., RangeIndex.arange(10.0) means start=10 while np.arange(10.0) means stop=10.
RangeIndex.arange(10.0) doesn't make much sense, though, considering the default value of stop=1.0. I'll see if we can get closer to np.arange using typing.overload.
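The "first argument changes meaning" logic could look roughly like this (a hypothetical helper, not the actual implementation):

```python
def arange_bounds(start, stop=None, step=1.0):
    """Mimic np.arange's first-argument shift: with a single value,
    interpret it as ``stop`` (hypothetical sketch)."""
    if stop is None:
        start, stop = 0.0, start
    return start, stop, step


print(arange_bounds(10.0))       # (0.0, 10.0, 1.0): single value -> stop
print(arange_bounds(2.0, 10.0))  # (2.0, 10.0, 1.0): two values -> start, stop
# The caveat from the thread: arange_bounds(start=10.0) also yields
# (0.0, 10.0, 1.0), because the function cannot tell how the value was passed.
```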
> `RangeIndex.arange(10.0)` doesn't make much sense, though, considering the default value of stop=1.0. I'll see if we can get closer to `np.arange` using `typing.overload`.
Yeah, no objection to the more explicit approach; it's useful-but-a-bit-magic that arange / range changes the meaning of the first arg based on how many are supplied.
Mimicking numpy.arange behavior is surprisingly difficult! (at least for me; I've been struggling with this).
I got it close with some simple logic, but then I hit the same issue as numpy/numpy#17878 (i.e., RangeIndex.arange(start=10) returns a range in the [0, 10) interval, which makes no sense). I could fix it with some heavy refactoring, but that makes the code / API ugly to the point that I'm not sure I want to push it here :).
Numpy relies on the Python C API PyArg_ParseTupleAndKeywords(), but AFAIK there seems to be no easy way to know from Python whether a value has been passed as a positional or keyword argument.
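Indeed, there is no Python-level way to tell positional from keyword. The closest workaround is a sentinel default, which only distinguishes "passed" from "not passed":

```python
_UNSET = object()  # sentinel: distinguishes "not passed" from any real value


def describe(start=_UNSET, stop=_UNSET):
    return {
        "start_given": start is not _UNSET,
        "stop_given": stop is not _UNSET,
    }


# Both calls look identical from inside the function, which is exactly
# why np.arange-style first-argument shifting can't be replicated cleanly.
print(describe(10.0))
print(describe(start=10.0))
```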
!!
I think the explicit approach is very valid, but maybe we just call it out / ensure people need to pass kwargs where it could be confusing
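Requiring kwargs is easy to enforce with a keyword-only signature (a sketch of the idea, not the API ultimately chosen in the PR):

```python
def arange(*, start, stop, step):
    # the bare ``*`` makes all three parameters keyword-only,
    # so no call can rely on positional order
    return (start, stop, step)


print(arange(start=0.0, stop=1.0, step=0.1))
# arange(0.0, 1.0, 0.1) would raise TypeError: takes 0 positional arguments
```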
In 4a128d0 I've chosen to mimic the numpy and pandas APIs anyway, with some simple logic and by clearly documenting the caveat above.
I find myself (likely others too) writing range(10) or np.arange(10.0) all the time, while I doubt many will write something like np.arange(start=10).
pandas.RangeIndex(start=10) actually still returns RangeIndex(start=0, stop=10, step=1), and I haven't seen anyone complaining in the pandas issues (or I missed it).
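This pandas behavior is easy to check:

```python
import pandas as pd

# a lone value is interpreted as ``stop``, even when passed as ``start=``
idx = pd.RangeIndex(start=10)

print(idx)  # RangeIndex(start=0, stop=10, step=1)
```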
Hope it's alright to chime in here.
One use case for …: e.g. say you have a regular grid with … (I understand for integer ranges one wants a …)
@wpbonelli I don't think this is well documented, but you could do:

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.zeros((5, 10)), dims=("rows", "cols"))
da.coords["r"] = ("rows", pd.RangeIndex(da.sizes["rows"]))
da = da.set_xindex("r")
da
# <xarray.DataArray (rows: 5, cols: 10)> Size: 400B
# array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
# Coordinates:
#   * r        (rows) int64 40B 0 1 2 3 4
# Dimensions without coordinates: rows, cols

da.xindexes["r"]
# PandasIndex(RangeIndex(start=0, stop=5, step=1, name='r'))

This relies on the fact that Xarray internally keeps track of the … A more explicit way of achieving the same result:

from xarray.indexes import PandasIndex

r_index = PandasIndex(pd.RangeIndex(da.sizes["rows"], name="r"), dim="rows")
da = da.assign_coords(xr.Coordinates.from_xindex(r_index))
With one caveat (also in pandas): `RangeIndex.arange(4.0)` creates an index within the range [0.0, 4.0) (`start` is interpreted as `stop`). This caveat is documented.
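The caveat mirrors numpy's behavior, which may make it less surprising in practice:

```python
import numpy as np

# a single positional value is interpreted as ``stop``, not ``start``
print(np.arange(4.0))  # [0. 1. 2. 3.]
```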
This is ready for another round of review! I don't think the CI failures are related to anything in this PR.
@benbovy thanks! Sorry to hijack the thread.
* main: (76 commits)
  - Update how-to-add-new-backend.rst (#10240)
  - Support extension array indexes (#9671)
  - Switch documentation to pydata-sphinx-theme (#8708)
  - Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (#10239)
  - Fix mypy, min-versions CI, xfail Zarr tests (#10255)
  - Remove `test_dask_layers_and_dependencies` (#10242)
  - Fix: Docs generation create temporary files that are not cleaned up. (#10238)
  - opendap / dap4 support for pydap backend (#10182)
  - Add RangeIndex (#10076)
  - Fix mypy (#10232)
  - Fix doctests (#10230)
  - Fix broken Sphinx Roles (#10225)
  - `DatasetView.map` fix `keep_attrs` (#10219)
  - Add datatree repr asv (#10214)
  - CI: Automatic PR labelling is back (#10201)
  - Fixes dimension order in `xarray.Dataset.to_stacked_array` (#10205)
  - Fix references to core classes in docs (#10207)
  - Update pre-commit hooks (#10208)
  - add `scipy-stubs` as extra `[types]` dependency (#10202)
  - Fix sparse dask repr test (#10200)
  - ...
whats-new.rst, api.rst

Work in progress → Ready for review (copied and adapted the example from #9543 (comment)).