Skip to content

Conversation

@VictorVerhaert
Copy link

Adds a new experimental combine_cubes process that aims at taking away complexity from merge_cubes.

The two process can now each have their distinct usage: one for performing operations that have two cubes as input and one for merging data with a possible overlap.

Related issue: #280

m-mohr and others added 28 commits May 25, 2023 13:30
…other processes. Default to numerical index instead of string. (Open-EO#478)
* `filter_spatial`: Clarified that a masking get applied for the given geometries. Open-EO#469
* `filter_bbox`: Clarified that the bounding box is reprojected to the CRS of the spatial data cube dimensions if required.

---------

Co-authored-by: Stefaan Lippens <[email protected]>
* divide, ln, log, mod: Clarified behavior for 0 input / infinity results

* Trigonometric functions: Clarified that NaN is returned outside of their defined ranges and the output value range for some processes

* Clarified for various mathematical functions the defined input and output ranges. Mention that `NaN` is returned outside of the defined input range where possible.

* Remove NaN
* Add `export_collection`, `export_workspace`, `stac_update`;
`save_results` returns the STAC resource instead of boolean `true` Open-EO/openeo-api#376

* Update stac_update/modify

* Added details about STAC support.

* Update meta/implementation.md

Co-authored-by: Matthias Mohr <[email protected]>

---------

Co-authored-by: Michele Claus <[email protected]>
@m-mohr
Copy link
Member

m-mohr commented Jun 2, 2025

Looks pretty good at first look.

I'm wondering whether the process parameter needs to be required?

Is this process meant to be merge cubes but only return the overlap or is this process something else? I feel the issue was asking for the first and the process is slightly different from it ("combine").
The combine term seems a bit too close to merge though if it's the first option. It would be great if we could more easily distinguish what the processes do. If we'd start from scratch, we may name them merge_cubes_union and merge_cubes_intersection, but that's too late. Maybe we could call this process intersect_cubes or so?

@VictorVerhaert
Copy link
Author

The initial draft of this PR was indeed more with a goal of a forced overlap_resolver in mind, but not necessarily only the intersection. This is a usecase I often come across which indirectly leads to the overlap-mode/intersection mode in the issue but goes a bit wider.
With a combine_cubes you could implement an intersection but it allows for more, e.g. a left join or A - B where A=0 if A == nodata. The main feature being that a process is forced on both inputs and let the user handle no-data in the provided process.

As for the naming, perhaps it would be clearer to lend the term binary_operation from math definitions. Pandas uses the terms merge and combine as well which might make it intuitive for users, or we could use join as used in SQL.
Brainstorm:

  • apply_binary
  • apply_binary_cubes
  • apply_binary_cube_operation
  • apply_binary_process
  • apply_cubes (dangerous imo)
  • apply_on_overlap
  • apply_on_intersection
  • combine_binary_cubes
  • join_cubes (personally not a fan but including for brainstorm sakes)
  • unite_cubes (ambiguous with merge_cubes)
  • conflate_cubes
  • fuse_cubes (ambiguous with merge_cubes)

@soxofaan
Copy link
Member

soxofaan commented Jun 3, 2025

I'm wondering whether the process parameter needs to be required?

Yes I think that's the core feature of this process:
the user wants to combine cube1 and cube2 with an explicit operation (add values, take difference, calculate ratio, ....).

It's like apply_dimension or reduce_dimension but instead of working on values along a dimension of a single cube, you work with values from two different cubes.
And to make that work, your cubes have to be "aligned" properly, which practically means we only work on the intersection and drop all the rest.

Is this process meant to be merge cubes but only return the overlap or is this process something else? I feel the issue was asking for the first and the process is slightly different from it ("combine").

So it's a bit the other way around: working on the intersection is the consequence of requiring an operation to combine the cubes

@soxofaan
Copy link
Member

FYI another user support issue that, after hours of digging, again turned out out to be about unexpected results from merge_cubes with an unused overlap resolver:
https://forum.dataspace.copernicus.eu/t/inconsistent-rbr-pixel-values-using-openeo-sentinel-2-data/3804

@VictorVerhaert
Copy link
Author

I was thinking about making it clear(er) in the description that band names should be equal as well (or the cubes should not have a band dimension) as I anticipate that users might oversee these labels.
The python client implementation should also check this and throw a warning if there are no overlapping band names.

@clausmichele
Copy link
Member

Hello! This might be interesting for me too, did you implement it at VITO @VictorVerhaert @soxofaan ?
I'm facing strange issues with merge_cubes which might require a different process, see cloudinsar/s1-workflows#67

@soxofaan
Copy link
Member

soxofaan commented Nov 7, 2025

no we didn't implement this yet

if I understand your issue cloudinsar/s1-workflows#67 well, I think you still want the "classic" merge_cubes there as you want to fuse partially incomplete cubes to get the "union".
combine_cubes is for doing calculations with values from the intersection of cubes, and discard everything outside of the intersection

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants