@luccadibe luccadibe commented Dec 1, 2025

This PR adds new tests for the HDF5Readers, using new test input datasets generated by a new R script. It is meant as a first step for SYSTEMDS-3929.

The existing tests used three datasets that were committed to the repository, for example src/test/scripts/functions/io/hdf5/in/transfusion_1.h5. I could not find a generator script for them in the codebase.

In this PR, the .h5 test files generated by the script include datasets with different shapes (2D, 3D, 4D) and data types (doubles, integers, strings), and the test itself is a table-driven test in a single file. All data is generated using R and the "rhdf5" library, which was already being used for validation.

The test loads each dataset with SystemDS and with R, and verifies that the outputs match using TestUtils.compareMatrices.
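
To illustrate the table-driven shape described above, here is a minimal sketch. It is not the code from this PR: the class name, the case fields, the file and dataset names, and the `matricesMatch` helper are all hypothetical, and the helper is only a stand-in for a tolerance-based comparison, not the actual TestUtils.compareMatrices implementation.

```java
// Hypothetical sketch of a table-driven HDF5 reader test; all names are illustrative.
final class Hdf5TableDrivenSketch {

    // One row of the test table: which file and dataset to read, and the expected shape.
    static final class Case {
        final String file;
        final String dataset;
        final int rows;
        final int cols;

        Case(String file, String dataset, int rows, int cols) {
            this.file = file;
            this.dataset = dataset;
            this.rows = rows;
            this.cols = cols;
        }
    }

    // Illustrative rows only; the real file and dataset names live in the PR.
    static final Case[] CASES = {
        new Case("example_double_2d.h5", "data", 3, 2),
        new Case("example_int_2d.h5", "data", 4, 4),
    };

    // Stand-in for a tolerance-based matrix comparison.
    // Returns false on any shape mismatch; a NaN entry never matches in this sketch.
    static boolean matricesMatch(double[][] expected, double[][] actual, double eps) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (expected[i].length != actual[i].length) {
                return false;
            }
            for (int j = 0; j < expected[i].length; j++) {
                if (!(Math.abs(expected[i][j] - actual[i][j]) <= eps)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In a real test, each row of the table would drive one read through SystemDS and one through R, with the two resulting matrices passed to the comparison. Adding a new dataset then only requires adding a row.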

Currently, all tests fail because the HDF5 implementation in SystemDS does not support some object header message types (11 and 12):
Message 11 is the Filter Pipeline Message
Message 12 is the Attribute Message
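
For context, the sketch below shows one way a reader could walk object header messages and skip the ones it does not interpret instead of failing. It is an illustration under stated assumptions, not the fix from this PR and not the existing SystemDS code. The class and method names are hypothetical, and the layout assumed is the version 1 object header message prefix (2-byte type, 2-byte data size, 1-byte flags, 3 reserved bytes, little-endian); other header versions are laid out differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the SystemDS implementation.
final class Hdf5HeaderMessageSketch {

    static final int FILTER_PIPELINE = 0x000B; // message type 11
    static final int ATTRIBUTE = 0x000C;       // message type 12

    // Maps a few well-known message type IDs to readable names.
    static String describe(int type) {
        switch (type) {
            case 0x0001: return "Dataspace";
            case 0x0003: return "Datatype";
            case 0x0008: return "Data Layout";
            case FILTER_PIPELINE: return "Filter Pipeline";
            case ATTRIBUTE: return "Attribute";
            default: return "Unhandled(" + type + ")";
        }
    }

    // Walks `count` messages, assuming the version 1 prefix layout:
    // type (2 bytes), data size (2 bytes), flags (1 byte), reserved (3 bytes), then the data.
    // Every payload is skipped using its size field, so unknown types do not abort the walk.
    static List<String> listMessages(ByteBuffer buf, int count) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int type = buf.getShort() & 0xFFFF;
            int size = buf.getShort() & 0xFFFF;
            buf.position(buf.position() + 4);    // skip flags and reserved bytes
            buf.position(buf.position() + size); // skip the message payload
            seen.add(describe(type));
        }
        return seen;
    }
}
```

Skipping is only safe for messages that do not change how the raw data must be decoded. An Attribute message can usually be ignored when only the dataset values are needed, whereas a Filter Pipeline message describes filters applied to the stored data, so a reader that skips it still has to reject or handle data that is actually filtered.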

rhdf5 version 2.54.0 appears to write these messages by default.

I have worked on fixes for all of the bugs, enough to be able to load the So2Sat LCZ42 dataset.
I can open separate PRs for the fixes if desired.
The generator produces many test datasets, some of which are out of scope for these tests, for example the chunked and compressed datasets.
The test currently does not use them because they are out of scope for my project; however, I think they can be useful for further improving HDF5 support in SystemDS.

@luccadibe luccadibe marked this pull request as ready for review January 5, 2026 17:31

Projects

Status: In Progress
