Add HDF5 comprehensive tests #2369
Open
This PR adds new tests for the HDF5 readers, using test input datasets generated with a new R script. It is meant as a first step for SYSTEMDS-3929.
The existing tests use three datasets that were committed to the repository, for example src/test/scripts/functions/io/hdf5/in/transfusion_1.h5; I could not find a generator script for them in the codebase.
In this PR, the test .h5 files generated by the script cover datasets with different dimensionalities (2D, 3D, 4D) and datatypes (doubles, integers, strings), and the test itself is table-driven and lives in a single file. All data is generated in R with the "rhdf5" library, which was already being used for validation. The test loads each dataset with both SystemDS and R and verifies that the outputs match using TestUtils.compareMatrices.
Currently, all tests fail because some message types (11 and 12) are not supported by the HDF5 implementation in SystemDS:
Message 11 is the Filter Pipeline Message
Message 12 is the Attribute Message
These messages seem to be written by default by rhdf5 version 2.54.0. I have worked on fixes for all of these bugs, enough to be able to load the So2Sat LCZ42 dataset.
I can open separate PRs for the fixes if desired.
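For reviewers less familiar with the numeric IDs, here is a minimal Python sketch of how object header message types map to names and how a reader could report the ones it cannot handle. The ID-to-name table follows the HDF5 file format specification; the SUPPORTED_TYPES set and both helper functions are hypothetical and only serve the illustration, they do not reflect the actual SystemDS code.

```python
# Object header message type IDs, as listed in the HDF5 file format specification.
HDF5_MESSAGE_TYPES = {
    0x0000: "NIL",
    0x0001: "Dataspace",
    0x0003: "Datatype",
    0x0005: "Fill Value",
    0x0008: "Data Layout",
    0x000B: "Filter Pipeline",
    0x000C: "Attribute",
    0x0010: "Object Header Continuation",
    0x0011: "Symbol Table",
}

# ASSUMPTION: illustrative set only, not the real list of messages SystemDS handles.
SUPPORTED_TYPES = {0x0000, 0x0001, 0x0003, 0x0005, 0x0008, 0x0010, 0x0011}


def describe_message(type_id):
    """Return a readable name for a header message type ID."""
    return HDF5_MESSAGE_TYPES.get(type_id, f"Unknown (0x{type_id:04X})")


def unsupported_messages(type_ids):
    """Return (id, name) pairs for unsupported types, in first-seen order, without duplicates."""
    seen = set()
    result = []
    for type_id in type_ids:
        if type_id not in SUPPORTED_TYPES and type_id not in seen:
            seen.add(type_id)
            result.append((type_id, describe_message(type_id)))
    return result


if __name__ == "__main__":
    # Message types as they might appear in the object header of one dataset.
    for type_id, name in unsupported_messages([1, 3, 11, 8, 12, 11]):
        print(f"unsupported message {type_id}: {name}")
```

Reporting the message name next to the numeric ID in the reader's error would make failures like the ones in this PR easier to diagnose.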
The generator produces many test datasets, some of which are out of scope for these tests, for example the chunked / compressed datasets. The test currently does not use them since they are out of scope for my project; however, I think they can be useful for further improving HDF5 support in SystemDS.
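To illustrate the idea, here is a rough Python sketch of a table-driven layout in which out-of-scope datasets stay in the table but are filtered out of the active run. The case names, the Hdf5Case fields, and the matrices_match helper are hypothetical; the actual test is written against the SystemDS test framework and compares results with TestUtils.compareMatrices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hdf5Case:
    name: str                 # hypothetical dataset name inside the .h5 file
    shape: tuple              # dataset dimensions
    dtype: str                # "double", "integer" or "string"
    chunked: bool = False
    compressed: bool = False


# One row per generated dataset; out-of-scope rows are kept for future use.
CASES = [
    Hdf5Case("matrix_2d_double", (4, 3), "double"),
    Hdf5Case("tensor_3d_int", (2, 3, 4), "integer"),
    Hdf5Case("tensor_4d_double", (2, 2, 2, 2), "double"),
    Hdf5Case("matrix_2d_chunked", (4, 3), "double", chunked=True),
    Hdf5Case("matrix_2d_gzip", (4, 3), "double", chunked=True, compressed=True),
]


def in_scope(case):
    """Chunked or compressed datasets are generated but not tested yet."""
    return not (case.chunked or case.compressed)


def active_cases(cases=CASES):
    """Names of the cases the test would actually run."""
    return [case.name for case in cases if in_scope(case)]


def matrices_match(expected, actual, eps=1e-9):
    """Element-wise comparison of two row-major matrices within a tolerance."""
    if len(expected) != len(actual):
        return False
    for row_e, row_a in zip(expected, actual):
        if len(row_e) != len(row_a):
            return False
        if any(abs(e - a) > eps for e, a in zip(row_e, row_a)):
            return False
    return True
```

Keeping the skipped rows in the same table means enabling chunked or compressed datasets later only requires changing the filter, not adding new test methods.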