Skip to content

Review identifier encoding#171

Merged
Pfeil merged 34 commits intomainfrom
review-identifier-encoding
Aug 21, 2024
Merged

Review identifier encoding#171
Pfeil merged 34 commits intomainfrom
review-identifier-encoding

Conversation

@Pfeil
Copy link
Member

@Pfeil Pfeil commented Aug 2, 2024

I reviewed the code as I was going through it anyway to check if #5 was completed.

  • I marked some methods as deprecated as they are either not necessary or do not really do what the documentation says. They are not needed internally anymore anyway.
  • I added some tests and tried to improve their documentation/readability.
  • I added a fallback in the encoding function in case our "readable encoding" solution is not enough. I am not sure why it was removed, maybe as we had no tests for it.
  • I renamed the class to IdentifierUtils to make it somehow more obvious to more users what this util class is about.

Missing:

  • Check the test coverage
  • Should encoding/decoding really throw exceptions or can we fully rely on things like optional?
    • I removed it in one place, as the exception is thown in another place anyway and the specification of the function did neither define the exception, nor was it required due to a present optional which could be used in the context of the javadoc definition.
  • Do we need exceptions in builder patterns? (setters / adders)
    • If needed, we could make e.g. setPathThrowing(path) or something as optional setters to throw exceptions.
  • make sure setters and adders are not being confused in DataEntity (and other builders?)
  • Make sure all setId functions and builders are being consistent regarding encoding: where do we encode/decode?
    • we encode in setId
    • we decode in one situation in the reader only, where we really use the path. I think this is fine?
  • Make sure we have integration tests having encoded URIs (pretty sure we have them already)
  • Review commits regarding Require all direct properties of the root data entity #4
  • Breaking changes are in main already anyway, so we can directly merge into main.

@coveralls
Copy link

coveralls commented Aug 2, 2024

Pull Request Test Coverage Report for Build #223

Details

  • 89 of 96 (92.71%) changed or added relevant lines in 10 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.5%) to 90.394%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/main/java/edu/kit/datamanager/ro_crate/entities/AbstractEntity.java 4 5 80.0%
src/main/java/edu/kit/datamanager/ro_crate/special/IdentifierUtils.java 16 17 94.12%
src/main/java/edu/kit/datamanager/ro_crate/entities/data/DataEntity.java 13 15 86.67%
src/main/java/edu/kit/datamanager/ro_crate/RoCrate.java 29 32 90.63%
Files with Coverage Reduction New Missed Lines %
src/main/java/edu/kit/datamanager/ro_crate/RoCrate.java 1 93.6%
Totals Coverage Status
Change from base Build #207: 0.5%
Covered Lines: 1562
Relevant Lines: 1728

💛 - Coveralls

@Pfeil Pfeil self-assigned this Aug 10, 2024
Pfeil added 13 commits August 14, 2024 11:54
And add some documentation.
The tests failed partially. I had to decide if the methods should only accept encoded paths. I decided otherwise.
There was no failing test, this is just to make it a bit more robust.
Just a simple heuristic, obviously.
It failed in some cases which are accepted by the specification,
but could have been url-encoded, technically.
This way all functionality fits inside and is well-distinguishable to other "UriUtils" classes.
@Pfeil Pfeil force-pushed the review-identifier-encoding branch from d59e67c to 0f99041 Compare August 14, 2024 09:58
@Pfeil
Copy link
Member Author

Pfeil commented Aug 14, 2024

We should not make our library (and the tests) dependent on the operating system. So I removed OS dependent code. The error had its cause in the Java 17 default build setup, though. More info on this is in the comments below.

@Pfeil
Copy link
Member Author

Pfeil commented Aug 14, 2024

From the log:

D:\a\ro-crate-java\ro-crate-java\src\test\java\edu\kit\datamanager\ro_crate\special\IdentifierUtilsTest.java:20: error: unmappable character (0x9D) for encoding windows-1252

    static final String FILE_CHINESE = "�?�试.mp4";

and from a windows machine:

expected: <https://example.com/break§ stuff + code<>/> but was: <https://example.com/break§ stuff + code<>/>

and

expected: <https://example.com/break%C2%A7%20stuff%20%2B%20code%3C%3E/> but was: <https://example.com/break%C3%82%C2%A7%20stuff%20%2B%20code%3C%3E/>

Note that the strings are not modified after git checkout, it happens in the build process somewhere!

@Pfeil Pfeil force-pushed the review-identifier-encoding branch from 8802fde to 3ae1488 Compare August 15, 2024 10:43
@Pfeil Pfeil force-pushed the review-identifier-encoding branch from 3ae1488 to 079cfda Compare August 15, 2024 11:13
@Pfeil
Copy link
Member Author

Pfeil commented Aug 15, 2024

The issue is that Java 17 on Windows will re-encode your code and therefore can cause artifacts in strings before compiling your code. Upgrading to Java 21 should help, or adding

if (JavaVersion.current() == JavaVersion.VERSION_17) {
    println "Setting encoding to UTF-8 manually"
    compileJava.options.encoding = "UTF-8"
    compileTestJava.options.encoding = "UTF-8"
}

to build.gradle. For now, I did both to support both of them.

Also, I added to the CI the following step. On manually triggered runs (not repeating an automatically triggered one!), it will upload the test results as artifacts:

    - name: Upload (test) reports as artifact on GitHub on manual runs
      if: github.event_name == 'workflow_dispatch'
      uses: actions/upload-artifact@v4
      with:
        name: test-report ${{ matrix.os }} JDK ${{ matrix.jdk }}
        path: build/reports

@Pfeil Pfeil force-pushed the review-identifier-encoding branch from 7c0ef98 to 9340c3e Compare August 16, 2024 10:32
@Pfeil
Copy link
Member Author

Pfeil commented Aug 16, 2024

(the last "WIP" commit is being considered a stash for later continuation)

Note: the DataEntity builder now had methods which throw exceptions even though this can be handled more quietly in most cases. It is currently unknown if someone really needs it, but if so, we can keep them as extra methods, this is fine.

Still, I would prefer to keep the old behaviour of deriving the ID from the filename if it is not set already. SetID can be used afterwards if a custom name shall be given. This fits better into the builder pattern and makes sure a user does not get confused by first using setId and then overriding it with the filepath call which takes and sets the ID again. It also shrinks the line length.

I'll add some TODOs about this at the top, respectively.

@Pfeil Pfeil force-pushed the review-identifier-encoding branch from 4ba8fb4 to 83abb62 Compare August 19, 2024 13:13
@Pfeil Pfeil marked this pull request as ready for review August 19, 2024 13:46
@Pfeil Pfeil added this to the v2.0.0 milestone Aug 19, 2024
@Pfeil Pfeil marked this pull request as draft August 19, 2024 14:00
@Pfeil
Copy link
Member Author

Pfeil commented Aug 19, 2024

I just saw issue #4 has also been worked on. I'd like to take a look this one before I merge, too.

@Pfeil Pfeil marked this pull request as ready for review August 21, 2024 13:53
@Pfeil
Copy link
Member Author

Pfeil commented Aug 21, 2024

I added some setters and changed some constructors in order to make things simpler and more flexible. Otherwise, just some cleanup. License entities now get added by default if more than an ID is provided, so the user will not forget this.

@Pfeil Pfeil marked this pull request as draft August 21, 2024 14:05
@Pfeil Pfeil marked this pull request as ready for review August 21, 2024 14:05
@Pfeil Pfeil merged commit a72497f into main Aug 21, 2024
@Pfeil Pfeil mentioned this pull request Aug 21, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants