Skip to content

Conversation

@robinst
Copy link
Collaborator

@robinst robinst commented Apr 26, 2024

This adds an API for users/extensions to extend inline parsing (or override some built-in inline parsing), similar to the current support for custom block parsers. Fixes:

See also:

Overview

Since 5790505 we've internally extracted some of the inline parsing logic. This PR turns that into an API. It's currently exported in the beta package because it might be subject to change (and Scanner, which this depends on are is in beta).

The entry point for this is Parser.Builder#customInlineContentParserFactory.

The registered factory needs to provide one or more "trigger characters". When such a character is encountered during inline parsing, the parser is called with a Scanner and it can parse inline content from there. It can decide not to return a result; in that case other parsers (including built-in ones) are tried next.

Example

See InlineContentParserTest for an example of how to use it. Registering the factory:

var parser = Parser.builder().customInlineContentParserFactory(new DollarInlineParser.Factory()).build();

And the factory and parser:

private static class DollarInlineParser implements InlineContentParser {
private int index = 0;
@Override
public ParsedInline tryParse(InlineParserState inlineParserState) {
var scanner = inlineParserState.scanner();
scanner.next();
var pos = scanner.position();
var end = scanner.find('$');
if (end == -1) {
return ParsedInline.none();
}
var content = scanner.getSource(pos, scanner.position()).getContent();
scanner.next();
return ParsedInline.of(new DollarInline(content, index++), scanner.position());
}
static class Factory implements InlineContentParserFactory {
@Override
public Set<Character> getTriggerCharacters() {
return Set.of('$');
}
@Override
public InlineContentParser create() {
return new DollarInlineParser();
}
}
}

Design considerations

The trickiest part was deciding on the lifetime of the parser instance. There were a few options:

  • The parser takes an instance (instead of factory) which is used for all parsed documents. That would mean the instance has to be stateless, as it could be called concurrently from multiple parses. Care has been taken to make using a single parser from multiple threads safe, so this would go against that.
  • The parser takes a factory, and for each parsed document, one instance is created. While this solves the above problem, there is the following quote in the spec: "Note that the first step requires processing lines in sequence, but the second can be parallelized, since the inline parsing of one block element does not affect the inline parsing of any other." If we only had a single parser instance per document, we wouldn't be able to parallelize inline parsing in the future. There's also another problem: If an instance wants to keep state, it would probably be per inline snippet. In order to allow that, we would need to add some kind of "reset" method to signal it that a snippet has finished.
  • The parser takes a factory, and for each parsed inline snippet, an instance is created. This would allow parallelizing inline parsing, and is also a useful granularity if the parser wants to keep state. Note that e.g. link parsing is not yet using this mechanism, but with this, it could. (This is the option that was chosen.)
  • The parser takes a factory, and for each trigger character, an instance is created. This seems too small of a scope, and wouldn't allow the instance to keep any state across multiple parses.

Some other thoughts:

  • Why a Set<Character> for trigger characters instead of just a single char? If link parsing was implemented on top of this, it would require at least [ and ] as trigger characters, so let's allow multiple. There's not really a downside to allowing multiple.
  • Could a parser return more than a single Node? Not with the current API, but we could add that in a backwards-compatible way.
  • Why another Factory? Not sure what the alternative would be, and we already have the same concept for BlockParserFactory :).

@robinst robinst merged commit 3733963 into main Apr 26, 2024
@robinst robinst deleted the robinst-inline-content-parser branch April 26, 2024 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants