Skip to content
This repository was archived by the owner on Jun 16, 2020. It is now read-only.

First steps towards Read trait parser (BrTable)#216

Closed
Robbepop wants to merge 9 commits intobytecodealliance:masterfrom
Robbepop:robin-refactor-br-table
Closed

First steps towards Read trait parser (BrTable)#216
Robbepop wants to merge 9 commits intobytecodealliance:masterfrom
Robbepop:robin-refactor-br-table

Conversation

@Robbepop
Copy link
Contributor

@Robbepop Robbepop commented May 17, 2020

First step towards implementing https://github.com/bytecodealliance/wasmparser/issues/90.

This refactoring of BrTable is required in order to drop the explicit lifetime from the Operator enum. We introduce the WasmBrTable and WasmBrTableBuilder traits so that we can later make Operator generic over user provided types for BrTable and maybe other custom Wasm operators and types in the future.

We separated BrTable and BrTableBuilder in order to make the more important BrTable less constraint regarding incremental building. At the moment we do not provide users with a way to customize the Operator enum so we might experience slight performance regressions since a BrTable now is eagerly constructed instead of deferring its construction.

In the long run this procedure should even improve performance since we no longer need to parse the br_table targets twice. Also in some cases we needed to construct an internal br_table and users would then construct their own custom br_table on their side. With this (and follow-up) changes we simply directly construct the user defined br_table directly.

Next Steps

After merging of this PR the next steps to make the parser work on T: Read instead of working on a slice of [u8] are:

  • Maybe we want to allow constructing Result<Self::BrTable> in WasmBrTableBuilder trait to allow for use cases where a user doesn't want to support br_table at all.
  • Find any other locations where a lifetime is used in some produced data structure as it was the case for Operator and replace it with similar technique.
  • Introduce collective trait that can be used to pass to the parser to tell it which user defined types to construct. The user has to implement this trait for some super type to enumerate all the user defined types. So far this is only needed for BrTable but this design is scalable for future additions.
  • Change API to use T: Read.

Benchmarks

Sadly I cannot test the performance implications since the benchmark harness seems to be broken on master. At least I get the following output on cargo bench with RUST_BACKTRACE=1:

Benchmarking it works benchmark: Warming up for 3.0000 sthread 'main' panicked at 'unexpected error Bad magic number (at offset 0)', benches/benchmark.rs:38:42
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1069
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1537
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:218
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:477
  11: rust_begin_unwind
             at src/libstd/panicking.rs:385
  12: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:339
  13: criterion::Bencher<M>::iter
  14: <criterion::routine::Function<M,F,T> as criterion::routine::Routine<M,T>>::warm_up
  15: criterion::routine::Routine::sample
  16: criterion::analysis::common
  17: criterion::benchmark_group::BenchmarkGroup<M>::bench_function
  18: criterion::Criterion<M>::bench_function
  19: benchmark::main
  20: std::rt::lang_start::{{closure}}
  21: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
  22: std::panicking::try::do_call
             at src/libstd/panicking.rs:297
  23: std::panicking::try
             at src/libstd/panicking.rs:274
  24: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  25: std::rt::lang_start_internal
             at src/libstd/rt.rs:51
  26: main
  27: __libc_start_main
  28: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: bench failed

Doc Screenshots

2020-05-17-161328_1711x1199_scrot
2020-05-17-161314_1505x1220_scrot
2020-05-17-161253_1500x885_scrot

@Robbepop Robbepop changed the title Robin refactor br table First steps towards Read trait parser (BrTable) May 17, 2020
@Robbepop Robbepop mentioned this pull request May 17, 2020
@Robbepop Robbepop marked this pull request as ready for review May 17, 2020 14:28
@yurydelendik
Copy link
Collaborator

It is a good start, though I don't think we want Operator to have BrTables structure. The requirements to not hold dynamic data comes from #16.

How about we will start with generalizing the Operator as:

enum WasmOperator<B: OperatorBuilder> { Unreachable, ..., BrTable { table: B::BrTable }, ...  };
type Operator<'a> = WasmOperator<DefaultOperatorBuilder<'a>>;

impl<'a> OperatorBuilder for DefaultOperatorBuilder<'a> {
  type BrTable = BrTable<'a>;
  fn read_table(&self) -> Self::BrTable { ... }
}

(I did not play with the above idea yet, so I'll consider the alternative approaches)

@Robbepop
Copy link
Contributor Author

Robbepop commented May 18, 2020

enum WasmOperator<B: OperatorBuilder> { Unreachable, ..., BrTable { table: B::BrTable }, ...  };
type Operator<'a> = WasmOperator<DefaultOperatorBuilder<'a>>;

impl<'a> OperatorBuilder for DefaultOperatorBuilder<'a> {
  type BrTable = BrTable<'a>;
  fn read_table(&self) -> Self::BrTable { ... }
}

Well the

enum WasmOperator<B: OperatorBuilder> { Unreachable, ..., BrTable { table: B::BrTable }, ...  };

is basically what I proposed above as a follow-up PR.
I just didn't want to do everything at once. However, I think we might want to get rid of the lifetime in the default Operator for the sake of making it workable for a T: Read parser (maybe I am wrong that this contradicts each other tbh) which is why I opted for the Box<[u32]> approach by default.

Why exactly is a no_std dependency of alloc bad?
I see that we want to get rid of explicit drop calls for Operator.
I think we could maybe even provide a stack-based BrTable via SmallVec or some other approach. But by shifting the problem of allocation to the user side we already won something imo.

Also we do not necessarily have to provide a default BrTable or Operator from within wasmparser. Instead we could rely on the user to provide us with one.

To elaborate on the problem with T: Read parsing: If you have a parser that does not operate on a slice of bytes but instead one that operates on a T: Read then you cannot have a subslice into a buffer anymore which is how the issue describes an iterator based solution for a non allocating BrTable. So I see a conflict there of either getting T: Read parsing or a never allocating default BrTable. This approach would at least provide a user the chance to opt-into some other form of BrTable that might not allocate and thus partially solves this conflict.

Another solution for a non-allocating default BrTable might be one where we have a better Read trait that allows to split its input stream so that it can have multiple consumers with different states. However, every possible solution for this would probably require dynamic allocations again.

@yurydelendik
Copy link
Collaborator

yurydelendik commented May 18, 2020

Also we do not necessarily have to provide a default BrTable or Operator from within wasmparser. Instead we could rely on the user to provide us with one.

Correct. The OperatorsReader, get_operators_reader and other friends will become generic to do that. Though legacy ParserState code will continue use default Operator.

We can add (New)BrTable as you proposed in the PR, though I'm hesitant to remove (Old)BrTable<'a> just yet. Let me think about the consequences for no_std too.

@yurydelendik
Copy link
Collaborator

To elaborate on the problem with T: Read parsing...
you cannot have a subslice into a buffer anymore which is how the issue describes an iterator based solution for a non allocating BrTable

Right. I don't think only BrTable will be an issue. We need a specific plan to remove/generalize offset: usize, data: &'a [u8] from all "readers/" files before we start a code churn. 🤔

@Robbepop
Copy link
Contributor Author

Robbepop commented May 18, 2020

Yep, the best solution I can currently think of is shift all hard problems (allocation) to user side which is exactly what we will be able to do with the help of those new traits. In the long run we do not even want to provide a default BrTable so the (New)BrTable is basically just a placeholder until we reach the ultimate goal. I prefer doing baby steps when it comes to such refactorings. ;)

@alexcrichton
Copy link
Member

This repository has now moved to a subfolder within https://github.com/bytecodealliance/wasm-tools. This current repository will be marked as archived in the future after we've sorted out active PRs, so if you'd like to still merge this then it will need to be opened as a new PR against the new repository. If you have any trouble in doing so, though, please let me know and I can try to help out. now

@Robbepop
Copy link
Contributor Author

Thanks @alexcrichton for letting me know.
I personally would like to get this merged so that we can tackle the other follow-up tasks but it mainly depends on @yurydelendik 's opinion on this strategy to solve the underlying problems.

@Robbepop
Copy link
Contributor Author

Robbepop commented May 27, 2020

Thanks @alexcrichton for letting me know.
I personally would like to get this merged so that we can tackle the other follow-up tasks but it mainly depends on @yurydelendik 's opinion on this strategy to solve the underlying problems.

Sorry for the bump but have you made up your mind about this @yurydelendik ?
I'd love to help solve these problems for the wasmparser crate and currently have more time to do so.

@yurydelendik
Copy link
Collaborator

Sorry for the bump but have you made up your mind about this @yurydelendik ?

I tried to find time to experiment with other side of this puzzle by myself -- got occupied with other stuff. What you are proposing makes sense, since we already had this in the code base few years ago.

I like the idea of a T: Read and would like to explore it further. Do you have any other pilot code for this idea I can look at? I wonder if it can be done with less BrTable code churn -- we have many libraries depend on wasmparser.

@Robbepop
Copy link
Contributor Author

Robbepop commented May 28, 2020

What you are proposing makes sense, since we already had this in the code base few years ago.

Wow this is interesting to know. Can you point me to some commit where and why this way changed to what it currently is?

Do you have any other pilot code for this idea I can look at? I wonder if it can be done with less BrTable code churn -- we have many libraries depend on wasmparser.

Currently this is the initial PR and work I did for this. I have not yet looked into what else is going to be needed to make T: Read parsing possible but I am confident that it is possible to implement this. However, I wouldn't want to do this in a single big refactoring but rather incrementally because with a big-fat refactoring chances are too high that we will simply never finish. You are right though that this incremental change might incur lots of friction to our dependencies so this is a serious downside I haven't looked yet into much to be honest.

I currently have time to experiment a little bit more with this idea and will probably look into what else is needed in order to get T: Read without losing performance or giving up on other requirements.

Another funny idea I had for the BrTable was to split it up into its basic components. E.g. you wouldn't have the br_table operator with its varying size but instead would have a br_table_head(len) followed by some br_table_target instances and finally followed by some br_table_default. However, this obviously just shifts the problem of construction of such a BrTable.

@yurydelendik
Copy link
Collaborator

yurydelendik commented Jun 1, 2020

Can you point me to some commit where and why this way changed to what it currently is?

The change was made in
813ba12 by requirements listed in #16

@Robbepop
Copy link
Contributor Author

Robbepop commented Jun 4, 2020

Moved to bytecodealliance/wasm-tools#22. Closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants