Remove partitioning from CancellationTokenSource#48251
Remove partitioning from CancellationTokenSource#48251stephentoub merged 1 commit intodotnet:masterfrom
Conversation
When CancellationTokenSource was original created, the expectation was that a majority use case would be lots of threads in parallel registering and unregistering handlers. This led to a design where CTS internally partitions its registrations to minimize contention between threads contending on its internal data structures. While that certainly comes up in practice, a much more common case is just one thread registering and unregistering at a time as a CancellationToken unique to a particular operation (e.g. a linked token source) is passed down through it, with various levels of the chain registering and unregistering from that non-concurrently-used token source. And having such partitioning results in non-trivial allocation overheads, in particular for a short-lived CTS with which only one or a few registrations are employed in its lifetime. This change removes that partitioning scheme; all scenarios end up with less memory allocation, and non-concurrent scenarios end up measurably faster... scenarios where there is contention do take a measurable hit, but given that's the rare case, it's believed to be the right trade-off (when in doubt, it's also the simpler implementation). As long as I was refactoring a bunch of code, I fixed up a few other things along the way: - Avoided allocating while holding the instance's spin lock - Made WaitForCallbackAsync into a polling async method rather than an async-over-sync method - Changed the state values to be 0-based to avoid needing to initialize _state to something other than 0 in the common case - Used existing throw helpers in a few more cases - Renamed a few methods, and made a few others to be local functions
70cfdc5 to
b81681f
Compare
|
cc: @kouvel, @adamsitnik, @carlossanlop, @jozkee |
| private const int NotifyingCompleteState = 3; | ||
| private const int NotCanceledState = 0; // default value of _state | ||
| private const int NotifyingState = 1; | ||
| private const int NotifyingCompleteState = 2; |
There was a problem hiding this comment.
Is it possible this will break any debugger logic? Even if not, I could see this kind of change causing some confusion if someone manually debugging looked up the wrong version of the CTS source code.
There was a problem hiding this comment.
I'm not aware of any debugging code paying attention to CancellationTokenSource's private _state field and interpreting its value. In the rare case where we're aware of that (e.g. a few places in Task), we like to annotate it with a comment, and there's no such comment here.
| /// Separated out into a separate instance to keep CancellationTokenSource smaller for the case where one is created but nothing is registered with it. | ||
| /// This happens not infrequently, in particular when one is created for an operation that ends up completing synchronously / quickly. | ||
| /// </remarks> | ||
| internal sealed class Registrations |
There was a problem hiding this comment.
Is there a perf benefit or something to sealing an internal class with no virtual methods? I see CallbackPartition was also sealed. I'm just curious.
There was a problem hiding this comment.
If there aren't any virtuals or interface implementations, not much. I'm just in the habit of sealing everything until reason dictates otherwise :-)
There was a problem hiding this comment.
There was a problem hiding this comment.
isinst is a bit faster for sealed classes
Yup, for sealed types an is can then be achieved similar to GetType() == typeof(Target).
|
For reference, in YARP, this change alone saves 4 allocations (456 bytes) per request |
When CancellationTokenSource was originally created, the expectation was that a majority use case would be lots of threads in parallel registering and unregistering handlers. This led to a design where CTS internally partitions its registrations to minimize contention between threads contending on its internal data structures. While that certainly comes up in practice, a much more common case is just one thread registering and unregistering at a time as a CancellationToken unique to a particular operation (e.g. a linked token source) is passed down through it, with various levels of the chain registering and unregistering from that non-concurrently-used token source. And having such partitioning results in non-trivial allocation overheads, in particular for a short-lived CTS with which only one or a few registrations are employed in its lifetime. This change removes that partitioning scheme; all scenarios end up with less memory allocation, and non-concurrent scenarios end up measurably faster... scenarios where there is contention do take a measurable hit, but given that's the rare case, it's believed to be the right trade-off (when in doubt, it's also the simpler implementation).
As long as I was refactoring a bunch of code, I fixed up a few other things along the way: