Fix StreamReader EOF handling and improve perf#69888
Fix StreamReader EOF handling and improve perf#69888stephentoub merged 3 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsThis fixes a bug in For a concrete example, consider the below sample. using System.IO;
using System.Text;
var reader = new StreamReader(
new MemoryStream(new byte[] { 0xF0 }),
new UTF8Encoding(
encoderShouldEmitUTF8Identifier: false,
throwOnInvalidBytes: true));
Console.WriteLine(reader.ReadToEnd());This prints an empty string to the console, even though it should throw I also took this opportunity to tighten the This results in an approx. 50% throughput increase in ReadLine and ReadLineAsync, as shown in the below benchmarks. The benchmarks also show a decrease in overall
// In the below benchmarks, _ms is a MemoryStream whose contents have been initialized from:
// https://www.gutenberg.org/files/11/11-0.txt
[Benchmark]
public int GetLineCount()
{
_ms.Position = 0;
StreamReader reader = new StreamReader(_ms);
int lineCount = 0;
while (reader.ReadLine() != null) { lineCount++; }
return lineCount;
}
[Benchmark]
public int GetLineCountAsync()
{
_ms.Position = 0;
StreamReader reader = new StreamReader(_ms);
int lineCount = 0;
while (reader.ReadLineAsync().GetAwaiter().GetResult() != null) { lineCount++; }
return lineCount;
}/cc @dotnet/area-system-io
|
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
|
@GrabYourPitchforks do you expect to have time to continue this PR? I see that #62552 by @Trapov. Is much work remaining? @stephentoub how important is it that we get these PR's into .NET 7? |
|
It'd be valuable to get this PR for .NET 7, but we wouldn't block the release on it 😄 |
jozkee
left a comment
There was a problem hiding this comment.
LGTM, I will close-reopen it to retrigger CI.
|
CI issues are related: |
|
@GrabYourPitchforks, I'm planning to take over this PR. Let me know if you'd prefer I not. Thanks. |
3c90a07 to
4a1b30c
Compare
4a1b30c to
5638093
Compare
adamsitnik
left a comment
There was a problem hiding this comment.
LGTM, thank you both @GrabYourPitchforks and @stephentoub !
@stephentoub is there any chance you could run these benchmarks and share the results before we hit the merge button?
| Buffer.BlockCopy(_byteBuffer, n, _byteBuffer, 0, _byteLen - n); | ||
| byte[] byteBuffer = _byteBuffer; | ||
| _ = byteBuffer.Length; // allow JIT to prove object is not null | ||
| new ReadOnlySpan<byte>(byteBuffer, n, _byteLen - n).CopyTo(byteBuffer); |
There was a problem hiding this comment.
not related to the changes you have made: I can see that once the encoding is detected, this method is being called:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Lines 491 to 493 in ff4b245
So for the default buffer size and Unicode:
we copy 1022 bytes in place.
Why don't we just update _byteLen and _bytePos? I know it would require changing some other parts of the code that rely on the assumption that _bytePos == 0 after the read:
I am asking this question because this PR improves the performance of DetectEncoding by moving rarely called code out of hot path and it seems like another perf improvement we could make here while we are at it.
There was a problem hiding this comment.
I did. It was preexisting this PR and my goal was just to land this PR. Anyone is welcome to follow up on the comment.
|
|
Windows arm64 improvements: dotnet/perf-autofiling-issues#9689 |
This fixes a bug in
StreamReaderwhere it doesn't properly pass flush: true to the backingDecoderinstance when EOF is reached. This could result in cases where partial data in the decoder buffer is never processed.For a concrete example, consider the below sample.
This prints an empty string to the console, even though it should throw
DecoderFallbackExceptionsince the caller explicitly requested that they want to reject invalid data.I also took this opportunity to tighten the
ReadLineandReadLineAsyncmethods, including usingStringBuilderCache. This shouldn't blow the cache because we expect lines to be 80 - 100 chars at the high end, which is well within whatStringBuilderCachecan handle.This results in an approx. 50% throughput increase in ReadLine and ReadLineAsync, as shown in the below benchmarks. The benchmarks also show a decrease in overall
StringBuilderallocations./cc @dotnet/area-system-io