chore: improve channel messaging and address memory leaks by derklaro · Pull Request #1687 · CloudNetService/CloudNet

derklaro · 2025-07-14T20:39:01Z

Motivation

The channel messaging system (as well as a few network parts in general) really need a cleanup to improve the code quality and reliability of the system. Furthermore, there were memory leaks reported that need to be addressed as well.

Modification

This pr mostly improves channel messaging to be more reliable overall and improves the api surface exposed to consumers. Furthermore, all possible memory leaks were addressed (not limited to channel messaging) and some other networking parts were improved as well.

Result

Channel messaging should be more reliable and easier to use. Additionally, there shouldn't be any memory leaks anymore.

…ta-buf-enhance

…g sync methods

derklaro · 2025-07-14T20:39:59Z

/cc @DasBabyPixel memory leaks should be fixed now

github-actions · 2025-07-14T20:43:12Z

Test Results

56 files ±0 56 suites ±0 3m 19s ⏱️ +26s
570 tests ±0 570 ✅ ±0 0 💤 ±0 0 ❌ ±0
935 runs ±0 935 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit c4bc4b2. ± Comparison against base commit c108c3e.

This pull request removes 32 and adds 32 tests. Note that renamed tests count towards both.

eu.cloudnetservice.driver.impl.document.DocumentSerialisationTest ‑ [4] {"b":1,"s":2,"i":3,"l":4,"f":5.0,"d":6.0,"c":"/","string":"Hello, World!","bol":true,"cloud":["Ben?","Yes","No","HoHoHoHo"],"world":{"insane":"!","hello":"world","this":"is"}}, PRETTY
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [14] 2025-07-05
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [23] 22:21:16.493105216
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [28] 22:21:16.493221032Z
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [29] 22:21:16.493242613Z
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [30] 22:21:16.493264484+05:00
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [31] 22:21:16.493282818-03:00
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [34] 2025-07-05T22:21:16.493410477
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [39] 2025-07-05T22:21:16.493542874Z
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [4] 2025-07-05T22:21:16.486859409Z
…

eu.cloudnetservice.driver.impl.document.DocumentSerialisationTest ‑ [4] {"b":1,"s":2,"i":3,"l":4,"f":5.0,"d":6.0,"c":"/","string":"Hello, World!","bol":true,"cloud":["Ben?","Yes","No","HoHoHoHo"],"world":{"hello":"world","insane":"!","this":"is"}}, PRETTY
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [14] 2025-07-16
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [23] 16:47:46.067615018
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [28] 16:47:46.067726286Z
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [29] 16:47:46.067746874Z
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [30] 16:47:46.067768364+05:00
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [31] 16:47:46.067787359-03:00
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [34] 2025-07-16T16:47:46.067913374
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [39] 2025-07-16T16:47:46.068089152Z
eu.cloudnetservice.driver.impl.document.gson.JavaTimeSerializerTest ‑ [4] 2025-07-16T16:47:46.058082152Z
…

♻️ This comment has been updated with latest results.

DasBabyPixel

I think this is a step in the right direction. If you give me a few days time I have a reason to update past RC11 other than java 24 and time to test this branch.
I think replacing null checks with pattern matching is either an overly aggressive inspection or a too idealistic world view. It does not make the code more readable.
I haven't yet gone through all usages to look for all possible leaks, but the obvious ones I do remember should be fixed here.

...rc/main/java/eu/cloudnetservice/driver/impl/network/chunk/network/ChunkedPacketListener.java

DasBabyPixel · 2025-07-15T01:55:59Z

driver/impl/src/main/java/eu/cloudnetservice/driver/impl/network/netty/NettyNetworkHandler.java

+  private void handlePacket(@NonNull BasePacket packet) {
+    try {
+      this.doHandlePacket(packet);
+    } catch (Exception exception) {


I'd go further and catch Throwable instead of silently ignoring OOMEs or similar

we should get oracle to have the default UncaughtExceptionHandler call System.exit.
I remember wasting way too many hours because network threads ate my errors and I couldn't for the life of me figure out where.

DasBabyPixel · 2025-07-15T02:05:50Z

...src/main/java/eu/cloudnetservice/driver/impl/network/netty/buffer/NettyImmutableDataBuf.java

+    try {
      this.buffer.close();
+    } catch (IllegalStateException _) {
+      // possible double-free error due to a race, ignore


since when are buffers thread safe? This seems like a hack that will cause issues later

The buffer is not thread-safe, but the drop guarding the buffer is. So concurrent close calls are safe. However, I would also argue that this is the better approach as close calls should not throw an exception if the buffer was already closed before (this was partly implemented using the isAccessible() check before - just not thread-safe).

My understanding of how the buffers work may be incorrect, but concurrent close calls can never be safe?

EDIT: part of internal buffer code in ResourceSupport class, XXXBuffer extends ResourceSupport

@Override public final void close() { if (acquires == -1) { throw attachTrace(new IllegalStateException("Double-free: Resource already closed and dropped.")); } int acq = acquires; acquires--; if (acq != 0) { // Only record a CLOSE if we're not going to record a DROP. tracer.close(acq); } else { // The 'acquires' was 0, now decremented to -1, which means we need to drop. tracer.drop(0); try { drop.drop(impl()); Reference.reachabilityFence(this); // Avoid racing with any Cleaner. } finally { makeInaccessible(); } } }

acquires is a non-volatile int. So a buffer must, at every point in time, have a clear "owner" thread. Having IllegalStateExceptions thrown seems to me like the presence of another issue somewhere else, maybe in how the buffer is used or passed between owners.

If we want to enable concurrent close calls, we need to have some sort of atomic isClosed flag or counter

Yes, that place is not atomic, but the actual dropping happens in ArcDrop (in our case) which is.

I guess you are right, ArcDrop is atomic and concurrent close calls should be safe.
Still think this is a bad idea long-term and while it may work now may fail if the netty implementation details change. Someone once told me relying on implementation details is a hassle and I should avoid it, so I do.

Imo a buffer lifecycle should be clearly defined and a close call must be on the owner thread, or synchronized in some way.
Also I haven't checked, but does enabling leak detection also enable it for the wrapper?

I don't really understand the problem. As stated before the exception is caught to ensure that, at any point, a call to close does NOT throw an exception. The previous isAccessible() guard did the same thing but left the door open a tiny bit open for an exception to be thrown anyway - this is now not possible anymore. Also note that the documentation of DataBuf still states that "buffers are not required to be thread safe", and there is no documentation hinting that close calls are thread-safe. It's just a security harness to be 100% sure that no exception is thrown, ever.

If we argue that buffers are not thread safe, then concurrent close calls should be able to fail and throw the exception.
It's like instead of getting a ConcurrentModificationException when concurrently modifying an array list, it just silently fails.
The presence of such an exceptions indicates a deeper problem, in that case the concurrent usage of close. The absence of such an exception makes is so that problems, like incorrect usage of buffers, and concurrent close calls, are more difficult to diagnose.
It's not directly a problem where something doesn't work, correct usage of buffers will work. It's that catching that exception will make the future more difficult with bugs that can't be diagnosed because the exception is caught and ignored.

My point is: I want incorrect usage, in this case concurrent usage, to throw an exception. It makes it much easier to debug problems, and often prevents them alltogether.

...rc/main/java/eu/cloudnetservice/modules/bridge/impl/platform/helper/ProxyPlatformHelper.java

...s/impl/src/main/java/eu/cloudnetservice/modules/npc/impl/platform/PlatformNPCManagement.java

DasBabyPixel · 2025-07-16T23:01:24Z

EDIT: nvm I think I merged something incorrectly into my fork

DasBabyPixel · 2025-07-17T00:35:35Z

Disregarding opinions on coding guidelines etc, it's overall looking good. No immediate masses of memory leaks are being thrown at me.
Just don't forget to include the information about closing ChannelMessages in the changelogs.

derklaro and others added 15 commits July 4, 2025 20:48

begin some data buf enhancements

bbca10c

Merge branch 'nightly' of github.com:CloudNetService/CloudNet into da…

13af62a

…ta-buf-enhance

begin some channel message cleanup

9399041

more docs, add service target by property key

31bda42

make everything work again

942dfd7

add missing non-null annotation

477419b

more work

91babf2

improve readability

5843211

guard against possible addAll(null) call

56deed4

improve sendSingleChannelMessageQuery(Async)

c3da179

prevent circular strong reference by packet buffers

8a16b2d

fix double-acquire when transmitting chunked session info

9d0022b

improve leak detection enable condition handling

914260e

fix channel message query response leak in case of timeouts when usin…

d329abc

…g sync methods

hack around a limitation with orTimeout

1215a1a

derklaro added this to the 4.0.0-RC14 milestone Jul 14, 2025

derklaro requested a review from 0utplay July 14, 2025 20:39

derklaro self-assigned this Jul 14, 2025

derklaro added v: 4.X This pull should be included in the 4.0 release t: improvement The pull request improves existing code labels Jul 14, 2025

DasBabyPixel reviewed Jul 15, 2025

View reviewed changes

derklaro added 4 commits July 15, 2025 22:55

improve docs

08add57

feedback

bee27a0

missed the other one

1ab8621

document ISE when calling methods on a closed buffer

c4bc4b2

0utplay approved these changes Jul 16, 2025

View reviewed changes

derklaro merged commit a7046d1 into nightly Jul 17, 2025
5 checks passed

derklaro deleted the data-buf-enhance branch July 17, 2025 08:27

derklaro mentioned this pull request Jul 17, 2025

Fix networking memory leaks and file transfer issues #1578

Closed

derklaro linked an issue Jul 17, 2025 that may be closed by this pull request

Memory leaks because the packet content buffer is not released in handlers #1577

Closed

1 task

derklaro mentioned this pull request Jul 17, 2025

Memory leaks because the packet content buffer is not released in handlers #1577

Closed

1 task

Conversation

derklaro commented Jul 14, 2025

Motivation

Modification

Result

Uh oh!

derklaro commented Jul 14, 2025

Uh oh!

github-actions bot commented Jul 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

DasBabyPixel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

DasBabyPixel Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

DasBabyPixel Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

DasBabyPixel Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

derklaro Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

DasBabyPixel Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DasBabyPixel Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

derklaro Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

DasBabyPixel Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

derklaro Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DasBabyPixel Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

DasBabyPixel commented Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DasBabyPixel commented Jul 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Jul 14, 2025 •

edited

Loading

DasBabyPixel Jul 15, 2025 •

edited

Loading

derklaro Jul 16, 2025 •

edited

Loading

DasBabyPixel Jul 16, 2025 •

edited

Loading

DasBabyPixel commented Jul 16, 2025 •

edited

Loading