2.x: Possible deadlock when using observeOn(Scheduler, boolean) #6146

@jkarshin

RxJava version: 2.1.11
Java: 1.8.0_181

I'm encountering an intermittent deadlock in a rather long Flowable, and I believe I've pinpointed it to an observeOn(...) call. (I reached this conclusion through a series of log statements.) I haven't been able to trace through the test when the deadlock occurs, because it only happens about once every 30-40 executions and each execution takes about a minute. I've managed to reproduce the deadlock about a dozen times, adding more logging each time to figure out where things are getting stuck.

Flowable<SomeType> flow = /* lots of stuff upstream */;

flow.doOnNext(x -> { /* log 1 */ })
    .doOnComplete(() -> { /* log 2 */ })
    .observeOn(Schedulers.io(), true) // second argument is delayError
    .doOnNext(x -> { /* log 3 */ })
    .doOnComplete(() -> { /* log 4 */ })
    // Lots more downstream

In the test case where I experience the occasional deadlock, I expect only 1 item to be emitted through this part of the Flowable. I can see log statements 1 and 2, indicating that the item reaches the observeOn(...) and that the upstream has completed, but logs 3 and 4 are never reached. (I forgot to add a doOnError(...) to make sure an exception isn't sneaking through and holding things up elsewhere, but I'm fairly confident there aren't any uncaught exceptions. I've now added a doOnError(...) and am re-running my test to make sure; I'll update my post once I have results.)

Because logs 1 and 2 are hit but logs 3 and 4 are not, I believe the observeOn(...) is locking up somehow. What's really strange is that everything works fine most of the time.

None of the downstream operators should be attempting to dispose the Flowable early either. I believe my terminal operator is a blockingGet() on a Single, no timeout or anything.
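Since the terminal blocking call has no timeout, a hang leaves the test stuck with no record of where each thread is waiting. Below is a minimal sketch, using only the JDK, of a watchdog that runs a blocking call with a time limit and prints every thread's stack if the limit is exceeded. `HangWatchdog`, `stackDump`, and `callWithWatchdog` are hypothetical names made up for illustration; none of this is RxJava API.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of RxJava: wraps a blocking call with a timeout.
class HangWatchdog {

    /** Formats the name, state and stack frames of every live thread. */
    static String stackDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append(t.getName()).append(" [").append(t.getState()).append("]\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Runs the blocking call on a helper thread and waits up to the timeout.
     * On timeout, prints a stack dump to stderr and rethrows TimeoutException.
     */
    static <T> T callWithWatchdog(Callable<T> blockingCall, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "watched-call");
            t.setDaemon(true); // do not keep the JVM alive if the call never returns
            return t;
        });
        try {
            Future<T> result = executor.submit(blockingCall);
            try {
                return result.get(timeout, unit);
            } catch (TimeoutException hung) {
                // Dump before shutdownNow() interrupts the stuck thread.
                System.err.println(stackDump());
                throw hung;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that returns promptly passes straight through.
        System.out.println(callWithWatchdog(() -> "done", 1, TimeUnit.SECONDS));
    }
}
```

In the test, the blocking terminal call would be passed in as the `Callable`, with a limit comfortably above the usual one-minute run time. The resulting dump should show whether the IO worker is parked, blocked, or absent when logs 3 and 4 go missing.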

I also logged the total number of threads in my JVM to see if I'm leaking threads somewhere, but I'm only at 49 when the deadlock occurs. (I'm using the IO scheduler, which I believe is backed by an unbounded pool, so I can't imagine I would be running out of worker threads.) I do have other Flowables doing unrelated tasks in the background, and all of them use the IO scheduler. The upstream and downstream of the Flowable in my test also use the IO scheduler, but the deadlock always seems to happen here.
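For reference, here is a rough sketch of how the thread count can be read, and how it might be extended to check for lock cycles and capture a full dump, using the JDK's `java.lang.management` API. `ThreadDiagnostics` and its method names are hypothetical. Note that `findDeadlockedThreads()` only reports cycles on monitors and ownable synchronizers, so a pipeline stalled on a missed signal would most likely return nothing there; in that case the dump is the more useful output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; uses only the JDK management API.
class ThreadDiagnostics {

    /** Number of live threads in this JVM, daemon and non-daemon. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Ids of threads caught in a lock cycle, or an empty array if there is none. */
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    /** Name, state and stack frames of every live thread. */
    static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked-monitor and locked-synchronizer details.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreadCount());
        System.out.println("threads in a lock cycle: " + deadlockedThreadIds().length);
        System.out.print(dumpAllThreads());
    }
}
```

Running `jstack <pid>` against the hung JVM gives the same kind of dump without any code changes.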

I realize that's not a lot of info to go off of, but I figured I'd ask the experts in case there's something glaring that I'm missing, or if there's something else I can do to figure out what's going on.

Thanks in advance!

Labels: 2.x; Missing-Details (could be a question or a bug report, but not enough details are provided)
