Add option to enforce minimum segment length by StephanDollberg · Pull Request #150 · apache/otava

StephanDollberg · 2026-04-23T17:29:56Z

One problem we often see in practice is that single spikes cause
changepoints. These are just false-positive noise that we want to
avoid.

This patch adds a config option to disallow changepoints that would
enclose a segment shorter than a configurable minimum length.

That way we can filter out these one-event changepoints and avoid
noise.

Of course, this will mute true changepoints in a short segment, but
that's fine if they are followed by another true changepoint. For
example, [100, 100, 130, 130, 150, 150, 150 ...] would only report the
150 one. This is fine: a single alert is good enough to get someone to
look at the data.

Default behaviour is unchanged.
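One way to read "minimum segment length" is as a constraint on the split search itself. This is similar in spirit to the `min.size` argument of R's `ecp::e.divisive`. Under that reading, a split is only considered if both resulting segments keep at least `min_segment_len` points.

The sketch below is hypothetical and is not the code in this PR. It swaps e-divisive's energy statistic and significance test for plain binary segmentation on a weighted squared difference of means with a fixed `threshold`. The names `find_changepoints`, `split_stat` and `min_segment_len` are made up for illustration.

```python
from typing import List, Optional


def split_stat(xs: List[float], k: int) -> float:
    """Weighted squared difference of means between xs[:k] and xs[k:].

    Simple stand-in for e-divisive's energy statistic (assumption).
    """
    n = len(xs)
    mean_left = sum(xs[:k]) / k
    mean_right = sum(xs[k:]) / (n - k)
    return (k * (n - k) / n) * (mean_left - mean_right) ** 2


def find_changepoints(
    xs: List[float], threshold: float, min_segment_len: int = 1
) -> List[int]:
    """Hypothetical binary segmentation that never creates a segment shorter
    than min_segment_len. Returns sorted indices where a new segment starts.
    """
    if min_segment_len < 1:
        raise ValueError("min_segment_len must be >= 1")
    found: List[int] = []

    def recurse(start: int, end: int) -> None:
        seg = xs[start:end]
        best_k: Optional[int] = None
        best_stat = threshold  # a split must beat the threshold to count
        # Only consider splits leaving >= min_segment_len points on each side.
        for k in range(min_segment_len, len(seg) - min_segment_len + 1):
            stat = split_stat(seg, k)
            if stat > best_stat:
                best_k, best_stat = k, stat
        if best_k is None:
            return
        found.append(start + best_k)
        recurse(start, start + best_k)
        recurse(start + best_k, end)

    recurse(0, len(xs))
    return sorted(found)


# Series modeled on the example above; the trailing "..." is filled with
# extra 150s purely for illustration.
series = [100, 100, 130, 130, 150, 150, 150, 150, 150, 150]
print(find_changepoints(series, threshold=100.0))                     # [2, 4]
print(find_changepoints(series, threshold=100.0, min_segment_len=3))  # [4]
```

With the default `min_segment_len=1` the search is unconstrained, which mirrors "default behaviour is unchanged". With `min_segment_len=3`, the two-point 130 segment can no longer stand on its own. Only the jump to 150 (index 4) is reported.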

Let me know what you think.


henrikingo · 2026-04-23T19:03:38Z

Hi Stephan

The original e-divisive implementation (in R) actually included such an option, and I think MongoDB's signal_processing_algorithms likewise required 2 points at each end of the segment. That meant it was only possible to find a change point in segments that had at least 5 points, and in that case it would have to be point #3 that is the change point.
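The counting above can be made concrete with a tiny sketch. It assumes the rule means "at least `margin` points strictly before and strictly after the change point". The helper `candidate_positions` and its 0-based indexing are illustrative assumptions. They are not taken from e-divisive, signal_processing_algorithms or Otava.

```python
from typing import List


def candidate_positions(n: int, margin: int) -> List[int]:
    """Hypothetical helper: 0-based indices with at least `margin` points
    before and at least `margin` points after them."""
    return list(range(margin, n - margin))


# With 2 points required at each end, 5 points is the smallest series with
# any candidate, and the only candidate is index 2, i.e. point #3.
print(candidate_positions(4, 2))  # []
print(candidate_positions(5, 2))  # [2]
```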

The Hunter implementation then modified this so that any point, even in an arbitrarily short segment, can be a change point. "In practice the minimum segment is 3, since with only 2 points it is not possible to establish the 'normal' range that the other point would be a change from." The Hunter modifications were introduced specifically to correctly find two nearby change points. A common use case (in Cassandra development, anyway) was to observe a regression and then fix it immediately in a nearby commit. The DataStax team observed that they could make the algorithm more sensitive by dividing a long series into smaller windows. A byproduct of this is that even individual outlier points sometimes get marked as change points much more easily than in the original e-divisive.

The relevant parameter is window_len, and by default it is set to 50. Before adding a new parameter, it would be interesting to hear whether you get "better" behavior by increasing it. In principle, you should be able to get the original e-divisive behavior by setting window_len to a really large value, i.e. larger than the length of your time series.
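Here is a rough illustration of why a very large window_len should approximate the original behavior. Suppose the series is cut into consecutive, non-overlapping windows of window_len points. This is an assumption: Otava's real windowing may overlap or slide, and `windows` is a hypothetical helper. Under that assumption, any window_len at least as long as the series leaves a single window covering the whole history.

```python
from typing import List, Tuple


def windows(n: int, window_len: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split indices 0..n-1 into consecutive,
    non-overlapping [start, end) windows of at most window_len points."""
    if window_len < 1:
        raise ValueError("window_len must be >= 1")
    return [(start, min(start + window_len, n)) for start in range(0, n, window_len)]


print(windows(120, 50))    # [(0, 50), (50, 100), (100, 120)]
print(windows(120, 1000))  # [(0, 120)]
```

With the default of 50, a 120-point series is examined in three pieces. A single spike therefore weighs much more against the roughly 50 neighbors in its window than against the full history.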

If you do this and still observe individual outliers getting marked as change points (or even as two change points), could you please share a data sample?

StephanDollberg · 2026-04-24T09:34:15Z

Thanks for the background, interesting to know!

Sure, I can give that a try. I had tried without specifying window_len in the past and wasn't entirely happy (I know, not a very qualitative statement haha), but I also didn't know 50 was the default until yesterday and assumed it was actually a lot bigger. Let me try again with that.

StephanDollberg wants to merge 1 commit into apache:master from StephanDollberg:stephan/min-segment-len

2 participants