
Support multiple video backends (OpenCV sometimes drops frames resulting in incorrect timecodes) #213

@elxy

Description of Problem & Solution
I want to use FrameTimecode values to drive an ffmpeg process, but the FrameTimecode values do not match what ffmpeg reports.
For the media below, the first 2 scenes detected by the command scenedetect -i Blossoms_at_the_Basin.mp4 detect-content list-scenes -n save-images are:

-----------------------------------------------------------------------
 | Scene # | Start Frame |  Start Time  |  End Frame  |   End Time   |
-----------------------------------------------------------------------
 |      1  |           0 | 00:00:00.000 |         462 | 00:00:19.269 |
 |      2  |         462 | 00:00:19.269 |         635 | 00:00:26.485 |

But the actual end frame number of scene 1 is 508 (counting from 0), not 462. See this screenshot:

scenedetect.jpg
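
For reference, the size of the discrepancy can be worked out from the table itself. The sketch below assumes a constant frame rate of 24000/1001 fps (about 23.976), which is inferred from frame 462 mapping to 00:00:19.269 and is not confirmed against the media file. `frame_to_timecode` is a hypothetical helper, not part of PySceneDetect.

```python
from fractions import Fraction

# Assumed frame rate, inferred from the table (frame 462 -> 00:00:19.269).
FPS = Fraction(24000, 1001)


def frame_to_timecode(frame_num: int, fps: Fraction = FPS) -> str:
    """Format a zero-based frame number as HH:MM:SS.mmm at a constant frame rate."""
    total_ms = round(Fraction(frame_num) / fps * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


reported = frame_to_timecode(462)  # end of scene 1 as reported by scenedetect
expected = frame_to_timecode(508)  # end of scene 1 as observed in the video
offset_s = float(Fraction(508 - 462) / FPS)
print(reported, expected, round(offset_s, 3))
# prints: 00:00:19.269 00:00:21.188 1.919
```

Under that assumption, the 46 missing frames put the reported cut roughly 1.9 seconds too early.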

I think the reason is that OpenCV's VideoCapture has dropped frames. I suggest using PyAV to read frames, because PyAV exposes index and pts properties on each decoded frame.
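
A minimal sketch of how a PyAV presentation timestamp could be turned into a time value suitable for ffmpeg. It relies on `frame.pts` and `stream.time_base`, which PyAV provides (`pts` can be `None` for some frames). `pts_to_seconds` and `iter_frame_times` are hypothetical helper names, and the 1/24000 time base in the example is an assumption for illustration only.

```python
from fractions import Fraction
from typing import Iterator, Optional, Tuple


def pts_to_seconds(pts: Optional[int], time_base: Fraction) -> Optional[float]:
    """Convert a presentation timestamp in stream ticks to seconds."""
    if pts is None:
        return None
    return float(pts * time_base)


def iter_frame_times(path: str) -> Iterator[Tuple[int, Optional[float]]]:
    """Yield (decoded frame count, presentation time in seconds) per video frame."""
    import av  # imported lazily so pts_to_seconds works without PyAV installed

    with av.open(path) as container:
        stream = container.streams.video[0]
        for count, frame in enumerate(container.decode(stream)):
            yield count, pts_to_seconds(frame.pts, stream.time_base)


# Assumed example: with a 1/24000 time base and timestamps starting at zero,
# frame 508 of a 24000/1001 fps stream would carry pts = 508 * 1001.
example_seconds = pts_to_seconds(508 * 1001, Fraction(1, 24000))
print(example_seconds)
```

Because the time comes from the stream's own timestamps rather than from a frame counter, it should stay aligned with ffmpeg even if a reader skips frames.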

Media Examples:

Blossoms_at_the_Basin.mp4 is the 4K version of https://www.youtube.com/watch?v=WzD_PREISiM

Proposed Implementation:

Here is a demo to read frames with PyAV:

import sys

import av
import cv2
import numpy

from scenedetect.video_manager import compute_downscale_factor


class Video():
    def __init__(self, video):
        self.video = video
        self.container = av.open(video)

        self.stream = self.container.streams.video[0]
        self.width = self.stream.codec_context.width

        def _get_frame_rate(stream: av.video.stream.VideoStream):
            # average_rate may be None (or zero) when the container does not report it.
            if stream.average_rate:
                return float(stream.average_rate)
            # Fallback: treat one time_base tick as one frame (only valid for some streams).
            if stream.time_base:
                return 1.0 / float(stream.time_base)
            raise ValueError("Unable to determine FPS")

        self.frame_rate = _get_frame_rate(self.stream)

    def frames(self):
        for frame in self.container.decode(video=0):
            yield frame.index, frame.to_ndarray(format='bgra')



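def compute_delta_hsv(i1, i2):
    # Mean absolute per-pixel difference between two BGR frames,
    # averaged over the H, S and V channels.
    # cv2.split may return a tuple, so convert to lists before assigning into them.
    i1_hsv = list(cv2.split(cv2.cvtColor(i1, cv2.COLOR_BGR2HSV)))
    i2_hsv = list(cv2.split(cv2.cvtColor(i2, cv2.COLOR_BGR2HSV)))
    delta_hsv = [0, 0, 0]
    for i in range(3):
        num_pixels = i1_hsv[i].shape[0] * i1_hsv[i].shape[1]
        # Widen to int32 so the subtraction cannot wrap around in uint8.
        i1_hsv[i] = i1_hsv[i].astype(numpy.int32)
        i2_hsv[i] = i2_hsv[i].astype(numpy.int32)
        delta_hsv[i] = numpy.sum(numpy.abs(i1_hsv[i] - i2_hsv[i])) / float(num_pixels)
    return sum(delta_hsv) / 3.0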


video = Video(sys.argv[1])
threshold = 30.0
factor = compute_downscale_factor(video.width)

last_frame = None
for index, frame in video.frames():
    frame = frame[::factor, ::factor, :3]
    if last_frame is None:
        last_frame = frame
        continue
    hsv = compute_delta_hsv(last_frame, frame)
    if hsv >= threshold:
        print(index)
    last_frame = frame
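
The demo prints only the frame indices at which a cut is detected. Below is a minimal sketch of turning such indices into scene ranges like those in the table above. It assumes each printed index is the first frame of a new scene and that end frames are exclusive (as in the table, where scene 1 ends at 462 and scene 2 starts at 462). `cuts_to_scenes` is a hypothetical helper, and the cut list and frame count in the example are made up for illustration.

```python
from typing import List, Tuple


def cuts_to_scenes(cuts: List[int], num_frames: int) -> List[Tuple[int, int]]:
    """Convert cut frame indices into (start, end) pairs with exclusive end frames."""
    # Drop duplicates and out-of-range cuts, then add the video start and end.
    inner = sorted(c for c in set(cuts) if 0 < c < num_frames)
    boundaries = [0] + inner + [num_frames]
    return list(zip(boundaries[:-1], boundaries[1:]))


# Hypothetical cut indices and total frame count, for illustration only.
scenes = cuts_to_scenes([508, 700], 900)
print(scenes)
# prints: [(0, 508), (508, 700), (700, 900)]
```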
