This SDK generates datasets for training Video LLMs from YouTube videos. More sources are coming later!
- Generate search queries with GPT.
- Search YouTube for videos matching each query using scrapetube.
- Download the found videos and their subtitles using yt-dlp.
- Detect segments from each video using CLIP and a fancy manual algorithm.
- Generate annotations for each segment with GPT using the audio transcript (e.g. instructions) in 2 steps: first extract clues from the transcript, then generate annotations based on these clues.
- Aggregate segments with annotations into one file.
- Cut segments into separate video clips with ffmpeg.
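The two-step annotation idea can be sketched as building two chat-completion requests: one to extract clues from the transcript, one to turn those clues into an annotation. The prompt wording and helper names below are illustrative assumptions, not the SDK's actual prompts:

```python
# A minimal sketch of the two-step annotation flow. Prompt texts and
# function names are assumptions for illustration; the SDK's real
# prompts and structured-output schema may differ.

def build_clue_request(transcript: str, model: str = "gpt-4o") -> dict:
    """Step 1: ask the model to extract clues from a segment's transcript."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Extract short factual clues (actions, objects, "
                        "steps) from the video transcript below."},
            {"role": "user", "content": transcript},
        ],
    }

def build_annotation_request(clues: list[str], model: str = "gpt-4o") -> dict:
    """Step 2: ask the model to write an annotation based only on the clues."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Write a concise instruction-style annotation for "
                        "the video segment, based only on these clues."},
            {"role": "user", "content": "\n".join(f"- {c}" for c in clues)},
        ],
    }

# Each payload would then be sent with something like
# client.chat.completions.create(**payload) and the step-1 output fed
# into step 2.
```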
In the end you'll have a directory with useful video clips and an annotation file, which you can then train a model on.
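The final cutting step boils down to one ffmpeg invocation per segment. A hypothetical helper (the function name and clip naming are illustrative, not the SDK's API) could look like:

```python
import subprocess

def build_cut_command(src: str, start: float, end: float, dst: str) -> list[str]:
    """Build an ffmpeg command that cuts one segment out of a video.

    -c copy avoids re-encoding, so cuts snap to the nearest keyframes;
    drop it if you need frame-accurate boundaries.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", str(start),  # segment start in seconds
        "-to", str(end),    # segment end in seconds
        "-c", "copy",
        dst,
    ]

# To actually cut a clip:
# subprocess.run(build_cut_command("video.mp4", 12.5, 31.0, "clip_000.mp4"), check=True)
```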
- `pip install -r requirements.txt`. If it doesn't work, try updating: `pip install -U -r requirements.txt`.
- make a `.env` file with:
  - `OPENAI_API_KEY` for openai
  - `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` for azure
  - `OPENAI_API_VERSION='2023-07-01-preview'`
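For example, a `.env` for azure might look like this (values below are placeholders):

```ini
# azure setup (placeholder values)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-azure-key
OPENAI_API_VERSION='2023-07-01-preview'
# or, for plain openai:
# OPENAI_API_KEY=your-openai-key
```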
- set config params in the notebook:
  - `openai.type`: openai/azure
  - `openai.temperature`: the higher it is, the more random/creative the output will be
  - `openai.deployment`: model name for openai / deployment name for azure. Needs to support structured output and image input. Tested with gpt-4o on azure.
  - `data_dir`: the path where all the results will be saved. Change it for each experiment/dataset.
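In the notebook these params might be set like this (the exact config object may differ; values are illustrative):

```python
# Illustrative values only -- the notebook's actual config object may differ.
config = {
    "openai": {
        "type": "azure",         # "openai" or "azure"
        "temperature": 0.7,      # higher = more random/creative output
        "deployment": "gpt-4o",  # model (openai) / deployment (azure)
    },
    "data_dir": "data/experiment_01",  # results dir; change per experiment/dataset
}
```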
Please refer to `getting_started.ipynb`.
If you have your own videos with descriptions, you can skip the download/filtering steps and move straight to generating annotations!