Skip to content

feat: vbench datamodule#397

Merged
begumcig merged 3 commits intomainfrom
feat/datamodule-for-video-benchmarking
Oct 23, 2025
Merged

feat: vbench datamodule#397
begumcig merged 3 commits intomainfrom
feat/datamodule-for-video-benchmarking

Conversation

@begumcig
Copy link
Copy Markdown
Member

@begumcig begumcig commented Oct 6, 2025

Description

This PR introduces the prompt suite of VBench as a pruna datamodule.

The updates in this PR:

  • Loading the VBench prompts from the JSON and turn them into datasets
  • Options to load single / multi / all dimensions of the prompt suite
  • Update the prompt collate to have auxiliary information that could be related to benchmarking
  • Small bugfix in our utilities that changed the type of the dataset

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Added a test in our datamodule tests

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

You can use the new datamodule as follows:

# Load specific dimension 

dm = PrunaDataModule.from_string("VBench" , category = 'background_consistency')

# Load the entire prompt suite

dm = PrunaDataModule.from_string("VBench")

Note

Adds VBench prompt suite as a dataset, introduces prompt_with_auxiliaries_collate, enables category filtering via PrunaDataModule.from_string, and integrates tests.

  • Datasets:
    • Add text_to_video.setup_vbench_dataset to load VBench prompts from VBench_full_info.json, rename columns to category and text, optional category filtering, and return test-focused splits.
    • Register "VBench" in base_datasets using prompt_with_auxiliaries_collate.
  • Collate:
    • Introduce prompt_with_auxiliaries_collate returning List[str] prompts and auxiliary metadata dicts; expose via pruna_collate_fns.
  • DataModule:
    • Extend from_string with category arg; auto-forward to setup functions that accept it.
  • Tests:
    • Add VBench to test_dm_from_string matrix.

Written by Cursor Bugbot for commit 3d21412. This will update automatically on new commits. Configure here.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@begumcig begumcig force-pushed the feat/datamodule-for-video-benchmarking branch from 40584f7 to c3715ac Compare October 14, 2025 12:58
cursor[bot]

This comment was marked as outdated.

@begumcig begumcig force-pushed the feat/datamodule-for-video-benchmarking branch from c3715ac to 9e9ed1a Compare October 14, 2025 13:15
cursor[bot]

This comment was marked as outdated.

@begumcig begumcig force-pushed the feat/datamodule-for-video-benchmarking branch 3 times, most recently from 23a383c to a94c2ba Compare October 14, 2025 16:10
Copy link
Copy Markdown
Member

@simlang simlang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! 🚀 next to typo and a docstring thing, just the collate_fn thing. completely up to you if you agree or not. easy approve

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to docstring

Copy link
Copy Markdown
Member

@davidberenstein1957 davidberenstein1957 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, feel free to merge after.

@@ -135,6 +135,7 @@ def from_string(
collate_fn_args: dict = dict(),
dataloader_args: dict = dict(),
seed: int = 42,
category: str | list[str] | None = None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we mention a list of categories also somehwere in the doctstring

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a very simple definition of category here because there's a chance we will use this category attribute also for other datasets in the feature so we are not limited to VBench, but will be adding the categories of VBench in the documentation update how does that sound?

@@ -25,10 +25,16 @@ invalid-assignment = "ignore" # mypy is more permissive with Any assignments
call-non-callable = "ignore" # mypy allows more dynamic method calls
index-out-of-bounds = "ignore" # mypy is more permissive with tuple indexing
unresolved-attribute = "ignore" # mypy is more permissive with module attributes
possibly-unbound-attribute = "ignore" # mypy doesn't warn about this as much
possibly-unbound-variable = "ignore" # mypy doesn't warn about this as much
redundant-cast = "ignore" # mypy doesn't warn about redundant casts
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you had also applied these changes in another PR. should we add some comment in either PR about whhy and what happened?

Copy link
Copy Markdown
Member Author

@begumcig begumcig Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ofcourse! Basically, we switched to ty from mypy for static type checking and it's more restrictive than mypy. Maybe, sometime in the future we will enforce all of these rules, but for now, we are restricting them to be similar to what we had with mypy! We have this as a comment in line 22 in this file but it doesn't show since it was not added by me 🥺

@begumcig begumcig force-pushed the feat/datamodule-for-video-benchmarking branch from 00b2ea7 to 4d66c4e Compare October 23, 2025 08:49
@cursor
Copy link
Copy Markdown

cursor bot commented Oct 23, 2025

Bug: Incorrect Iteration in List Comprehension

The list comprehension for prompt_list in prompt_with_auxiliaries_collate uses an incorrect iteration pattern. Instead of directly extracting the "text" field from each row, it iterates over all key-value pairs within each row, which is an overly complex and indirect way to get one text value per data item.

Fix in Cursor Fix in Web

@begumcig begumcig force-pushed the feat/datamodule-for-video-benchmarking branch from 4d66c4e to ec46aa6 Compare October 23, 2025 09:04
@begumcig begumcig force-pushed the feat/datamodule-for-video-benchmarking branch from ec46aa6 to 3d21412 Compare October 23, 2025 09:15
@begumcig begumcig merged commit e49b837 into main Oct 23, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants