feat: add prompt only image generation datasets #310
Changes from all commits
d4b6d7b
5e282ce
1b8f003
91c791f
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| # Copyright 2025 - Pruna AI GmbH. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| #     http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| from typing import Tuple | ||
| | ||
| from datasets import Dataset, load_dataset | ||
| | ||
| from pruna.logging.logger import pruna_logger | ||
| | ||
| | ||
| def setup_drawbench_dataset(seed: int) -> Tuple[Dataset, Dataset, Dataset]: | ||
|     """ | ||
|     Set up the DrawBench dataset. | ||
| | ||
|     License: Apache 2.0 | ||
| | ||
|     Parameters | ||
|     ---------- | ||
|     seed : int | ||
|         The seed to use. Currently unused by this function. | ||
| | ||
|     Returns | ||
|     ------- | ||
|     Tuple[Dataset, Dataset, Dataset] | ||
|         Placeholder train and validation splits (the first sample each) and the full DrawBench test split. | ||
|     """ | ||
|     ds = load_dataset("sayakpaul/drawbench", trust_remote_code=True)["train"] | ||
|     ds = ds.rename_column("Prompts", "text") | ||
|     pruna_logger.info("DrawBench is a test-only dataset. Do not use it for training or validation.") | ||
|     return ds.select([0]), ds.select([0]), ds | ||
| | ||
| | ||
| def setup_parti_prompts_dataset(seed: int) -> Tuple[Dataset, Dataset, Dataset]: | ||
|     """ | ||
|     Set up the Parti Prompts dataset. | ||
| | ||
|     License: Apache 2.0 | ||
| | ||
|     Parameters | ||
|     ---------- | ||
|     seed : int | ||
|         The seed to use. Currently unused by this function. | ||
| | ||
|     Returns | ||
|     ------- | ||
|     Tuple[Dataset, Dataset, Dataset] | ||
|         Placeholder train and validation splits (the first sample each) and the full Parti Prompts test split. | ||
|     """ | ||
|     ds = load_dataset("nateraw/parti-prompts")["train"] | ||
|     ds = ds.rename_column("Prompt", "text") | ||
|     pruna_logger.info("PartiPrompts is a test-only dataset. Do not use it for training or validation.") | ||
|     return ds.select([0]), ds.select([0]), ds | ||
| | ||
| | ||
| def setup_genai_bench_dataset(seed: int) -> Tuple[Dataset, Dataset, Dataset]: | ||
|     """ | ||
|     Set up the GenAI Bench dataset. | ||
| | ||
|     License: Apache 2.0 | ||
| | ||
|     Parameters | ||
|     ---------- | ||
|     seed : int | ||
|         The seed to use. Currently unused by this function. | ||
| | ||
|     Returns | ||
|     ------- | ||
|     Tuple[Dataset, Dataset, Dataset] | ||
|         Placeholder train and validation splits (the first sample each) and the full GenAI Bench test split. | ||
|     """ | ||
|     ds = load_dataset("BaiqiL/GenAI-Bench")["train"] | ||
|     ds = ds.rename_column("Prompt", "text") | ||
|     pruna_logger.info("GenAI-Bench is a test-only dataset. Do not use it for training or validation.") | ||
|     return ds.select([0]), ds.select([0]), ds | ||
Member
After our discussion I see why we cannot pass an empty dataset for the train and validation splits. Do you think it would make sense to print an info / warning in the setup functions to let people know they should be using the test dataloader?
Collaborator
Author
Yes, that sounds like a good idea. Will log an info message when loading the dataset.
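The placeholder-split pattern discussed in this thread could be factored into one shared helper. The sketch below is only an illustration under stated assumptions: `as_test_only_splits` and `TinyDataset` are hypothetical names that do not exist in Pruna or `datasets`. `TinyDataset` mimics just the `select` method and `len` of a Hugging Face `Dataset` so the example runs without any download, and the standard `logging` module stands in for `pruna_logger`.

```python
import logging
from typing import Any, List, Sequence, Tuple

# Stand-in for pruna_logger (assumption: any standard logger works here).
logger = logging.getLogger("test_only_datasets")


class TinyDataset:
    """Hypothetical stand-in exposing only the parts of `Dataset` used below."""

    def __init__(self, rows: Sequence[Any]) -> None:
        self.rows: List[Any] = list(rows)

    def select(self, indices: Sequence[int]) -> "TinyDataset":
        # Mirrors Dataset.select: build a new dataset from the given row indices.
        return TinyDataset([self.rows[i] for i in indices])

    def __len__(self) -> int:
        return len(self.rows)


def as_test_only_splits(ds: Any, name: str) -> Tuple[Any, Any, Any]:
    """Return (train, validation, test) where only the test split is meaningful."""
    if len(ds) == 0:
        raise ValueError(f"{name} is empty; cannot build placeholder splits.")
    logger.info("%s is a test-only dataset. Do not use it for training or validation.", name)
    # Train and validation cannot be empty, so each gets a one-sample placeholder.
    return ds.select([0]), ds.select([0]), ds


prompts = TinyDataset(["a cat", "a dog", "a red cube"])
train, val, test = as_test_only_splits(prompts, "DrawBench")
print(len(train), len(val), len(test))  # 1 1 3
```

With a helper of this shape, each setup function would reduce to loading the dataset, renaming the prompt column to `text`, and returning the helper's result, so the info message stays identical across datasets.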
slay