Add POST /datasets/untag endpoint by ishansharma08 · Pull Request #251 · openml/server-api

ishansharma08 · 2026-02-21T18:17:20Z

Implements the POST /datasets/untag endpoint as part of the legacy API migration.

This endpoint removes a tag from a dataset and returns the dataset ID on success. It returns 422 Unprocessable Entity when the tag does not exist for the dataset (replacing the legacy 412 behavior), and 401 when unauthenticated.

The implementation follows the structure of the existing /datasets/tag endpoint for consistency.

Tested locally using Docker Compose with full environment:

Ran docker compose --profile python up --build

Verified database and API container startup

Tested endpoint via Swagger UI at http://localhost:8001/docs

Confirmed correct behavior for both success and failure cases

Please let me know if any adjustments are needed.

coderabbitai · 2026-02-21T18:17:39Z

Walkthrough

Adds a new untag_dataset endpoint at path "/untag" that requires authentication, fetches existing tags for a dataset, validates the specified tag exists (case-insensitive), calls database.datasets.untag to remove the tag, and returns the dataset id. Introduces two module-level helpers, tag(data_id, tag, user_id, connection) and untag(data_id, tag, user_id, connection). The existing tag_dataset endpoint remains but now contains an additional return block. Net changes: +60/-0 lines.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Add POST /datasets/untag endpoint' accurately and concisely describes the main change - adding a new untag endpoint to the datasets router.
Description check	✅ Passed	The description clearly explains the implementation of the /datasets/untag endpoint, its behavior, consistency with existing code, and testing methodology.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

The untag_dataset endpoint is defined twice with identical implementations; remove the duplicate definition to avoid FastAPI route conflicts and maintenance issues.
The new tag/untag database functions and from sqlalchemy import text import appear to be added at the bottom of the router module rather than in the database layer (e.g., database.datasets); consider moving them to the appropriate data-access module to keep concerns separated.
In the new untag DB function, the user_id parameter is accepted but unused in the SQL; either use it (e.g., for auditing/ownership) or remove it to avoid confusion.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `untag_dataset` endpoint is defined twice with identical implementations; remove the duplicate definition to avoid FastAPI route conflicts and maintenance issues.
- The new `tag`/`untag` database functions and `from sqlalchemy import text` import appear to be added at the bottom of the router module rather than in the database layer (e.g., `database.datasets`); consider moving them to the appropriate data-access module to keep concerns separated.
- In the new `untag` DB function, the `user_id` parameter is accepted but unused in the SQL; either use it (e.g., for auditing/ownership) or remove it to avoid confusion.

## Individual Comments

### Comment 1
<location> `src/routers/openml/datasets.py:509-518` </location>
<code_context>
+from sqlalchemy import text
+
+
+def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
+    connection.execute(
+        text(
+            """
+            INSERT INTO dataset_tag (id, tag)
+            VALUES (:data_id, :tag)
+            """
+        ),
+        {"data_id": data_id, "tag": tag},
+    )
+
+
+def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
+    connection.execute(
+        text(
</code_context>

<issue_to_address>
**issue:** `user_id` parameter is unused in the new `tag`/`untag` helpers

`user_id` is never used in these helpers. Either incorporate it for auditing/authorization at this layer, or remove it from the function signatures to avoid a misleading, unnecessary parameter.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-02-21T18:18:19Z

src/routers/openml/datasets.py

+def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
+    connection.execute(
+        text(
+            """
+            INSERT INTO dataset_tag (id, tag)
+            VALUES (:data_id, :tag)
+            """
+        ),
+        {"data_id": data_id, "tag": tag},
+    )


issue: user_id parameter is unused in the new tag/untag helpers

user_id is never used in these helpers. Either incorporate it for auditing/authorization at this layer, or remove it from the function signatures to avoid a misleading, unnecessary parameter.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 83-113: The file contains a duplicate FastAPI route definition for
POST /datasets/untag: two identical functions named untag_dataset decorated with
router.post(path="/untag"), which registers conflicting callables; remove the
duplicate definition (the second untag_dataset function and its decorator) so
only a single router.post(path="/untag") / untag_dataset implementation remains,
or if you intended different behavior, rename the function and route to be
unique; ensure database.datasets.untag and the dependencies (fetch_user,
expdb_connection) remain referenced by the single kept function.

src/routers/openml/datasets.py

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 49-51: Remove the unreachable dead-code return that returns
{"data_untag": {"id": str(data_id)}}; locate the function in
src/routers/openml/datasets.py that already returns earlier (the untag
response/handler) and delete the redundant return statement/block containing
"data_untag" so only the intended earlier return remains; ensure no other code
depends on that duplicated response shape.
- Around line 63-73: The code checks tag membership before verifying
authentication, leaking tag existence to unauthenticated callers; move the
authentication guard (the user is None check that raises
HTTPStatus.UNAUTHORIZED) to run before any DB/tag checks (i.e., before calling
get_tags_for and before using the tags list), so the route handler returns 401
for unauthenticated requests and only proceeds to the tag existence logic (the
tag.casefold() check against tags) after user is validated.
- Around line 483-504: Remove the dead module-level functions tag and untag from
this file: they are unused, have an unused user_id parameter, and the tag
implementation omits the required uploader column; delete the entire function
definitions named tag(data_id: int, tag: str, user_id: int, connection:
Connection) and untag(data_id: int, tag: str, user_id: int, connection:
Connection) so callers use the correct database.datasets.tag and
database.datasets.untag implementations.
- Around line 75-80: The call to database.datasets.untag(...) is failing because
the untag function is missing; add an untag(data_id, tag, user_id=None,
connection=None) function to src/database/datasets.py that mirrors the existing
tag(...) signature and behavior but removes the association instead of creating
it: locate the database.datasets module and implement untag to perform the
inverse DB operation used by tag (delete the tag row for the given data_id and
tag using the provided connection), handle user_id/context the same way as tag,
return the same success/failure shape as tag (or raise the same errors), and
ensure any cached/get_tags_for(...) behavior remains consistent after removal.

coderabbitai · 2026-02-21T18:52:51Z

src/routers/openml/datasets.py

+    return {
+        "data_untag": {"id": str(data_id)},
+    }


⚠️ Potential issue | 🟡 Minor

Remove the unreachable return block — dead code artifact.

Lines 49-51 are unreachable; the function already returns on lines 46-48. This is a copy-paste artifact from drafting the untag response shape and should be removed.

🗑️ Proposed fix

database.datasets.tag(data_id, tag, user_id=user.user_id, connection=expdb_db) return { "data_tag": {"id": str(data_id), "tag": [*tags, tag]}, } - return { - "data_untag": {"id": str(data_id)}, - }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/datasets.py` around lines 49 - 51, Remove the unreachable dead-code return that returns {"data_untag": {"id": str(data_id)}}; locate the function in src/routers/openml/datasets.py that already returns earlier (the untag response/handler) and delete the redundant return statement/block containing "data_untag" so only the intended earlier return remains; ensure no other code depends on that duplicated response shape.

coderabbitai · 2026-02-21T18:52:51Z

src/routers/openml/datasets.py

+    if tag.casefold() not in [t.casefold() for t in tags]:
+        raise HTTPException(
+            status_code=HTTPStatus.UNPROCESSABLE_ENTITY,
+            detail={"message": "Tag does not exist for this dataset"},
+        )
+
+    if user is None:
+        raise HTTPException(
+            status_code=HTTPStatus.UNAUTHORIZED,
+            detail="Authentication required",
+        )


⚠️ Potential issue | 🟠 Major

Move the authentication guard before the tag-existence check.

Currently an unauthenticated caller receives 422 when the tag is absent and 401 when it exists, leaking tag membership to anyone without credentials. Authentication should be validated first — before the get_tags_for DB call — to both prevent information disclosure and avoid the unnecessary round-trip on every unauthenticated request.

🔒️ Proposed fix

def untag_dataset( data_id: Annotated[int, Body()], tag: Annotated[str, SystemString64], user: Annotated[User | None, Depends(fetch_user)] = None, expdb_db: Annotated[Connection, Depends(expdb_connection)] = None, ) -> dict[str, dict[str, Any]]: + if user is None: + raise HTTPException( + status_code=HTTPStatus.UNAUTHORIZED, + detail="Authentication required", + ) + tags = database.datasets.get_tags_for(data_id, expdb_db) if tag.casefold() not in [t.casefold() for t in tags]: raise HTTPException( status_code=HTTPStatus.UNPROCESSABLE_ENTITY, detail={"message": "Tag does not exist for this dataset"}, ) - if user is None: - raise HTTPException( - status_code=HTTPStatus.UNAUTHORIZED, - detail="Authentication required", - ) - database.datasets.untag(

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if tag.casefold() not in [t.casefold() for t in tags]:

raise HTTPException(

status_code=HTTPStatus.UNPROCESSABLE_ENTITY,

detail={"message": "Tag does not exist for this dataset"},

)

if user is None:

raise HTTPException(

status_code=HTTPStatus.UNAUTHORIZED,

detail="Authentication required",

)

if user is None:

raise HTTPException(

status_code=HTTPStatus.UNAUTHORIZED,

detail="Authentication required",

)

tags = database.datasets.get_tags_for(data_id, expdb_db)

if tag.casefold() not in [t.casefold() for t in tags]:

raise HTTPException(

status_code=HTTPStatus.UNPROCESSABLE_ENTITY,

detail={"message": "Tag does not exist for this dataset"},

)

database.datasets.untag(

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/datasets.py` around lines 63 - 73, The code checks tag membership before verifying authentication, leaking tag existence to unauthenticated callers; move the authentication guard (the user is None check that raises HTTPStatus.UNAUTHORIZED) to run before any DB/tag checks (i.e., before calling get_tags_for and before using the tags list), so the route handler returns 401 for unauthenticated requests and only proceeds to the tag existence logic (the tag.casefold() check against tags) after user is validated.

coderabbitai · 2026-02-21T18:52:51Z

src/routers/openml/datasets.py

+    database.datasets.untag(
+        data_id,
+        tag,
+        user_id=user.user_id,
+        connection=expdb_db,
+    )


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Search for untag method definition in the codebase rg -n "def untag" --type py

Repository: openml/server-api

Length of output: 229

🏁 Script executed:

#!/bin/bash # Search for untag specifically in the database module rg -n "def untag" src/database/ --type py # Also check the imports in the router file to see how database.datasets is referenced head -50 src/routers/openml/datasets.py | grep -E "^import|^from"

Repository: openml/server-api

Length of output: 819

🏁 Script executed:

#!/bin/bash # List all function definitions in src/database/datasets.py rg -n "^def " src/database/datasets.py --type py

Repository: openml/server-api

Length of output: 818

Add missing untag method to src/database/datasets.py.

The code at line 75 calls database.datasets.untag(...), but this method does not exist in src/database/datasets.py. The module currently defines: get, get_file, get_tags_for, tag, get_description, get_status, get_latest_processing_update, get_features, get_feature_values, update_status, and remove_deactivated_status — but not untag. This will raise AttributeError at runtime when the endpoint executes.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/datasets.py` around lines 75 - 80, The call to database.datasets.untag(...) is failing because the untag function is missing; add an untag(data_id, tag, user_id=None, connection=None) function to src/database/datasets.py that mirrors the existing tag(...) signature and behavior but removes the association instead of creating it: locate the database.datasets module and implement untag to perform the inverse DB operation used by tag (delete the tag row for the given data_id and tag using the provided connection), handle user_id/context the same way as tag, return the same success/failure shape as tag (or raise the same errors), and ensure any cached/get_tags_for(...) behavior remains consistent after removal.

coderabbitai · 2026-02-21T18:52:51Z

src/routers/openml/datasets.py

+def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
+    connection.execute(
+        text(
+            """
+            INSERT INTO dataset_tag (id, tag)
+            VALUES (:data_id, :tag)
+            """,
+        ),
+        {"data_id": data_id, "tag": tag},
+    )
+
+
+def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
+    connection.execute(
+        text(
+            """
+            DELETE FROM dataset_tag
+            WHERE id = :data_id AND tag = :tag
+            """,
+        ),
+        {"data_id": data_id, "tag": tag},
+    )


🛠️ Refactor suggestion | 🟠 Major

Remove the dead module-level tag and untag helpers — they are never called and contain defects.

Neither function is invoked anywhere in the router file:

tag_dataset calls database.datasets.tag (line 45)

untag_dataset calls database.datasets.untag (line 75)

Both are unreachable and confirmed by static analysis (user_id is unused in each — ARG001). Beyond being dead code, the module-level tag omits the uploader column from its INSERT, which differs from database.datasets.tag and would likely violate a NOT NULL constraint if ever called.

🗑️ Proposed fix — remove both functions entirely

- - -def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None: - connection.execute( - text( - """ - INSERT INTO dataset_tag (id, tag) - VALUES (:data_id, :tag) - """, - ), - {"data_id": data_id, "tag": tag}, - ) - - -def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None: - connection.execute( - text( - """ - DELETE FROM dataset_tag - WHERE id = :data_id AND tag = :tag - """, - ), - {"data_id": data_id, "tag": tag}, - )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:

connection.execute(

text(

"""

INSERT INTO dataset_tag (id, tag)

VALUES (:data_id, :tag)

""",

),

{"data_id": data_id, "tag": tag},

)

def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:

connection.execute(

text(

"""

DELETE FROM dataset_tag

WHERE id = :data_id AND tag = :tag

""",

),

{"data_id": data_id, "tag": tag},

)

🧰 Tools

🪛 Ruff (0.15.1)

[warning] 483-483: Unused function argument: user_id

(ARG001)

[warning] 495-495: Unused function argument: user_id

(ARG001)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/datasets.py` around lines 483 - 504, Remove the dead module-level functions tag and untag from this file: they are unused, have an unused user_id parameter, and the tag implementation omits the required uploader column; delete the entire function definitions named tag(data_id: int, tag: str, user_id: int, connection: Connection) and untag(data_id: int, tag: str, user_id: int, connection: Connection) so callers use the correct database.datasets.tag and database.datasets.untag implementations.

Add POST /datasets/untag endpoint

47368ea

sourcery-ai bot reviewed Feb 21, 2026

View reviewed changes

coderabbitai bot reviewed Feb 21, 2026

View reviewed changes

src/routers/openml/datasets.py Show resolved Hide resolved

Remove duplicate /datasets/untag endpoint definition

cd2a796

ishansharma08 force-pushed the feature/data-untag-endpoint branch from b68f6a6 to cd2a796 Compare February 21, 2026 18:46

[pre-commit.ci] auto fixes from pre-commit.com hooks

cfe3f16

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Feb 21, 2026

View reviewed changes

ishansharma08 closed this Feb 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add POST /datasets/untag endpoint#251

Add POST /datasets/untag endpoint#251
ishansharma08 wants to merge 3 commits intoopenml:mainfrom
ishansharma08:feature/data-untag-endpoint

ishansharma08 commented Feb 21, 2026

Uh oh!

coderabbitai bot commented Feb 21, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai bot left a comment

Uh oh!

sourcery-ai bot Feb 21, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 21, 2026

Uh oh!

coderabbitai bot Feb 21, 2026

Uh oh!

coderabbitai bot Feb 21, 2026

Uh oh!

coderabbitai bot Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ishansharma08 commented Feb 21, 2026

Uh oh!

coderabbitai bot commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Feb 21, 2026 •

edited

Loading