Add POST /datasets/untag endpoint#251
Conversation
WalkthroughAdds a new 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
untag_datasetendpoint is defined twice with identical implementations; remove the duplicate definition to avoid FastAPI route conflicts and maintenance issues. - The new
tag/untagdatabase functions andfrom sqlalchemy import textimport appear to be added at the bottom of the router module rather than in the database layer (e.g.,database.datasets); consider moving them to the appropriate data-access module to keep concerns separated. - In the new
untagDB function, theuser_idparameter is accepted but unused in the SQL; either use it (e.g., for auditing/ownership) or remove it to avoid confusion.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `untag_dataset` endpoint is defined twice with identical implementations; remove the duplicate definition to avoid FastAPI route conflicts and maintenance issues.
- The new `tag`/`untag` database functions and `from sqlalchemy import text` import appear to be added at the bottom of the router module rather than in the database layer (e.g., `database.datasets`); consider moving them to the appropriate data-access module to keep concerns separated.
- In the new `untag` DB function, the `user_id` parameter is accepted but unused in the SQL; either use it (e.g., for auditing/ownership) or remove it to avoid confusion.
## Individual Comments
### Comment 1
<location> `src/routers/openml/datasets.py:509-518` </location>
<code_context>
+from sqlalchemy import text
+
+
+def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
+ connection.execute(
+ text(
+ """
+ INSERT INTO dataset_tag (id, tag)
+ VALUES (:data_id, :tag)
+ """
+ ),
+ {"data_id": data_id, "tag": tag},
+ )
+
+
+def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
+ connection.execute(
+ text(
</code_context>
<issue_to_address>
**issue:** `user_id` parameter is unused in the new `tag`/`untag` helpers
`user_id` is never used in these helpers. Either incorporate it for auditing/authorization at this layer, or remove it from the function signatures to avoid a misleading, unnecessary parameter.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None: | ||
| connection.execute( | ||
| text( | ||
| """ | ||
| INSERT INTO dataset_tag (id, tag) | ||
| VALUES (:data_id, :tag) | ||
| """ | ||
| ), | ||
| {"data_id": data_id, "tag": tag}, | ||
| ) |
There was a problem hiding this comment.
issue: user_id parameter is unused in the new tag/untag helpers
user_id is never used in these helpers. Either incorporate it for auditing/authorization at this layer, or remove it from the function signatures to avoid a misleading, unnecessary parameter.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 83-113: The file contains a duplicate FastAPI route definition for
POST /datasets/untag: two identical functions named untag_dataset decorated with
router.post(path="/untag"), which registers conflicting callables; remove the
duplicate definition (the second untag_dataset function and its decorator) so
only a single router.post(path="/untag") / untag_dataset implementation remains,
or if you intended different behavior, rename the function and route to be
unique; ensure database.datasets.untag and the dependencies (fetch_user,
expdb_connection) remain referenced by the single kept function.
b68f6a6 to
cd2a796
Compare
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 49-51: Remove the unreachable dead-code return that returns
{"data_untag": {"id": str(data_id)}}; locate the function in
src/routers/openml/datasets.py that already returns earlier (the untag
response/handler) and delete the redundant return statement/block containing
"data_untag" so only the intended earlier return remains; ensure no other code
depends on that duplicated response shape.
- Around line 63-73: The code checks tag membership before verifying
authentication, leaking tag existence to unauthenticated callers; move the
authentication guard (the user is None check that raises
HTTPStatus.UNAUTHORIZED) to run before any DB/tag checks (i.e., before calling
get_tags_for and before using the tags list), so the route handler returns 401
for unauthenticated requests and only proceeds to the tag existence logic (the
tag.casefold() check against tags) after user is validated.
- Around line 483-504: Remove the dead module-level functions tag and untag from
this file: they are unused, have an unused user_id parameter, and the tag
implementation omits the required uploader column; delete the entire function
definitions named tag(data_id: int, tag: str, user_id: int, connection:
Connection) and untag(data_id: int, tag: str, user_id: int, connection:
Connection) so callers use the correct database.datasets.tag and
database.datasets.untag implementations.
- Around line 75-80: The call to database.datasets.untag(...) is failing because
the untag function is missing; add an untag(data_id, tag, user_id=None,
connection=None) function to src/database/datasets.py that mirrors the existing
tag(...) signature and behavior but removes the association instead of creating
it: locate the database.datasets module and implement untag to perform the
inverse DB operation used by tag (delete the tag row for the given data_id and
tag using the provided connection), handle user_id/context the same way as tag,
return the same success/failure shape as tag (or raise the same errors), and
ensure any cached/get_tags_for(...) behavior remains consistent after removal.
| return { | ||
| "data_untag": {"id": str(data_id)}, | ||
| } |
There was a problem hiding this comment.
Remove the unreachable return block — dead code artifact.
Lines 49-51 are unreachable; the function already returns on lines 46-48. This is a copy-paste artifact from drafting the untag response shape and should be removed.
🗑️ Proposed fix
database.datasets.tag(data_id, tag, user_id=user.user_id, connection=expdb_db)
return {
"data_tag": {"id": str(data_id), "tag": [*tags, tag]},
}
- return {
- "data_untag": {"id": str(data_id)},
- }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/datasets.py` around lines 49 - 51, Remove the unreachable
dead-code return that returns {"data_untag": {"id": str(data_id)}}; locate the
function in src/routers/openml/datasets.py that already returns earlier (the
untag response/handler) and delete the redundant return statement/block
containing "data_untag" so only the intended earlier return remains; ensure no
other code depends on that duplicated response shape.
| if tag.casefold() not in [t.casefold() for t in tags]: | ||
| raise HTTPException( | ||
| status_code=HTTPStatus.UNPROCESSABLE_ENTITY, | ||
| detail={"message": "Tag does not exist for this dataset"}, | ||
| ) | ||
|
|
||
| if user is None: | ||
| raise HTTPException( | ||
| status_code=HTTPStatus.UNAUTHORIZED, | ||
| detail="Authentication required", | ||
| ) |
There was a problem hiding this comment.
Move the authentication guard before the tag-existence check.
Currently an unauthenticated caller receives 422 when the tag is absent and 401 when it exists, leaking tag membership to anyone without credentials. Authentication should be validated first — before the get_tags_for DB call — to both prevent information disclosure and avoid the unnecessary round-trip on every unauthenticated request.
🔒️ Proposed fix
def untag_dataset(
data_id: Annotated[int, Body()],
tag: Annotated[str, SystemString64],
user: Annotated[User | None, Depends(fetch_user)] = None,
expdb_db: Annotated[Connection, Depends(expdb_connection)] = None,
) -> dict[str, dict[str, Any]]:
+ if user is None:
+ raise HTTPException(
+ status_code=HTTPStatus.UNAUTHORIZED,
+ detail="Authentication required",
+ )
+
tags = database.datasets.get_tags_for(data_id, expdb_db)
if tag.casefold() not in [t.casefold() for t in tags]:
raise HTTPException(
status_code=HTTPStatus.UNPROCESSABLE_ENTITY,
detail={"message": "Tag does not exist for this dataset"},
)
- if user is None:
- raise HTTPException(
- status_code=HTTPStatus.UNAUTHORIZED,
- detail="Authentication required",
- )
-
database.datasets.untag(📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if tag.casefold() not in [t.casefold() for t in tags]: | |
| raise HTTPException( | |
| status_code=HTTPStatus.UNPROCESSABLE_ENTITY, | |
| detail={"message": "Tag does not exist for this dataset"}, | |
| ) | |
| if user is None: | |
| raise HTTPException( | |
| status_code=HTTPStatus.UNAUTHORIZED, | |
| detail="Authentication required", | |
| ) | |
| if user is None: | |
| raise HTTPException( | |
| status_code=HTTPStatus.UNAUTHORIZED, | |
| detail="Authentication required", | |
| ) | |
| tags = database.datasets.get_tags_for(data_id, expdb_db) | |
| if tag.casefold() not in [t.casefold() for t in tags]: | |
| raise HTTPException( | |
| status_code=HTTPStatus.UNPROCESSABLE_ENTITY, | |
| detail={"message": "Tag does not exist for this dataset"}, | |
| ) | |
| database.datasets.untag( |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/datasets.py` around lines 63 - 73, The code checks tag
membership before verifying authentication, leaking tag existence to
unauthenticated callers; move the authentication guard (the user is None check
that raises HTTPStatus.UNAUTHORIZED) to run before any DB/tag checks (i.e.,
before calling get_tags_for and before using the tags list), so the route
handler returns 401 for unauthenticated requests and only proceeds to the tag
existence logic (the tag.casefold() check against tags) after user is validated.
| database.datasets.untag( | ||
| data_id, | ||
| tag, | ||
| user_id=user.user_id, | ||
| connection=expdb_db, | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for untag method definition in the codebase
rg -n "def untag" --type pyRepository: openml/server-api
Length of output: 229
🏁 Script executed:
#!/bin/bash
# Search for untag specifically in the database module
rg -n "def untag" src/database/ --type py
# Also check the imports in the router file to see how database.datasets is referenced
head -50 src/routers/openml/datasets.py | grep -E "^import|^from"Repository: openml/server-api
Length of output: 819
🏁 Script executed:
#!/bin/bash
# List all function definitions in src/database/datasets.py
rg -n "^def " src/database/datasets.py --type pyRepository: openml/server-api
Length of output: 818
Add missing untag method to src/database/datasets.py.
The code at line 75 calls database.datasets.untag(...), but this method does not exist in src/database/datasets.py. The module currently defines: get, get_file, get_tags_for, tag, get_description, get_status, get_latest_processing_update, get_features, get_feature_values, update_status, and remove_deactivated_status — but not untag. This will raise AttributeError at runtime when the endpoint executes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/datasets.py` around lines 75 - 80, The call to
database.datasets.untag(...) is failing because the untag function is missing;
add an untag(data_id, tag, user_id=None, connection=None) function to
src/database/datasets.py that mirrors the existing tag(...) signature and
behavior but removes the association instead of creating it: locate the
database.datasets module and implement untag to perform the inverse DB operation
used by tag (delete the tag row for the given data_id and tag using the provided
connection), handle user_id/context the same way as tag, return the same
success/failure shape as tag (or raise the same errors), and ensure any
cached/get_tags_for(...) behavior remains consistent after removal.
| def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None: | ||
| connection.execute( | ||
| text( | ||
| """ | ||
| INSERT INTO dataset_tag (id, tag) | ||
| VALUES (:data_id, :tag) | ||
| """, | ||
| ), | ||
| {"data_id": data_id, "tag": tag}, | ||
| ) | ||
|
|
||
|
|
||
| def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None: | ||
| connection.execute( | ||
| text( | ||
| """ | ||
| DELETE FROM dataset_tag | ||
| WHERE id = :data_id AND tag = :tag | ||
| """, | ||
| ), | ||
| {"data_id": data_id, "tag": tag}, | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Remove the dead module-level tag and untag helpers — they are never called and contain defects.
Neither function is invoked anywhere in the router file:
tag_datasetcallsdatabase.datasets.tag(line 45)untag_datasetcallsdatabase.datasets.untag(line 75)
Both are unreachable and confirmed by static analysis (user_id is unused in each — ARG001). Beyond being dead code, the module-level tag omits the uploader column from its INSERT, which differs from database.datasets.tag and would likely violate a NOT NULL constraint if ever called.
🗑️ Proposed fix — remove both functions entirely
-
-
-def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
- connection.execute(
- text(
- """
- INSERT INTO dataset_tag (id, tag)
- VALUES (:data_id, :tag)
- """,
- ),
- {"data_id": data_id, "tag": tag},
- )
-
-
-def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None:
- connection.execute(
- text(
- """
- DELETE FROM dataset_tag
- WHERE id = :data_id AND tag = :tag
- """,
- ),
- {"data_id": data_id, "tag": tag},
- )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def tag(data_id: int, tag: str, user_id: int, connection: Connection) -> None: | |
| connection.execute( | |
| text( | |
| """ | |
| INSERT INTO dataset_tag (id, tag) | |
| VALUES (:data_id, :tag) | |
| """, | |
| ), | |
| {"data_id": data_id, "tag": tag}, | |
| ) | |
| def untag(data_id: int, tag: str, user_id: int, connection: Connection) -> None: | |
| connection.execute( | |
| text( | |
| """ | |
| DELETE FROM dataset_tag | |
| WHERE id = :data_id AND tag = :tag | |
| """, | |
| ), | |
| {"data_id": data_id, "tag": tag}, | |
| ) |
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 483-483: Unused function argument: user_id
(ARG001)
[warning] 495-495: Unused function argument: user_id
(ARG001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/datasets.py` around lines 483 - 504, Remove the dead
module-level functions tag and untag from this file: they are unused, have an
unused user_id parameter, and the tag implementation omits the required uploader
column; delete the entire function definitions named tag(data_id: int, tag: str,
user_id: int, connection: Connection) and untag(data_id: int, tag: str, user_id:
int, connection: Connection) so callers use the correct database.datasets.tag
and database.datasets.untag implementations.
Implements the POST /datasets/untag endpoint as part of the legacy API migration.
This endpoint removes a tag from a dataset and returns the dataset ID on success. It returns 422 Unprocessable Entity when the tag does not exist for the dataset (replacing the legacy 412 behavior), and 401 when unauthenticated.
The implementation follows the structure of the existing /datasets/tag endpoint for consistency.
Tested locally using Docker Compose with full environment:
Please let me know if any adjustments are needed.