38 changes: 34 additions & 4 deletions pyproject.toml
@@ -56,19 +56,49 @@ line-length = 100
select = ["ALL"]
ignore = [
"CPY", # we do not require copyright in every file
"D", # todo: docstring linting
"D203",
"D204",
"D213",
"DTZ", # To add
# Linter does not detect when types are used for Pydantic
"TC001",
"TC003",
]

[tool.ruff.lint.per-file-ignores]
"tests/*" = [ "S101", "COM812", "D"]
"src/core/conversions.py" = ["ANN401"]
"tests/*" = ["S101", "COM812", "D"]

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

fd --type f --extension py . tests/

Repository: openml/server-api

Length of output: 991


Change glob pattern to recursively match nested test files.

The pattern "tests/*" only matches files directly in the tests/ directory. However, the repository contains nested test files at multiple levels (e.g., tests/routers/openml/study_test.py, tests/database/flows_test.py, tests/routers/openml/migration/datasets_migration_test.py). The D (docstring) rules will incorrectly fire on these nested files instead of being suppressed. Use "tests/**" for recursive coverage.

🔧 Proposed fix
-"tests/*" = ["S101", "COM812", "D"]
+"tests/**" = ["S101", "COM812", "D"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"tests/*" = ["S101", "COM812", "D"]
"tests/**" = ["S101", "COM812", "D"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 68, update the glob used to suppress rules for test
files: replace the non-recursive pattern "tests/*" with the recursive pattern
"tests/**" in the pyproject TOML setting that maps test globs to
["S101","COM812","D"] so nested test files (e.g., tests/routers/... and
tests/database/...) are properly matched and the D rule is suppressed for all
test subdirectories.
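As a side note, the non-recursive vs recursive distinction can be sketched with Python's stdlib glob module — an analogy only, since Ruff implements its own matcher, but the `*` vs `**` idea is the same:

```python
import glob
import os
import tempfile

# Build a throwaway tree mirroring the repository layout described above.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "tests", "routers", "openml"))
    open(os.path.join(root, "tests", "conftest.py"), "w").close()
    open(os.path.join(root, "tests", "routers", "openml", "study_test.py"), "w").close()

    # "tests/*" analogue: matches only direct children of tests/.
    flat = glob.glob(os.path.join(root, "tests", "*.py"))
    # "tests/**" analogue: recursive=True lets ** span subdirectories.
    deep = glob.glob(os.path.join(root, "tests", "**", "*.py"), recursive=True)

print(len(flat))  # 1
print(len(deep))  # 2
```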

"src/config.py" = ["D100", "D101", "D102", "D103"]
"src/core/access.py" = ["D100", "D101", "D102", "D103"]
"src/core/conversions.py" = ["ANN401", "D100", "D101", "D102", "D103"]
"src/core/errors.py" = ["D100", "D101", "D102", "D103"]
"src/core/formatting.py" = ["D100", "D101", "D102", "D103"]
"src/database/datasets.py" = ["D100", "D101", "D102", "D103"]
"src/database/evaluations.py" = ["D100", "D101", "D102", "D103"]
"src/database/flows.py" = ["D100", "D101", "D102", "D103"]
"src/database/qualities.py" = ["D100", "D101", "D102", "D103"]
"src/database/setup.py" = ["D100", "D101", "D102", "D103"]
"src/database/studies.py" = ["D100", "D101", "D102", "D103"]
"src/database/tasks.py" = ["D100", "D101", "D102", "D103"]
"src/database/users.py" = ["D100", "D101", "D102", "D103"]
"src/main.py" = ["D100", "D101", "D102", "D103"]
"src/routers/dependencies.py" = ["D100", "D101", "D102", "D103"]
"src/routers/mldcat_ap/dataset.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/datasets.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/estimation_procedure.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/evaluations.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/flows.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/qualities.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/study.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/tasks.py" = ["D100", "D101", "D102", "D103"]
"src/routers/openml/tasktype.py" = ["D100", "D101", "D102", "D103"]
"src/routers/types.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/core.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/datasets/__init__.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/datasets/convertor.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/datasets/dcat.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/datasets/mldcat_ap.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/datasets/openml.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/flows.py" = ["D100", "D101", "D102", "D103"]
"src/schemas/study.py" = ["D100", "D101", "D102", "D103"]

[tool.mypy]
strict = true
1 change: 1 addition & 0 deletions src/core/__init__.py
@@ -0,0 +1 @@
"""Core functionality for the OpenML server API."""
14 changes: 9 additions & 5 deletions src/core/conversions.py
@@ -3,7 +3,7 @@


def _str_to_num(string: str) -> int | float | str:
"""Tries to convert the string to integer, otherwise float, otherwise returns the input."""
"""Try to convert the string to integer, otherwise float, otherwise returns the input."""
if string.isdigit():
return int(string)
try:
@@ -13,8 +13,10 @@ def _str_to_num(string: str) -> int | float | str:


def nested_str_to_num(obj: Any) -> Any:
"""Recursively tries to convert all strings in the object to numbers.
For dictionaries, only the values will be converted."""
"""Recursively try to convert all strings in the object to numbers.

For dictionaries, only the values will be converted.
"""
Comment on lines 15 to +19

suggestion: Docstring for nested_str_to_num under-describes behavior for non-dict containers.

Please also document how non-dict containers are handled (e.g., that lists/tuples/sets are fully traversed and all string elements are converted where possible) so the helper’s behavior is clear at call sites.

Suggested change
def nested_str_to_num(obj: Any) -> Any:
"""Recursively tries to convert all strings in the object to numbers.
For dictionaries, only the values will be converted."""
"""Recursively try to convert all strings in the object to numbers.
For dictionaries, only the values will be converted.
"""
def nested_str_to_num(obj: Any) -> Any:
"""Recursively try to convert all strings in the object to numbers.
For dictionaries, only the values will be converted.
For non-dict containers (e.g., lists, tuples, sets), all elements are
recursively traversed and any string elements are converted to numbers
where possible, preserving the original container type and structure.
"""

if isinstance(obj, str):
return _str_to_num(obj)
if isinstance(obj, Mapping):
@@ -25,8 +27,10 @@ def nested_str_to_num(obj: Any) -> Any:


def nested_num_to_str(obj: Any) -> Any:
"""Recursively tries to convert all numbers in the object to strings.
For dictionaries, only the values will be converted."""
"""Recursively try to convert all numbers in the object to strings.

For dictionaries, only the values will be converted.
"""
if isinstance(obj, str):
return obj
if isinstance(obj, Mapping):
1 change: 1 addition & 0 deletions src/database/__init__.py
@@ -0,0 +1 @@
"""Database access for the OpenML server API."""
4 changes: 2 additions & 2 deletions src/database/datasets.py
@@ -1,4 +1,4 @@
"""Translation from https://github.com/openml/OpenML/blob/c19c9b99568c0fabb001e639ff6724b9a754bbc9/openml_OS/models/api/v1/Api_data.php#L707"""
"""Translation from https://github.com/openml/OpenML/blob/c19c9b99568c0fabb001e639ff6724b9a754bbc9/openml_OS/models/api/v1/Api_data.php#L707."""

import datetime

@@ -162,7 +162,7 @@ def update_status(
parameters={
"dataset": dataset_id,
"status": status,
"date": datetime.datetime.now(),
"date": datetime.datetime.now(datetime.UTC),
"user": user_id,
},
)
2 changes: 1 addition & 1 deletion src/database/flows.py
@@ -51,7 +51,7 @@ def get_parameters(flow_id: int, expdb: Connection) -> Sequence[Row]:


def get_by_name(name: str, external_version: str, expdb: Connection) -> Row | None:
"""Gets flow by name and external version."""
"""Get flow by name and external version."""
return expdb.execute(
text(
"""
4 changes: 2 additions & 2 deletions src/database/studies.py
@@ -1,6 +1,6 @@
import re
from collections.abc import Sequence
from datetime import datetime
from datetime import UTC, datetime
from typing import cast

from sqlalchemy import Connection, Row, text
@@ -98,7 +98,7 @@ def create(study: CreateStudy, user: User, expdb: Connection) -> int:
"main_entity_type": study.main_entity_type,
"description": study.description,
"creator": user.user_id,
"creation_date": datetime.now(),
"creation_date": datetime.now(UTC),
"benchmark_suite": study.benchmark_suite,
},
)
1 change: 1 addition & 0 deletions src/routers/__init__.py
@@ -0,0 +1 @@
"""API routers for the OpenML server API."""
1 change: 1 addition & 0 deletions src/routers/mldcat_ap/__init__.py
@@ -0,0 +1 @@
"""Routers for the MLDCAT-AP API."""
2 changes: 1 addition & 1 deletion src/routers/mldcat_ap/dataset.py
@@ -1,4 +1,4 @@
"""Router for MLDCAT-AP endpoints: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/#examples
"""Router for MLDCAT-AP endpoints: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/#examples.

Incredibly inefficient, but it's just a proof of concept.
Specific queries could be written to fetch e.g., a single feature or quality.
1 change: 1 addition & 0 deletions src/routers/openml/__init__.py
@@ -0,0 +1 @@
"""Routers for the OpenML API."""
2 changes: 1 addition & 1 deletion src/routers/openml/datasets.py
@@ -264,7 +264,7 @@ def _get_dataset_raise_otherwise(
user: User | None,
expdb: Connection,
) -> Row:
"""Fetches the dataset from the database if it exists and the user has permissions.
"""Fetch the dataset from the database if it exists and the user has permissions.

Raises HTTPException if the dataset does not exist or the user can not access it.
"""
5 changes: 3 additions & 2 deletions src/routers/openml/tasks.py
@@ -33,8 +33,9 @@ def fill_template(
task_inputs: dict[str, str | int],
connection: Connection,
) -> dict[str, JSON]:
"""Fill in the XML template as used for task descriptions and return the result,
converted to JSON.
"""Fill in the XML template as used for task descriptions and return the result.

The result is converted to JSON.

template, str:
A string represent XML, as detailed below.
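To illustrate the template-to-JSON idea, here is a hypothetical minimal converter using the stdlib ElementTree. The name xml_to_json and its structure are illustrative only — not the actual fill_template implementation, which also performs input substitution and need not handle repeated tags this way:

```python
import xml.etree.ElementTree as ET
from typing import Any


def xml_to_json(xml: str) -> dict[str, Any]:
    """Convert an XML string to nested dicts, with leaf text as values.

    Note: repeated sibling tags would collapse into one key in this sketch.
    """
    def convert(element: ET.Element) -> Any:
        children = list(element)
        if not children:
            return element.text
        return {child.tag: convert(child) for child in children}

    root = ET.fromstring(xml)
    return {root.tag: convert(root)}


print(xml_to_json("<task><id>1</id><type>clf</type></task>"))
# {'task': {'id': '1', 'type': 'clf'}}
```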
1 change: 1 addition & 0 deletions src/schemas/__init__.py
@@ -0,0 +1 @@
"""Pydantic schemas for the OpenML server API."""
2 changes: 2 additions & 0 deletions src/schemas/datasets/__init__.py
@@ -1,3 +1,5 @@
"""Dataset schemas for the OpenML server API."""

from enum import StrEnum


9 changes: 5 additions & 4 deletions src/schemas/datasets/dcat.py
@@ -1,8 +1,9 @@
"""
"""DCAT-AP schema definitions.

This file is copied over from an external source.
Original Author: Jos van der Velde
Source: https://github.com/aiondemand/AIOD-rest-api/blob/develop/src/converters/schema/dcat.py
License: MIT
License: MIT.

Based on DCAT Application Profile for data portals in Europe Version 2.1.1

@@ -36,7 +37,7 @@ class DcatAPContext(BaseModel):


class DcatAPObject(BaseModel, ABC):
"""Base class for all DCAT-AP objects"""
"""Base class for all DCAT-AP objects."""

id_: str = Field(serialization_alias="@id")

@@ -198,7 +199,7 @@ class DcatAPDataset(DcatAPObject):


class DcatApWrapper(BaseModel):
"""The resulting class, containing a dataset and related entities in the graph"""
"""The resulting class, containing a dataset and related entities in the graph."""

context_: DcatAPContext = Field(
default=DcatAPContext(),
21 changes: 12 additions & 9 deletions src/schemas/datasets/mldcat_ap.py
@@ -1,5 +1,6 @@
"""
Based on MLDCAT-AP 1.0.0: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/
"""MLDCAT-AP schema definitions based on MLDCAT-AP 1.0.0.

See: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/

This is an application profile, aimed to extend the use of DCAT-AP,
originally envisaged for the description of a machine learning process,
@@ -18,7 +19,7 @@


class JsonLDQualifiedLiteral(BaseModel):
"""Base class for all JSON-LD objects"""
"""Base class for all JSON-LD objects."""

⚠️ Potential issue | 🟡 Minor

JsonLDQualifiedLiteral has the wrong docstring — copied from JsonLDObject.

JsonLDQualifiedLiteral is a concrete value model (two fields: type_ and value) and is not a base class. The docstring "Base class for all JSON-LD objects." belongs to JsonLDObject (the actual ABC at line 34), and appears to have been copy-pasted here.

📝 Proposed fix
-    """Base class for all JSON-LD objects."""
+    """A JSON-LD qualified literal pairing an explicit type with a string value."""
📝 Committable suggestion


Suggested change
"""Base class for all JSON-LD objects."""
"""A JSON-LD qualified literal pairing an explicit type with a string value."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/schemas/datasets/mldcat_ap.py` at line 22, the docstring on
JsonLDQualifiedLiteral is incorrect — it was copied from JsonLDObject; update
the docstring for the JsonLDQualifiedLiteral class to describe that it is a
concrete JSON-LD literal model with fields type_ and value (e.g., "Concrete
JSON-LD literal with fields `type_` and `value`"), and ensure JsonLDObject
retains the "Base class for all JSON-LD objects." description so the two classes
have accurate, distinct docstrings.



type_: str = Field(serialization_alias="@type")
value: str = Field(serialization_alias="@value")
@@ -30,7 +31,7 @@ class JsonLDQualifiedLiteral(BaseModel):


class JsonLDObject(BaseModel, ABC):
"""Base class for all JSON-LD objects"""
"""Base class for all JSON-LD objects."""

id_: str = Field(serialization_alias="@id")
type_: str = Field(serialization_alias="@type")
@@ -48,7 +49,7 @@ class JsonLDObjectReference[T: JsonLDObject](BaseModel):

@classmethod
def to(cls, json_ld_object: T) -> JsonLDObjectReference[T]:
"""Create a reference to `json_ld_object`"""
"""Create a reference to `json_ld_object`."""
return cls(id_=json_ld_object.id_)

@model_serializer
@@ -57,7 +58,7 @@ def ser_model(self) -> str:


class AccessRights(StrEnum):
"""Recommend values for 'access rights' within DCAT-AP context"""
"""Recommend values for 'access rights' within DCAT-AP context."""

# https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/access-right
PUBLIC = "PUBLIC"
@@ -66,9 +67,10 @@ class AccessRights(StrEnum):


class Agent(JsonLDObject):
"""Any entity carrying out actions with respect to the (Core) entities Catalogue,
Datasets, Data Services and Distributions. If the Agent is an organisation,
the use of the Organization Ontology is recommended.
"""Any entity carrying out actions with respect to the (Core) entities.

Catalogue, Datasets, Data Services and Distributions. If the Agent is an
organisation, the use of the Organization Ontology is recommended.
"""
Comment on lines 69 to 74

nitpick (typo): Agent docstring reads as two separate sentences where the second is a fragment.

Consider reflowing to something like: Any entity carrying out actions with respect to the (Core) entities Catalogue, Datasets, Data Services and Distributions. to keep the description grammatically consistent while still incorporating the additional detail.

Suggested change
class Agent(JsonLDObject):
"""Any entity carrying out actions with respect to the (Core) entities Catalogue,
Datasets, Data Services and Distributions. If the Agent is an organisation,
the use of the Organization Ontology is recommended.
"""Any entity carrying out actions with respect to the (Core) entities.
Catalogue, Datasets, Data Services and Distributions. If the Agent is an
organisation, the use of the Organization Ontology is recommended.
"""
class Agent(JsonLDObject):
+ """Any entity carrying out actions with respect to the (Core) entities Catalogue,
+ Datasets, Data Services and Distributions. If the Agent is an organisation, the
+ use of the Organization Ontology is recommended.
+ """


type_: Literal["Agent"] = Field(default="Agent", serialization_alias="@type")
@@ -81,6 +83,7 @@ class Agent(JsonLDObject):

class MD5Checksum(JsonLDObject):
"""A value that allows the contents of a file to be authenticated.

This class allows the results of a variety of checksum and cryptographic
message digest algorithms to be represented.
"""
6 changes: 3 additions & 3 deletions src/schemas/datasets/openml.py
@@ -1,6 +1,6 @@
from __future__ import annotations

from datetime import datetime
from datetime import UTC, datetime
from enum import StrEnum
from typing import Any

@@ -91,10 +91,10 @@ class DatasetMetadata(BaseModel):
},
)
upload_date: datetime = Field(
json_schema_extra={"example": str(datetime(2014, 4, 6, 23, 12, 20))},
json_schema_extra={"example": str(datetime(2014, 4, 6, 23, 12, 20, tzinfo=UTC))},
)
processing_date: datetime | None = Field(
json_schema_extra={"example": str(datetime(2019, 7, 9, 15, 22, 3))},
json_schema_extra={"example": str(datetime(2019, 7, 9, 15, 22, 3, tzinfo=UTC))},
)
processing_error: str | None = Field(
json_schema_extra={"example": "Please provide description XML."},
9 changes: 3 additions & 6 deletions tests/routers/openml/study_test.py
@@ -1,4 +1,4 @@
from datetime import datetime
from datetime import UTC, datetime
from http import HTTPStatus

import httpx
@@ -494,11 +494,8 @@ def test_create_task_study(py_api: TestClient) -> None:
}
new_study = study.json()

creation_date = datetime.strptime(
new_study.pop("creation_date"),
"%Y-%m-%dT%H:%M:%S",
)
assert creation_date.date() == datetime.now().date()
creation_date = datetime.fromisoformat(new_study.pop("creation_date"))
assert creation_date.date() == datetime.now(UTC).date()
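The parsing switch is not cosmetic: a fixed strptime format rejects offset-aware ISO strings, while fromisoformat accepts both naive and aware forms. A sketch:

```python
from datetime import datetime

stamp = "2014-04-06T23:12:20+00:00"

parsed = datetime.fromisoformat(stamp)
print(parsed.tzinfo is not None)  # True: the offset is preserved

try:
    datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
except ValueError:
    print("strptime rejects the trailing offset")
```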
Comment on lines +497 to +498

suggestion (testing): Add an explicit assertion that creation_date is timezone-aware and in UTC to prove the DTZ-related change.

Since this change is about enforcing timezone-aware datetimes, this regression test should also assert that the returned value is explicitly timezone-aware and in UTC, not just that the date matches. For example:

creation_date = datetime.fromisoformat(new_study.pop("creation_date"))
assert creation_date.tzinfo is UTC
assert creation_date.date() == datetime.now(UTC).date()

This ensures the API contract (UTC-aware datetime) is verified, rather than only the calendar date.

Suggested implementation:

from datetime import datetime, UTC

import httpx
    }
    new_study = study.json()

    creation_date = datetime.fromisoformat(new_study.pop("creation_date"))
    assert creation_date.tzinfo is UTC
    assert creation_date.date() == datetime.now(UTC).date()

Comment on lines +497 to +498

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n tests/routers/openml/study_test.py | sed -n '490,510p'

Repository: openml/server-api

Length of output: 795


🏁 Script executed:

# Find Pydantic models related to Study
rg -t py "class.*Study" --max-count=20

Repository: openml/server-api

Length of output: 252


🏁 Script executed:

# Look for any custom datetime serializers or Pydantic config
rg -t py "json_encoders|ConfigDict|model_config|datetime" tests/routers/openml/study_test.py -A 2 -B 2

Repository: openml/server-api

Length of output: 326


🌐 Web query:

Pydantic v2 default datetime serialization with timezone UTC

💡 Result:

In Pydantic v2, datetime values are serialized to JSON as strings, and the default temporal JSON format is ISO 8601 (ser_json_temporal='iso8601'). It does not automatically convert datetimes to UTC; whatever tzinfo is on the value is what gets serialized (as an offset like +02:30, +00:00, etc.). [1][2]

What you get by default (UTC-aware)

If your value is already UTC-aware (tzinfo=timezone.utc), the default JSON output will be ISO-8601 with an offset (typically +00:00). [2]

Enforce “UTC (and optionally Z)” on output

If you want to always output UTC, convert to UTC in a serializer; if you specifically want a trailing Z, format it yourself:

from datetime import datetime, timezone
from pydantic import BaseModel, field_serializer

class M(BaseModel):
    dt: datetime

    @field_serializer('dt')
    def ser_dt(self, v: datetime, _info):
        return v.astimezone(timezone.utc).isoformat().replace('+00:00', 'Z')

Pydantic v2 supports this via @field_serializer (and other serializer hooks). [3]
(Example pattern for Z formatting in v2 is also commonly done with PlainSerializer/Annotated.) [4]

Sources:
[1] Pydantic config: ser_json_temporal default and options. (docs.pydantic.dev)
[2] Pydantic standard library types: datetime JSON serialization and example with timezone offset. (docs.pydantic.dev)
[3] Pydantic serialization docs: custom serializers (@field_serializer, etc.). (docs.pydantic.dev)
[4] Example approach using v2 Annotated + PlainSerializer to emit Z. (stackoverflow.com)

Citations:


🏁 Script executed:

cat -n src/schemas/study.py | head -100

Repository: openml/server-api

Length of output: 3708


🏁 Script executed:

# Find the creation_date field in Study model
rg -t py "creation_date" src/schemas/study.py -B 5 -A 2

Repository: openml/server-api

Length of output: 255


🏁 Script executed:

# Check where creation_date is set in the codebase
rg -t py "creation_date" --max-count=30 -B 3 -A 3

Repository: openml/server-api

Length of output: 3908


🏁 Script executed:

# Look for the creation of Study objects, particularly in routers or models
rg -t py "Study\(" src/ -B 2 -A 5 | head -100

Repository: openml/server-api

Length of output: 1375


🏁 Script executed:

# Check if there are any custom serializers or model config in Study
cat -n src/schemas/study.py

Repository: openml/server-api

Length of output: 3708


🏁 Script executed:

# Check if there are any custom serializers in the Study model or BaseModel config
rg -t py "field_serializer|ConfigDict|model_config" src/schemas/study.py

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Check the full router code to see how Study is instantiated
cat -n src/routers/openml/study.py | grep -A 20 "return Study("

Repository: openml/server-api

Length of output: 928


🏁 Script executed:

# Verify the imports and check if UTC is being used consistently
cat -n src/database/studies.py | head -50

Repository: openml/server-api

Length of output: 1782


🏁 Script executed:

# Check if there are any custom field serializers in any base model or parent
rg -t py "field_serializer" src/schemas/

Repository: openml/server-api

Length of output: 313


🏁 Script executed:

# Look at how the creation_date data flows - check the test more closely
cat -n tests/routers/openml/study_test.py | sed -n '470,510p'

Repository: openml/server-api

Length of output: 1516


🏁 Script executed:

# Check if there's a BaseModel with custom datetime serialization
cat -n src/schemas/core.py

Repository: openml/server-api

Length of output: 203


🏁 Script executed:

# Search for any Pydantic config that might affect serialization
rg -t py "class BaseModel|ConfigDict|model_config|ser_json_temporal" src/schemas/

Repository: openml/server-api

Length of output: 846


🏁 Script executed:

# Let me check the full test to understand the test data better
cat -n tests/routers/openml/study_test.py | sed -n '450,480p'

Repository: openml/server-api

Length of output: 1096


🏁 Script executed:

# Check if there's a custom datetime serializer in the fixtures or anywhere
rg -t py "json_encoders|serialize.*datetime" tests/

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Check how the datetime is actually being serialized - look at actual test execution context
cat -n tests/routers/openml/study_test.py | sed -n '1,50p'

Repository: openml/server-api

Length of output: 1516


🏁 Script executed:

# Check if Study model uses any parent class or has config we missed
rg -t py "class Study" src/schemas/ -A 20

Repository: openml/server-api

Length of output: 1454


🏁 Script executed:

# Check what the actual datetime format looks like in the test fixture database
rg -t sql "2019-02-25" tests/ src/

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Look for any datetime handling in the conftest or test setup
find tests -name "conftest.py" -o -name "fixtures*" | head -5

Repository: openml/server-api

Length of output: 78


🏁 Script executed:

# Check if there are database snapshots or fixtures with actual creation_date values
find tests -type f -name "*.py" -path "*fixture*" -o -name "conftest.py" | xargs grep -l "creation_date" 2>/dev/null

Repository: openml/server-api

Length of output: 45


🏁 Script executed:

# Let's check the test database setup to understand the actual stored values
cat -n tests/conftest.py | head -100

Repository: openml/server-api

Length of output: 3562


🏁 Script executed:

# Check how the Study model is being converted to JSON in the test
# Look for any custom JSON encoding or serialization
rg -t py "\.json()" tests/routers/openml/study_test.py -B 3 -A 3

Repository: openml/server-api

Length of output: 1507


🏁 Script executed:

# Check if TestClient.json() method behaves differently
# Let's look at actual test database to see what creation_date values exist
cat -n tests/resources/sql/ 2>/dev/null | head -20 || find tests/resources -name "*.sql" -o -name "*.json" | head -10

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Check if there's database initialization with actual datetime values
find tests/resources -type f | head -20

Repository: openml/server-api

Length of output: 102


🏁 Script executed:

# Look for any database migration or initialization scripts
rg -t sql "creation_date" tests/ src/database/

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Let me check the actual database schema for the study table
rg -t sql "CREATE TABLE.*study" src/database/

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Check how the datetime is retrieved from the database
cat -n src/database/studies.py | sed -n '1,100p'

Repository: openml/server-api

Length of output: 3834


🏁 Script executed:

# Let's trace the actual data: check what the test fixture shows
cat -n tests/routers/openml/study_test.py | sed -n '10,80p'

Repository: openml/server-api

Length of output: 1914


🏁 Script executed:

# Check if there's a custom datetime deserializer or handler in the routers
cat -n src/routers/openml/study.py | head -50

Repository: openml/server-api

Length of output: 2186


🏁 Script executed:

# Check the actual type of creation_date coming from the database
# Look at the database schema definition
find src -name "*.py" -type f | xargs grep -l "CREATE TABLE\|alembic\|migration" | head -5

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Check if SQLAlchemy has any type conversion for datetime
rg -t py "DateTime|DATETIME" src/database/ -B 2 -A 2

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Let's look at what the test database actually contains
cat -n tests/resources/sql/ 2>/dev/null || find tests -name "*.sql" -type f | head -10

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Check the database initialization/fixture setup
rg -t py "INSERT INTO study" tests/ src/database/ -B 3 -A 3

Repository: openml/server-api

Length of output: 447


🏁 Script executed:

# Critical check: let's see the actual test expectation and compare
# If the test expects "2019-02-25T17:15:01" without offset, but Pydantic defaults to adding it,
# then the first test (line 457) would fail. Let's check if that test actually passes
cat -n tests/routers/openml/study_test.py | sed -n '12,60p'

Repository: openml/server-api

Length of output: 1438


🏁 Script executed:

# Check if there's any datetime handling or type conversion happening
# Look at how SQLAlchemy is configured or if there's custom type handling
rg -t py "python_type|bind_processor|result_processor" src/

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

```shell
# Let me check the actual database column definition for creation_date
# Look for schema or migration files
find src -name "*.toml" -o -name "*.yaml" -o -name "*.yml" | head -10
```

Repository: openml/server-api

Length of output: 76


🏁 Script executed:

```shell
# Check if there's a type stub or database schema definition
rg -t py "datetime|tzinfo" src/database/studies.py -B 1 -A 1 | head -30
```

Repository: openml/server-api

Length of output: 298


🏁 Script executed:

```shell
# Critical: check what the database actually stores and retrieves.
# Look at how SQLAlchemy would interpret the datetime from the database.
cat -n src/database/setup.py 2>/dev/null || find src -name "setup.py" -o -name "database.py" | head -5
```

Repository: openml/server-api

Length of output: 1145


🏁 Script executed:

```shell
# Check the database connection setup
# (rg uses Rust regex syntax: alternation is a bare |, not \| as in grep)
rg -t py "create_engine|Engine" src/database/ -B 2 -A 5
```

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

```shell
# Let me verify: when creation_date is stored as datetime.now(UTC) in the database,
# what type does SQLAlchemy retrieve it as? Check the conftest to see the actual test.
cat -n tests/routers/openml/study_test.py | sed -n '495,500p'
```

Repository: openml/server-api

Length of output: 301


🏁 Script executed:

```shell
# The key question: does the test database fixture have timezone info?
# Let's check where the fixtures are populated.
rg -t py "datetime\.now\(UTC\)|timezone" tests/routers/openml/study_test.py
```

Repository: openml/server-api

Length of output: 120


🏁 Script executed:

```shell
# Key question: is the creation_date coming from the DB naive or aware?
# When SQLAlchemy retrieves a datetime from a MySQL DATETIME column, it's naive.
# Check if there's any tzinfo conversion happening
# (bare | for alternation: rg uses Rust regex, not grep BRE)
rg -t py "tzinfo|replace\(tzinfo|astimezone" src/routers/openml/study.py src/database/studies.py
```

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

```shell
# Let's check: when the test calls study.json(), what does Pydantic actually produce?
# We need to understand if the database value is naive or aware when passed to Study.
cat -n src/routers/openml/study.py | sed -n '130,155p'
```

Repository: openml/server-api

Length of output: 1164


🏁 Script executed:

```shell
# Check: is there any custom serializer or configuration on the Study model?
# (bare | for alternation in rg)
rg -t py "model_config|ConfigDict|json_encoders" src/schemas/study.py
```

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

```shell
# The critical piece: confirm that datetime.now(UTC) produces an aware datetime
# and check whether Pydantic serializes aware and naive values differently.
python3 << 'EOF'
from datetime import UTC, datetime
from pydantic import BaseModel

class TestModel(BaseModel):
    dt: datetime

# Test with an aware datetime
aware_dt = datetime.now(UTC)
m_aware = TestModel(dt=aware_dt)
print("Aware datetime JSON:", m_aware.model_dump_json())

# Test with a naive datetime
naive_dt = datetime.now()
m_naive = TestModel(dt=naive_dt)
print("Naive datetime JSON:", m_naive.model_dump_json())
EOF
```

Repository: openml/server-api

Length of output: 175


The test is safe from flakiness, but it surfaces a subtle semantic issue with timezone awareness.

Pydantic v2 does serialize timezone-aware datetimes with a timezone indicator (e.g., `Z` for UTC). The actual issue, however, is upstream: `creation_date` is stored in a naive `DATETIME` column. When SQLAlchemy retrieves it, it returns a naive Python `datetime` object, so Pydantic serializes it without any timezone offset (e.g., `"2019-02-25T17:15:01"`).

The comparison `creation_date.date() == datetime.now(UTC).date()` is safe because `.date()` discards the time-of-day and any timezone on both sides, leaving two plain `date` objects to compare. It will not be flaky even at midnight UTC.

However, the underlying code has a semantic gap: `datetime.now(UTC)` (an aware datetime) is written to a naive database column, losing its UTC context. The test works correctly because it only compares dates, but consider whether `creation_date` should preserve timezone information throughout the stack, either by using a timezone-aware database column or by explicitly normalizing to UTC on read and write.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/study_test.py` around lines 497 - 498, The test reveals
a timezone-awareness gap: creation_date is stored as a naive DATETIME and
returned naive via SQLAlchemy, then parsed with datetime.fromisoformat in the
test; fix by preserving UTC context either at storage or on read—update the
SQLAlchemy column definition to use DateTime(timezone=True) (so retrieval yields
aware datetimes) or, if changing schema is infeasible, ensure the code path that
produces/serializes creation_date (the model/serializer that fills new_study and
the DB write path) normalizes naive datetimes to UTC on read/write (e.g., attach
UTC tzinfo before serialization) so datetime.fromisoformat/new_study contain
timezone-aware values consistently.

assert new_study == expected

