Add docstring linting (D-class rules) to ruff configuration #247

Nidhicodes wants to merge 1 commit into openml:main from
Conversation
Walkthrough

This pull request updates the project's linting configuration and documentation standards. The Ruff linter configuration removes global ignore rules for docstring (D) and timezone-aware datetime (DTZ) checks, replacing them with targeted per-file ignores across source and test modules. The PR systematically adds missing module-level docstrings to multiple files in the core, database, routers, and schemas packages. Additionally, docstring formatting and punctuation are improved for consistency throughout the codebase. Datetime handling is updated to use UTC-aware timestamps (datetime.UTC) in database and schema modules, and a new DatasetSchema enum is introduced to expose schema type options (DCAT_AP, OPENML, MLDCAT_AP).
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Hey - I've found 3 issues, and left some high level feedback:

- The per-file `D100`–`D103` ignores in `pyproject.toml` are very repetitive; consider collapsing them using directory or glob patterns (e.g., `src/database/*.py`, `src/routers/openml/*.py`) to make the configuration shorter and easier to maintain.
- Timezone usage is a bit mixed (`datetime.datetime.now(datetime.UTC)` vs `datetime.now(UTC)`); consider standardizing on one style and import pattern across the codebase for consistency and readability.
- Some updated docstrings, such as the one on `Agent` in `mldcat_ap.py`, now have a summary line that reads as a fragment followed by a new paragraph; consider rephrasing so the first line is a single complete summary sentence per PEP 257.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The per-file `D100`–`D103` ignores in `pyproject.toml` are very repetitive; consider collapsing them using directory or glob patterns (e.g., `src/database/*.py`, `src/routers/openml/*.py`) to make the configuration shorter and easier to maintain.
- Timezone usage is a bit mixed (`datetime.datetime.now(datetime.UTC)` vs `datetime.now(UTC)`); consider standardizing on one style and import pattern across the codebase for consistency and readability.
- Some updated docstrings, such as the one on `Agent` in `mldcat_ap.py`, now have a summary line that reads as a fragment followed by a new paragraph; consider rephrasing so the first line is a single complete summary sentence per PEP 257.
## Individual Comments
### Comment 1
<location> `src/schemas/datasets/mldcat_ap.py:69-74` </location>
<code_context>
class Agent(JsonLDObject):
- """Any entity carrying out actions with respect to the (Core) entities Catalogue,
- Datasets, Data Services and Distributions. If the Agent is an organisation,
- the use of the Organization Ontology is recommended.
+ """Any entity carrying out actions with respect to the (Core) entities.
+
+ Catalogue, Datasets, Data Services and Distributions. If the Agent is an
+ organisation, the use of the Organization Ontology is recommended.
"""
</code_context>
<issue_to_address>
**nitpick (typo):** Agent docstring reads as two separate sentences where the second is a fragment.
Consider reflowing to something like: `Any entity carrying out actions with respect to the (Core) entities Catalogue, Datasets, Data Services and Distributions.` to keep the description grammatically consistent while still incorporating the additional detail.
```suggestion
class Agent(JsonLDObject):
+ """Any entity carrying out actions with respect to the (Core) entities Catalogue,
+ Datasets, Data Services and Distributions. If the Agent is an organisation, the
+ use of the Organization Ontology is recommended.
+ """
```
</issue_to_address>
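To illustrate the PEP 257 point concretely: ruff's docstring checks (D205/D212 style) expect a one-line summary, a blank line, then detail. A docstring shaped like the reviewer's suggestion satisfies that shape; the class below is a hypothetical minimal stand-in, not the real `Agent` model:

```python
import inspect


class Agent:
    """Any entity carrying out actions with respect to the (Core) entities.

    Applies to Catalogues, Datasets, Data Services, and Distributions. If the
    Agent is an organisation, the use of the Organization Ontology is recommended.
    """


doc = inspect.getdoc(Agent)  # dedents and strips the docstring
summary, blank, *body = doc.splitlines()
# PEP 257: first line is a complete sentence ending in a period,
# separated from the rest of the docstring by one blank line.
assert summary.endswith(".") and blank == ""
```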
### Comment 2
<location> `src/core/conversions.py:15-19` </location>
<code_context>
def nested_str_to_num(obj: Any) -> Any:
- """Recursively tries to convert all strings in the object to numbers.
- For dictionaries, only the values will be converted."""
+ """Recursively try to convert all strings in the object to numbers.
+
+ For dictionaries, only the values will be converted.
+ """
if isinstance(obj, str):
</code_context>
<issue_to_address>
**suggestion:** Docstring for nested_str_to_num under-describes behavior for non-dict containers.
Please also document how non-dict containers are handled (e.g., that lists/tuples/sets are fully traversed and all string elements are converted where possible) so the helper’s behavior is clear at call sites.
```suggestion
def nested_str_to_num(obj: Any) -> Any:
"""Recursively try to convert all strings in the object to numbers.
For dictionaries, only the values will be converted.
For non-dict containers (e.g., lists, tuples, sets), all elements are
recursively traversed and any string elements are converted to numbers
where possible, preserving the original container type and structure.
"""
```
</issue_to_address>
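A sketch of what the documented behavior implies. This is an illustrative reimplementation, not the actual code in `src/core/conversions.py`; in particular, the handling of tuples and sets is an assumption taken from the suggested docstring wording:

```python
from typing import Any


def nested_str_to_num(obj: Any) -> Any:
    """Recursively try to convert all strings in the object to numbers.

    For dictionaries, only the values will be converted.
    """
    if isinstance(obj, str):
        for cast in (int, float):
            try:
                return cast(obj)
            except ValueError:
                pass
        return obj  # not numeric; leave the string untouched
    if isinstance(obj, dict):
        # Keys are deliberately left alone; only values are converted.
        return {key: nested_str_to_num(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        # Preserve the container type while traversing its elements.
        return type(obj)(nested_str_to_num(item) for item in obj)
    return obj


assert nested_str_to_num({"a": "1.5", "b": ["2", "x"]}) == {"a": 1.5, "b": [2, "x"]}
```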
### Comment 3
<location> `tests/routers/openml/study_test.py:497-498` </location>
<code_context>
- "%Y-%m-%dT%H:%M:%S",
- )
- assert creation_date.date() == datetime.now().date()
+ creation_date = datetime.fromisoformat(new_study.pop("creation_date"))
+ assert creation_date.date() == datetime.now(UTC).date()
assert new_study == expected
</code_context>
<issue_to_address>
**suggestion (testing):** Add an explicit assertion that `creation_date` is timezone-aware and in UTC to prove the DTZ-related change.
Since this change is about enforcing timezone-aware datetimes, this regression test should also assert that the returned value is explicitly timezone-aware and in UTC, not just that the date matches. For example:
```python
creation_date = datetime.fromisoformat(new_study.pop("creation_date"))
assert creation_date.tzinfo is UTC
assert creation_date.date() == datetime.now(UTC).date()
```
This ensures the API contract (UTC-aware datetime) is verified, rather than only the calendar date.
Suggested implementation:
```python
from datetime import datetime, UTC
import httpx
new_study = study.json()
```
```python
new_study = study.json()
creation_date = datetime.fromisoformat(new_study.pop("creation_date"))
assert creation_date.tzinfo is UTC
assert creation_date.date() == datetime.now(UTC).date()
```
</issue_to_address>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
54-56: ⚠️ Potential issue | 🟡 Minor: Remove stale comment.
The comment says D and DTZ rules are "heavily violated — fix later" but this PR is the fix. Remove or update it to avoid misleading future maintainers.
🔧 Proposed fix
```diff
 [tool.ruff.lint]
-# The D (doc) and DTZ (datetime zone) lint classes current heavily violated - fix later
 select = ["ALL"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 54 - 56, The inline comment above the [tool.ruff.lint] configuration in pyproject.toml referencing "The D (doc) and DTZ (datetime zone) lint classes current heavily violated - fix later" is now stale; remove or replace that comment so it no longer misleads maintainers. Edit the pyproject.toml, locate the [tool.ruff.lint] block and delete the stale comment line (or update it to reflect that D/DTZ issues were addressed), leaving the select = ["ALL"] setting intact.
🧹 Nitpick comments (4)
src/schemas/datasets/__init__.py (1)
6-9: Consider adding a docstring to the new `DatasetSchema` enum.

`DatasetSchema` is new public API but is silently grandfathered from D101 by the file-level per-file-ignore. Since the PR's stated intent is to enforce docstrings on new code, a short docstring here would be consistent with that goal.

♻️ Suggested addition

```diff
 class DatasetSchema(StrEnum):
+    """Schema format options for dataset representations."""
+
     DCAT_AP = "dcat-ap"
     OPENML = "openml"
     MLDCAT_AP = "mldcat-ap"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/schemas/datasets/__init__.py` around lines 6-9: add a short docstring to the public enum class DatasetSchema explaining what the enum represents and how its members (DCAT_AP, OPENML, MLDCAT_AP) are used (e.g., canonical identifiers for supported dataset metadata/schema formats); update the class DatasetSchema so the docstring sits directly under the class declaration to satisfy D101 and document the public API.

src/database/datasets.py (1)

165-165: LGTM — UTC-aware timestamp.

The naive `datetime.datetime.now()` is correctly replaced with `datetime.datetime.now(datetime.UTC)`.

For consistency, `src/database/studies.py` uses `from datetime import UTC, datetime` / `datetime.now(UTC)` while this file uses `datetime.datetime.now(datetime.UTC)`. Consider aligning the import style across the codebase.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/database/datasets.py` at line 165: the timestamp usage is UTC-aware but imports are inconsistent; update src/database/datasets.py to match the style used in src/database/studies.py by importing UTC and datetime directly (e.g., `from datetime import UTC, datetime`) and replace uses of `datetime.datetime.now(datetime.UTC)` with `datetime.now(UTC)` so the module consistently uses the same datetime symbols and call pattern (locate the current usage in the dataset creation code where "date" is set).

src/schemas/datasets/mldcat_ap.py (2)
70-74: `Agent` docstring reformatting produces a dangling fragment.

The break after "entities." leaves "Catalogue, Datasets, Data Services and Distributions." as an incomplete sentence starting the second paragraph; the semantic link (originally a colon) is lost.

📝 Suggested fix

```diff
-    """Any entity carrying out actions with respect to the (Core) entities.
-
-    Catalogue, Datasets, Data Services and Distributions. If the Agent is an
-    organisation, the use of the Organization Ontology is recommended.
-    """
+    """Any entity carrying out actions with respect to the (Core) entities.
+
+    Applies to Catalogues, Datasets, Data Services, and Distributions.
+    If the Agent is an organisation, the use of the Organization Ontology is recommended.
+    """
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/schemas/datasets/mldcat_ap.py` around lines 70 - 74, The Agent class/module docstring currently breaks a sentence causing a dangling fragment ("Catalogue, Datasets, Data Services and Distributions.") — update the Agent docstring in src/schemas/datasets/mldcat_ap.py (the triple-quoted string associated with Agent) to restore the original semantic link: either merge the two lines into one sentence with the colon (e.g., "...entities: Catalogue, Datasets, Data Services and Distributions.") or rephrase the second paragraph to be a complete sentence that follows the first, ensuring punctuation and sentence boundaries are correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 68: Update the glob used to suppress rules for test files: replace the
non-recursive pattern "tests/*" with the recursive pattern "tests/**" in the
pyproject TOML setting that maps test globs to ["S101","COM812","D"] so nested
test files (e.g., tests/routers/... and tests/database/...) are properly matched
and the D rule is suppressed for all test subdirectories.
In `@src/schemas/datasets/mldcat_ap.py`:
- Line 22: The docstring on JsonLDQualifiedLiteral is incorrect — it was copied
from JsonLDObject; update the docstring for the JsonLDQualifiedLiteral class to
describe that it is a concrete JSON-LD literal model with fields type_ and value
(e.g., "Concrete JSON-LD literal with fields `type_` and `value`"), and ensure
JsonLDObject retains the "Base class for all JSON-LD objects." description so
the two classes have accurate, distinct docstrings.
In `@tests/routers/openml/study_test.py`:
- Around line 497-498: The test reveals a timezone-awareness gap: creation_date
is stored as a naive DATETIME and returned naive via SQLAlchemy, then parsed
with datetime.fromisoformat in the test; fix by preserving UTC context either at
storage or on read—update the SQLAlchemy column definition to use
DateTime(timezone=True) (so retrieval yields aware datetimes) or, if changing
schema is infeasible, ensure the code path that produces/serializes
creation_date (the model/serializer that fills new_study and the DB write path)
normalizes naive datetimes to UTC on read/write (e.g., attach UTC tzinfo before
serialization) so datetime.fromisoformat/new_study contain timezone-aware values
consistently.
---
Outside diff comments:
In `@pyproject.toml`:
- Around line 54-56: The inline comment above the [tool.ruff.lint] configuration
in pyproject.toml referencing "The D (doc) and DTZ (datetime zone) lint classes
current heavily violated - fix later" is now stale; remove or replace that
comment so it no longer misleads maintainers. Edit the pyproject.toml, locate
the [tool.ruff.lint] block and delete the stale comment line (or update it to
reflect that D/DTZ issues were addressed), leaving the select = ["ALL"] setting
intact.
---
Nitpick comments:
In `@src/database/datasets.py`:
- Line 165: The timestamp usage is UTC-aware but imports are inconsistent;
update src/database/datasets.py to match the style used in
src/database/studies.py by importing UTC and datetime directly (e.g., "from
datetime import UTC, datetime") and replace uses of
"datetime.datetime.now(datetime.UTC)" with "datetime.now(UTC)" so the module
consistently uses the same datetime symbols and call pattern (locate the current
usage in the dataset creation code where "date" is set).
In `@src/schemas/datasets/__init__.py`:
- Around line 6-9: Add a short docstring to the public enum class DatasetSchema
explaining what the enum represents and how its members (DCAT_AP, OPENML,
MLDCAT_AP) are used (e.g., canonical identifiers for supported dataset
metadata/schema formats); update the class DatasetSchema so the docstring sits
directly under the class declaration to satisfy D101 and document the public
API.
In `@src/schemas/datasets/mldcat_ap.py`:
- Around line 70-74: The Agent class/module docstring currently breaks a
sentence causing a dangling fragment ("Catalogue, Datasets, Data Services and
Distributions.") — update the Agent docstring in
src/schemas/datasets/mldcat_ap.py (the triple-quoted string associated with
Agent) to restore the original semantic link: either merge the two lines into
one sentence with the colon (e.g., "...entities: Catalogue, Datasets, Data
Services and Distributions.") or rephrase the second paragraph to be a complete
sentence that follows the first, ensuring punctuation and sentence boundaries
are correct.
```toml
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S101", "COM812", "D"]
"src/core/conversions.py" = ["ANN401"]
```
🧩 Analysis chain

🏁 Script executed: `fd --type f --extension py . tests/` (repository: openml/server-api), listing the Python test files under `tests/`.
Change glob pattern to recursively match nested test files.
The pattern "tests/*" only matches files directly in the tests/ directory. However, the repository contains nested test files at multiple levels (e.g., tests/routers/openml/study_test.py, tests/database/flows_test.py, tests/routers/openml/migration/datasets_migration_test.py). The D (docstring) rules will incorrectly fire on these nested files instead of being suppressed. Use "tests/**" for recursive coverage.
🔧 Proposed fix
```diff
-"tests/*" = ["S101", "COM812", "D"]
+"tests/**" = ["S101", "COM812", "D"]
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` at line 68, Update the glob used to suppress rules for test
files: replace the non-recursive pattern "tests/*" with the recursive pattern
"tests/**" in the pyproject TOML setting that maps test globs to
["S101","COM812","D"] so nested test files (e.g., tests/routers/... and
tests/database/...) are properly matched and the D rule is suppressed for all
test subdirectories.
```diff
 class JsonLDQualifiedLiteral(BaseModel):
-    """Base class for all JSON-LD objects"""
+    """Base class for all JSON-LD objects."""
```
JsonLDQualifiedLiteral has the wrong docstring — copied from JsonLDObject.
JsonLDQualifiedLiteral is a concrete value model (two fields: type_ and value) and is not a base class. The docstring "Base class for all JSON-LD objects." belongs to JsonLDObject (the actual ABC at line 34), and appears to have been copy-pasted here.
📝 Proposed fix

```diff
-    """Base class for all JSON-LD objects."""
+    """A JSON-LD qualified literal pairing an explicit type with a string value."""
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/schemas/datasets/mldcat_ap.py` at line 22, The docstring on
JsonLDQualifiedLiteral is incorrect — it was copied from JsonLDObject; update
the docstring for the JsonLDQualifiedLiteral class to describe that it is a
concrete JSON-LD literal model with fields type_ and value (e.g., "Concrete
JSON-LD literal with fields `type_` and `value`"), and ensure JsonLDObject
retains the "Base class for all JSON-LD objects." description so the two classes
have accurate, distinct docstrings.
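For context, the shape being described is roughly the following. This is a dataclass sketch only; the real model is a pydantic `BaseModel`, and the field names `type_` and `value` are taken from the review comment, not verified against the source:

```python
from dataclasses import dataclass


@dataclass
class JsonLDQualifiedLiteral:
    """A JSON-LD qualified literal pairing an explicit type with a string value."""

    type_: str  # trailing underscore avoids shadowing the builtin-ish name
    value: str


# Hypothetical usage: a typed date literal as it might appear in JSON-LD.
lit = JsonLDQualifiedLiteral(type_="xsd:date", value="2024-01-01")
assert lit.value == "2024-01-01"
```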
```python
creation_date = datetime.fromisoformat(new_study.pop("creation_date"))
assert creation_date.date() == datetime.now(UTC).date()
```
🧩 Analysis chain

Before posting, the reviewer ran a series of verification scripts against the openml/server-api repository:

- Inspected `tests/routers/openml/study_test.py` (lines 1-50 and 450-510) around the `creation_date` assertion and the `.json()` calls.
- Located the `Study` models (`rg -t py "class.*Study"`) and read `src/schemas/study.py`, including the `creation_date` field definition.
- Searched `src/schemas/` and `tests/` for custom serialization (`field_serializer`, `ConfigDict`, `model_config`, `json_encoders`, `ser_json_temporal`) and checked `src/schemas/core.py` for a shared base model.
- Traced how `Study` is instantiated (`return Study(` in `src/routers/openml/study.py`) and how datetimes are handled in `src/database/studies.py`.
- Examined `tests/conftest.py`, the files under `tests/resources`, and searched for stored datetime values (e.g., `rg "2019-02-25"`) to see what the test database contains.

🌐 Web query: Pydantic v2 default datetime serialization with timezone UTC

💡 Result:

In Pydantic v2, datetime values are serialized to JSON as strings, and the default temporal JSON format is ISO 8601 (`ser_json_temporal='iso8601'`). It does not automatically convert datetimes to UTC; whatever tzinfo is on the value is what gets serialized (as an offset like `+02:30`, `+00:00`, etc.). [1][2]

What you get by default (UTC-aware): if your value is already UTC-aware (`tzinfo=timezone.utc`), the default JSON output will be ISO 8601 with an offset (typically `+00:00`). [2]

Enforce UTC (and optionally `Z`) on output: if you want to always output UTC, convert to UTC in a serializer; if you specifically want a trailing `Z`, format it yourself:

```python
from datetime import datetime, timezone

from pydantic import BaseModel, field_serializer


class M(BaseModel):
    dt: datetime

    @field_serializer("dt")
    def ser_dt(self, v: datetime, _info) -> str:
        return v.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```

Pydantic v2 supports this via `@field_serializer` (and other serializer hooks). [3] An example pattern for `Z` formatting in v2 is also commonly done with `PlainSerializer`/`Annotated`. [4]

Citations:
- [1] Pydantic config: `ser_json_temporal` default and options. https://docs.pydantic.dev/dev/api/config/
- [2] Pydantic standard library types: datetime JSON serialization and example with timezone offset. https://docs.pydantic.dev/latest/api/standard_library_types/
- [3] Pydantic serialization docs: custom serializers (`@field_serializer`, etc.). https://docs.pydantic.dev/2.0/usage/serialization/
- [4] Example approach using v2 `Annotated` + `PlainSerializer` to emit `Z`. https://stackoverflow.com/questions/66548586/how-to-change-date-format-in-pydantic
🏁 Script executed:
# Look for any database migration or initialization scripts
rg -t sql "creation_date" tests/ src/database/Repository: openml/server-api
Length of output: 43
🏁 Script executed:
# Let me check the actual database schema for the study table
rg -t sql "CREATE TABLE.*study" src/database/Repository: openml/server-api
Length of output: 43
🏁 Script executed:
# Check how the datetime is retrieved from the database
cat -n src/database/studies.py | sed -n '1,100p'Repository: openml/server-api
Length of output: 3834
🏁 Script executed:
# Let's trace the actual data: check what the test fixture shows
cat -n tests/routers/openml/study_test.py | sed -n '10,80p'Repository: openml/server-api
Length of output: 1914
🏁 Script executed:
# Check if there's a custom datetime deserializer or handler in the routers
cat -n src/routers/openml/study.py | head -50Repository: openml/server-api
Length of output: 2186
🏁 Script executed:
# Check the actual type of creation_date coming from the database
# Look at the database schema definition
find src -name "*.py" -type f | xargs grep -l "CREATE TABLE\|alembic\|migration" | head -5Repository: openml/server-api
Length of output: 43
🏁 Script executed:
# Check if SQLAlchemy has any type conversion for datetime
rg -t py "DateTime|DATETIME" src/database/ -B 2 -A 2Repository: openml/server-api
Length of output: 43
🏁 Script executed:
# Let's look at what the test database actually contains
cat -n tests/resources/sql/ 2>/dev/null || find tests -name "*.sql" -type f | head -10Repository: openml/server-api
Length of output: 43
🏁 Script executed:
# Check the database initialization/fixture setup
rg -t py "INSERT INTO study" tests/ src/database/ -B 3 -A 3Repository: openml/server-api
Length of output: 447
🏁 Script executed:
# Critical check: let's see the actual test expectation and compare
# If the test expects "2019-02-25T17:15:01" without offset, but Pydantic defaults to adding it,
# then the first test (line 457) would fail. Let's check if that test actually passes
cat -n tests/routers/openml/study_test.py | sed -n '12,60p'Repository: openml/server-api
Length of output: 1438
🏁 Script executed:
# Check if there's any datetime handling or type conversion happening
# Look at how SQLAlchemy is configured or if there's custom type handling
rg -t py "python_type|bind_processor|result_processor" src/Repository: openml/server-api
Length of output: 43
🏁 Script executed:
# Let me check the actual database column definition for creation_date
# Look for schema or migration files
find src -name "*.toml" -o -name "*.yaml" -o -name "*.yml" | head -10Repository: openml/server-api
Length of output: 76
🏁 Script executed:
# Check if there's a type stub or database schema definition
rg -t py "datetime|tzinfo" src/database/studies.py -B 1 -A 1 | head -30Repository: openml/server-api
Length of output: 298
🏁 Script executed:
# Critical: Check what the database actually stores and retrieves
# Look at how SQLAlchemy would interpret the datetime from the database
cat -n src/database/setup.py 2>/dev/null || find src -name "setup.py" -o -name "database.py" | head -5Repository: openml/server-api
Length of output: 1145
🏁 Script executed:
# Check the database connection setup
rg -t py "create_engine\|Engine" src/database/ -B 2 -A 5Repository: openml/server-api
Length of output: 43
🏁 Script executed:
# Let me verify: when creation_date is stored as datetime.now(UTC) in the database,
# what type does SQLAlchemy retrieve it as? Check the conftest to see actual test
cat -n tests/routers/openml/study_test.py | sed -n '495,500p'Repository: openml/server-api
Length of output: 301
🏁 Script executed:
# The key question: does the test database fixture have timezone info?
# Let's check what's in the database by looking for where fixtures are populated
rg -t py "datetime\.now\(UTC\)|timezone" tests/routers/openml/study_test.pyRepository: openml/server-api
Length of output: 120
🏁 Script executed:
# Key question: Is the creation_date coming from DB naive or aware?
# When SQLAlchemy retrieves datetime from MySQL DATETIME column, it's naive
# Let's check if there's any tzinfo conversion happening
rg -t py "tzinfo\|replace\(tzinfo\|astimezone" src/routers/openml/study.py src/database/studies.pyRepository: openml/server-api
Length of output: 43
🏁 Script executed:
# Let's check: when the test calls study.json(), what does Pydantic actually produce?
# We need to understand if the database value is naive or aware when passed to Study
cat -n src/routers/openml/study.py | sed -n '130,155p'Repository: openml/server-api
Length of output: 1164
🏁 Script executed:
# Check: Is there any custom serializer or configuration on the Study model?
rg -t py "model_config\|ConfigDict\|json_encoders" src/schemas/study.pyRepository: openml/server-api
Length of output: 43
🏁 Script executed:
# The critical piece: let's see if datetime.now(UTC) actually produces an aware datetime
# and check if Pydantic would serialize it differently
python3 << 'EOF'
from datetime import UTC, datetime
from pydantic import BaseModel
class TestModel(BaseModel):
dt: datetime
# Test with aware datetime
aware_dt = datetime.now(UTC)
m_aware = TestModel(dt=aware_dt)
print("Aware datetime JSON:", m_aware.model_dump_json())
# Test with naive datetime
naive_dt = datetime.now()
m_naive = TestModel(dt=naive_dt)
print("Naive datetime JSON:", m_naive.model_dump_json())
EOFRepository: openml/server-api
Length of output: 175
The test is safe from flakiness but has a subtle semantic issue with timezone awareness.
Pydantic v2 does serialize timezone-aware datetimes with a timezone indicator (e.g., `Z` for UTC). However, the actual issue here is upstream: `creation_date` is stored in the database as a naive `DATETIME` column. When SQLAlchemy retrieves it, it returns a naive Python `datetime` object, so Pydantic serializes it without any timezone offset (e.g., `"2019-02-25T17:15:01"`).
The test comparison `creation_date.date() == datetime.now(UTC).date()` is safe because `.date()` strips timezone information from both sides, resulting in a comparison of two naive `date` objects. This won't be flaky even at midnight UTC.
However, the underlying code has a semantic gap: `datetime.now(UTC)` (an aware datetime) is written to a naive database column, losing its UTC context. While the test works correctly by comparing dates, consider whether `creation_date` should preserve timezone information throughout the stack, either by using a timezone-aware database column or by explicitly managing UTC context.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/routers/openml/study_test.py` around lines 497 - 498, The test reveals
a timezone-awareness gap: creation_date is stored as a naive DATETIME and
returned naive via SQLAlchemy, then parsed with datetime.fromisoformat in the
test; fix by preserving UTC context either at storage or on read—update the
SQLAlchemy column definition to use DateTime(timezone=True) (so retrieval yields
aware datetimes) or, if changing schema is infeasible, ensure the code path that
produces/serializes creation_date (the model/serializer that fills new_study and
the DB write path) normalizes naive datetimes to UTC on read/write (e.g., attach
UTC tzinfo before serialization) so datetime.fromisoformat/new_study contain
timezone-aware values consistently.
This PR addresses the issue of missing docstring linting in the OpenML server API (#212).
Changes:
- Removed the global ignore for the `DTZ` ruleset in the `ruff` configuration.
- Added module docstrings to `__init__.py` files to satisfy D104.
- Removed the global ignore for the `D` ruleset in the `ruff` configuration.
- Added `per-file-ignores` entries for D100, D101, D102, and D103 for those specific files. This ensures all new files and code must include docstrings, while legacy code can be updated gradually.
Verification:
- Ran `ruff check .` and the checks pass.
- The `UTC` constant was used for datetime awareness.
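The resulting configuration shape looks roughly like this; paths and exact rule lists are illustrative, not copied from the PR:

```toml
[tool.ruff.lint]
# "D" and "DTZ" were removed from the global ignore list.
ignore = []

[tool.ruff.lint.per-file-ignores]
# Hypothetical paths; the real configuration enumerates each legacy module.
"src/database/users.py" = ["D100", "D101", "D102", "D103"]
"tests/*" = ["D100", "D102", "D103"]
```

With this layout, new modules fail `ruff check .` if they lack docstrings, while the per-file entries can be deleted one by one as legacy modules are documented.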