
feat: add Python SDK for model spec types and validation #175

Open
pradhyum6144 wants to merge 3 commits into modelpack:main from pradhyum6144:feat/python-sdk

@pradhyum6144 (Contributor) commented Feb 28, 2026

Summary

Adds a Python SDK for the CNCF ModelPack specification, addressing issue #138. It uses dataclasses that mirror the Go structs in specs-go/v1/, with JSON serialization matching Go's camelCase tags, omitempty behavior, and RFC 3339 timestamp format exactly. It loads config-schema.json from the repo root as the single source of truth (no copy) and validates with Draft4Validator to match the schema's draft-04 declaration. Packaging uses pyproject.toml.

Test plan

Closes #138

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new Python SDK for the CNCF ModelPack specification, providing a robust and consistent way to define, serialize, and validate AI model configurations in Python. By mirroring the existing Go SDK's type definitions and utilizing the same JSON schema for validation, this SDK ensures interoperability and a unified experience for developers working with ModelPack across different language ecosystems.

Highlights

  • New Python SDK: Introduced a new Python SDK (specs-python/) to mirror the existing Go SDK (specs-go/v1/) for ModelPack specifications.
  • Core Type Implementation: Implemented Python dataclasses for core ModelPack types: Model, ModelDescriptor, ModelConfig, ModelCapabilities, ModelFS, and Modality.
  • JSON Serialization Consistency: Ensured JSON serialization and deserialization for these dataclasses are fully compatible with the Go SDK's JSON tags.
  • Unified Validation: Integrated a validation mechanism that utilizes the same config-schema.json as the Go SDK, guaranteeing consistent validation rules across languages.
  • Constants Defined: Defined all necessary media type and annotation constants within the Python SDK.
  • Comprehensive Testing: Added a comprehensive test suite with 64 tests, covering serialization round-trips, Go validation test cases, and various edge cases.
Changelog
  • .gitignore
    • Added Python-specific ignore patterns for build artifacts and environment files.
  • specs-python/modelpack/v1/__init__.py
    • Created the package initialization file, exposing core ModelPack types, constants, and the validation function.
  • specs-python/modelpack/v1/annotations.py
    • Defined annotation constants and the FileMetadata dataclass, mirroring the Go SDK's annotations.go.
  • specs-python/modelpack/v1/config.py
    • Implemented dataclasses for Model, ModelCapabilities, ModelConfig, ModelDescriptor, ModelFS, and Modality, including to_dict, from_dict, to_json, and from_json methods for JSON serialization/deserialization.
  • specs-python/modelpack/v1/mediatype.py
    • Defined various media type constants for ModelPack artifacts, consistent with mediatype.go.
  • specs-python/modelpack/v1/validator.py
    • Added a JSON schema validator for model configurations, leveraging the shared config-schema.json to ensure cross-language consistency.
  • specs-python/setup.py
    • Created the setup.py file for packaging the Python SDK, specifying dependencies and development requirements.
  • specs-python/tests/test_annotations.py
    • Added unit tests for FileMetadata serialization and annotation constants.
  • specs-python/tests/test_config.py
    • Added extensive unit tests for ModelPack configuration types, covering serialization, deserialization, and various data structures.
  • specs-python/tests/test_mediatype.py
    • Added unit tests to verify the correctness of media type constants.
  • specs-python/tests/test_validator.py
    • Added comprehensive validation tests, including both valid and invalid configuration scenarios, mirroring the Go SDK's validation test cases.
Activity
  • The author, pradhyum6144, created the pull request to introduce the Python SDK.

gemini-code-assist bot left a comment

Code Review

This pull request introduces a well-structured Python SDK for ModelPack specifications, mirroring the existing Go implementation. The use of dataclasses for model types and a shared JSON schema for validation are excellent choices for ensuring consistency across languages. However, I've identified a couple of critical issues. Firstly, there's a compatibility problem with Python 3.10 regarding the parsing of RFC3339 timestamps with a 'Z' suffix, which will cause errors. Secondly, the method for loading the JSON schema file is not robust for a distributable package and will fail upon installation. My review includes specific suggestions to address these critical issues to ensure the SDK is both correct and packageable.

Comment on lines +65 to +66
if "mtime" in data:
    mod_time = datetime.fromisoformat(data["mtime"])

critical

datetime.fromisoformat() does not support the 'Z' suffix for UTC timezones in Python versions before 3.11. Since your setup.py supports Python 3.10, this will cause a ValueError for valid RFC3339 timestamps ending in 'Z'. You can fix this by replacing 'Z' with '+00:00' before parsing.

Suggested change
if "mtime" in data:
    mod_time = datetime.fromisoformat(data["mtime"])
if "mtime" in data and data["mtime"]:
    mod_time = datetime.fromisoformat(data["mtime"].replace("Z", "+00:00"))
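The suggested `.replace("Z", "+00:00")` handles the suffix, but on Python 3.10 `datetime.fromisoformat()` also accepts only 3 or 6 fractional-second digits, whereas Go encodes `time.Time` as RFC 3339 with nanoseconds and trailing zeros trimmed (for example `.5Z`). A minimal stdlib-only sketch of a more tolerant parser follows; the name `parse_rfc3339` is hypothetical (not part of this PR), and it assumes truncating to microsecond precision is acceptable.

```python
import re
from datetime import datetime

# Date, time, optional fractional seconds, then 'Z'/'z' or a +HH:MM / -HH:MM offset.
_RFC3339_RE = re.compile(
    r"(?P<base>\d{4}-\d{2}-\d{2}[Tt ]\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d+))?"
    r"(?P<tz>[Zz]|[+-]\d{2}:\d{2})"
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp on Python 3.10+ (hypothetical helper)."""
    match = _RFC3339_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    # Python 3.10's fromisoformat() needs exactly 3 or 6 fractional digits,
    # so pad or truncate to microseconds (sub-microsecond precision is lost).
    frac = (match.group("frac") or "").ljust(6, "0")[:6]
    # Python 3.10's fromisoformat() rejects the 'Z' suffix, so spell out UTC.
    tz = match.group("tz")
    if tz in ("Z", "z"):
        tz = "+00:00"
    return datetime.fromisoformat(f"{match.group('base')}.{frac}{tz}")
```

Each `datetime.fromisoformat(data[...])` call in config.py could then become, for example, `mod_time = parse_rfc3339(data["mtime"])`.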

Comment on lines +82 to +83
if "knowledgeCutoff" in data:
    kwargs["knowledge_cutoff"] = datetime.fromisoformat(data["knowledgeCutoff"])

critical

datetime.fromisoformat() does not support the 'Z' suffix for UTC timezones in Python versions before 3.11. Since your setup.py supports Python 3.10, this will cause a ValueError for valid RFC3339 timestamps ending in 'Z'. You can fix this by replacing 'Z' with '+00:00' before parsing.

Suggested change
if "knowledgeCutoff" in data:
    kwargs["knowledge_cutoff"] = datetime.fromisoformat(data["knowledgeCutoff"])
if "knowledgeCutoff" in data and data["knowledgeCutoff"]:
    kwargs["knowledge_cutoff"] = datetime.fromisoformat(data["knowledgeCutoff"].replace("Z", "+00:00"))

Comment on lines +224 to +225
if "createdAt" in data:
    created_at = datetime.fromisoformat(data["createdAt"])

critical

datetime.fromisoformat() does not support the 'Z' suffix for UTC timezones in Python versions before 3.11. Since your setup.py supports Python 3.10, this will cause a ValueError for valid RFC3339 timestamps ending in 'Z'. You can fix this by replacing 'Z' with '+00:00' before parsing.

Suggested change
if "createdAt" in data:
    created_at = datetime.fromisoformat(data["createdAt"])
if "createdAt" in data and data["createdAt"]:
    created_at = datetime.fromisoformat(data["createdAt"].replace("Z", "+00:00"))

Comment on lines +33 to +36
def _load_schema() -> dict:
    """Load and return the config JSON schema."""
    with open(_CONFIG_SCHEMA_PATH) as f:
        return json.load(f)

critical

The current method of locating config-schema.json via a hardcoded relative path is not robust for packaging and will fail when the package is installed via pip. The schema file should be included as package data and accessed using importlib.resources.

To fix this, you should:

  1. Move schema/config-schema.json into specs-python/modelpack/v1/.
  2. Update setup.py to include this file (I have left a separate comment there).
  3. Update this function to load the schema as a resource. You will also need to add from importlib import resources at the top of the file and remove the global _SCHEMA_DIR and _CONFIG_SCHEMA_PATH variables.
Suggested change
def _load_schema() -> dict:
    """Load and return the config JSON schema."""
    with open(_CONFIG_SCHEMA_PATH) as f:
        return json.load(f)
def _load_schema() -> dict:
    """Load and return the config JSON schema."""
    import importlib.resources
    schema_file = importlib.resources.files("modelpack.v1").joinpath("config-schema.json")
    with schema_file.open(encoding="utf-8") as f:
        return json.load(f)

Comment on lines +17 to +32
setup(
    name="modelpack",
    version="0.1.0",
    description="Python SDK for the CNCF ModelPack specification",
    packages=find_packages(),
    python_requires=">=3.10",
    install_requires=[
        "jsonschema[format]>=4.20.0",
    ],
    extras_require={
        "dev": [
            "pytest>=7.0",
            "ruff>=0.4.0",
        ],
    },
)

critical

To fix the packaging issue I mentioned in validator.py, you need to tell setuptools to include config-schema.json in the package. After moving the file to specs-python/modelpack/v1/, you can use the package_data argument in setup().

Suggested change
setup(
    name="modelpack",
    version="0.1.0",
    description="Python SDK for the CNCF ModelPack specification",
    packages=find_packages(),
    python_requires=">=3.10",
    install_requires=[
        "jsonschema[format]>=4.20.0",
    ],
    extras_require={
        "dev": [
            "pytest>=7.0",
            "ruff>=0.4.0",
        ],
    },
)
setup(
    name="modelpack",
    version="0.1.0",
    description="Python SDK for the CNCF ModelPack specification",
    packages=find_packages(),
    package_data={"modelpack.v1": ["config-schema.json"]},
    python_requires=">=3.10",
    install_requires=[
        "jsonschema[format]>=4.20.0",
    ],
    extras_require={
        "dev": [
            "pytest>=7.0",
            "ruff>=0.4.0",
        ],
    },
)

@aftersnow (Contributor)

Thanks for working on this — I like the direction, but I have three concerns before this feels ready to merge.

  1. specs-python/modelpack/v1/config-schema.json is a copy of schema/config-schema.json, not a single source of truth, and specs-python/modelpack/v1/validator.py validates it with Draft202012Validator even though the schema file still declares draft-04. That creates both schema drift risk and a validator/dialect mismatch.

  2. This PR adds a full new Python SDK, but the repo CI does not appear to run Python tests, packaging/install smoke tests, or a schema sync check. Without that, this will be difficult to maintain safely over time.

  3. The PR description says the Python SDK matches the Go JSON tags exactly, but that does not seem true yet. In particular, some omitempty-style behavior and some timestamp/default-value serialization in specs-python/modelpack/v1/config.py and specs-python/modelpack/v1/annotations.py do not fully mirror the current Go types.
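To make point 3 concrete: Go's `omitempty` drops a field whose value is `false`, `0`, `""`, a nil pointer or interface, or an empty slice or map, but never drops a non-pointer struct such as `time.Time`, and it keeps a non-nil pointer even when it points at `false`. The following is a minimal sketch of that rule for dataclasses, assuming JSON names and omitempty flags are recorded in field metadata; `json_field`, `to_go_dict`, and `ExampleConfig` are hypothetical names, not the SDK's API.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Optional


def json_field(name: str, *, omitempty: bool = False, pointer: bool = False, **kwargs: Any):
    """Attach Go-style JSON tag info to a dataclass field (hypothetical helper).

    pointer=True models a Go pointer field: only None (nil) counts as empty.
    """
    return field(metadata={"json": name, "omitempty": omitempty, "pointer": pointer}, **kwargs)


def _is_go_empty(value: Any, pointer: bool) -> bool:
    if value is None:
        return True  # nil pointer / nil interface
    if pointer:
        return False  # a non-nil pointer is never empty, even if it points at false
    if isinstance(value, (bool, int, float)):
        return value == 0  # false, 0, 0.0
    if isinstance(value, (str, list, tuple, dict)):
        return len(value) == 0  # "", empty slice, empty map
    return False  # struct-like values (datetime, nested dataclasses) are never omitted


def to_go_dict(obj: Any) -> dict[str, Any]:
    """Convert a dataclass to a dict the way encoding/json would, honoring omitempty.

    Timestamps are left as datetime objects; they still need RFC 3339 formatting.
    """
    out: dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        meta = f.metadata
        if meta.get("omitempty") and _is_go_empty(value, meta.get("pointer", False)):
            continue
        if is_dataclass(value):
            value = to_go_dict(value)  # recurse into nested structs
        out[meta.get("json", f.name)] = value
    return out


@dataclass
class ExampleConfig:  # hypothetical type for illustration only
    architecture: str = json_field("architecture", omitempty=True, default="")
    param_size: str = json_field("paramSize", omitempty=True, default="")
    tool_usage: Optional[bool] = json_field("toolUsage", omitempty=True, pointer=True, default=None)
    languages: list[str] = json_field("languages", omitempty=True, default_factory=list)
```

With this sketch, `to_go_dict(ExampleConfig(architecture="transformer", tool_usage=False))` keeps `toolUsage: False` (a set pointer) while dropping the empty `paramSize` and `languages`, which is the distinction a plain `if value:` check would get wrong.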

Suggestions:

  • Since this is a fairly large repo-level addition, it would be helpful to link an issue or design discussion that covers ownership, CI, and release strategy.
  • If this is intended to be published as a Python package, it may also be worth considering a pyproject.toml-based setup instead of only setup.py.

@aftersnow added the enhancement (New feature or request) label Mar 20, 2026
@pradhyum6144 (Contributor, Author) commented Mar 20, 2026

Sir, I'll remove the copied schema file and load config-schema.json from the repo root (schema/config-schema.json) as the single source of truth. I'll also fix the validator dialect to match what the schema declares.
I already have a Python CI workflow ready in PR #180; it runs pytest across Python 3.10-3.13 and ruff for linting. I'll rebase it on top of this PR so the two work together.
I'll audit the serialization to make sure omitempty behavior and timestamp handling match the Go types exactly. PR #189 (Go unit tests) documents the expected behavior, which I'll use as a reference.
For the suggestions, I'll migrate from setup.py to pyproject.toml. This PR addresses issue #138; I'll link it in the description.

Will push the fixes shortly!

Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
- Remove copied config-schema.json, load from repo root as single source of truth
- Fix validator dialect: use Draft4Validator to match schema's draft-04 declaration
- Fix timestamp serialization to use 'Z' suffix for UTC, matching Go's RFC 3339
- Fix FileMetadata.mtime to always serialize (matches Go's non-pointer time.Time)
- Migrate from setup.py to pyproject.toml
- Fix import sorting (ruff)

Closes modelpack#138

Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
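The timestamp item in this commit ("use 'Z' suffix for UTC, matching Go's RFC 3339") can be sketched as follows. Go encodes `time.Time` in JSON as RFC 3339 with up to nine fractional-second digits, trailing zeros trimmed, and `Z` for a zero offset, while Python's `isoformat()` writes `+00:00` and always six fractional digits when microseconds are non-zero. This is a minimal stdlib-only formatter; the name `format_rfc3339` and the choice to reject naive datetimes are assumptions, not the PR's code.

```python
from datetime import datetime, timedelta


def format_rfc3339(dt: datetime) -> str:
    """Format an aware datetime like Go's time.Time JSON encoding (hypothetical helper)."""
    offset = dt.utcoffset()
    if offset is None:
        # Assumption: reject naive datetimes rather than guessing a time zone.
        raise ValueError("naive datetime has no UTC offset")
    # YYYY-MM-DDTHH:MM:SS, without fraction or offset yet.
    text = dt.replace(tzinfo=None, microsecond=0).isoformat()
    if dt.microsecond:
        # Go trims trailing zeros from the fraction (.5 rather than .500000).
        text += "." + f"{dt.microsecond:06d}".rstrip("0")
    if offset == timedelta(0):
        return text + "Z"  # Go writes a zero offset as 'Z', not '+00:00'
    total = int(offset.total_seconds())
    sign = "+" if total > 0 else "-"
    hours, rem = divmod(abs(total), 3600)
    return f"{text}{sign}{hours:02d}:{rem // 60:02d}"
```

For example, a `createdAt` value would be emitted as `format_rfc3339(created_at)`. Python only stores microseconds, so values that Go wrote with nanosecond precision cannot round-trip exactly.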
@aftersnow (Contributor)

I would strongly encourage collaboration between the authors of #184 and #175 so we can preserve the benefits of schema-driven generation.

@pradhyum6144 (Contributor, Author)

Thanks @aftersnow! I'm happy to collaborate with @rishi-jat on this.

I see the key difference: #184 uses schema-driven auto-generation (datamodel-code-generator → Pydantic), while #175 uses hand-written dataclasses with a schema validator. The auto-generation approach is better for keeping Python types in sync with the canonical schema long-term.

I can update #175 to adopt the schema-driven generation from #184 while keeping the additions from my PR: the validator, the test suite (64 tests), and pyproject.toml packaging. That way we get the best of both: auto-generated types that stay in sync, plus validation tooling and comprehensive tests.

@rishi-jat, happy to coordinate on this; let me know how you'd like to split the work, or whether you'd prefer to merge our efforts into one PR.

@rishi-jat

@pradhyum6144 As per the current direction, please update this PR to align with the schema-driven approach and avoid maintaining separate handwritten types.

Replace hand-written dataclasses with auto-generated Pydantic models
from schema/config-schema.json using datamodel-code-generator. This
keeps Python types in sync with the canonical schema automatically.

- Add tools/generate_python_models.py for type generation
- Add Makefile target: make generate-python-models
- Replace config.py (hand-written) with models.py (auto-generated)
- Update tests to use Pydantic model_validate/model_dump API
- Add pydantic>=2 dependency to pyproject.toml
- Add specs-python/README.md with usage and regeneration docs
- All 73 tests pass

Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Is is possible to auto generate Python APIs?

3 participants