feat: add Python SDK for model spec types and validation#175
feat: add Python SDK for model spec types and validation#175pradhyum6144 wants to merge 3 commits intomodelpack:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new Python SDK for the CNCF ModelPack specification, providing a robust and consistent way to define, serialize, and validate AI model configurations in Python. By mirroring the existing Go SDK's type definitions and utilizing the same JSON schema for validation, this SDK ensures interoperability and a unified experience for developers working with ModelPack across different language ecosystems. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a well-structured Python SDK for ModelPack specifications, mirroring the existing Go implementation. The use of dataclasses for model types and a shared JSON schema for validation are excellent choices for ensuring consistency across languages. However, I've identified a couple of critical issues. Firstly, there's a compatibility problem with Python 3.10 regarding the parsing of RFC3339 timestamps with a 'Z' suffix, which will cause errors. Secondly, the method for loading the JSON schema file is not robust for a distributable package and will fail upon installation. My review includes specific suggestions to address these critical issues to ensure the SDK is both correct and packageable.
| if "mtime" in data: | ||
| mod_time = datetime.fromisoformat(data["mtime"]) |
There was a problem hiding this comment.
datetime.fromisoformat() does not support the 'Z' suffix for UTC timezones in Python versions before 3.11. Since your setup.py supports Python 3.10, this will cause a ValueError for valid RFC3339 timestamps ending in 'Z'. You can fix this by replacing 'Z' with '+00:00' before parsing.
| if "mtime" in data: | |
| mod_time = datetime.fromisoformat(data["mtime"]) | |
| if "mtime" in data and data["mtime"]: | |
| mod_time = datetime.fromisoformat(data["mtime"].replace("Z", "+00:00")) |
specs-python/modelpack/v1/config.py
Outdated
| if "knowledgeCutoff" in data: | ||
| kwargs["knowledge_cutoff"] = datetime.fromisoformat(data["knowledgeCutoff"]) |
There was a problem hiding this comment.
datetime.fromisoformat() does not support the 'Z' suffix for UTC timezones in Python versions before 3.11. Since your setup.py supports Python 3.10, this will cause a ValueError for valid RFC3339 timestamps ending in 'Z'. You can fix this by replacing 'Z' with '+00:00' before parsing.
| if "knowledgeCutoff" in data: | |
| kwargs["knowledge_cutoff"] = datetime.fromisoformat(data["knowledgeCutoff"]) | |
| if "knowledgeCutoff" in data and data["knowledgeCutoff"]: | |
| kwargs["knowledge_cutoff"] = datetime.fromisoformat(data["knowledgeCutoff"].replace("Z", "+00:00")) |
specs-python/modelpack/v1/config.py
Outdated
| if "createdAt" in data: | ||
| created_at = datetime.fromisoformat(data["createdAt"]) |
There was a problem hiding this comment.
datetime.fromisoformat() does not support the 'Z' suffix for UTC timezones in Python versions before 3.11. Since your setup.py supports Python 3.10, this will cause a ValueError for valid RFC3339 timestamps ending in 'Z'. You can fix this by replacing 'Z' with '+00:00' before parsing.
| if "createdAt" in data: | |
| created_at = datetime.fromisoformat(data["createdAt"]) | |
| if "createdAt" in data and data["createdAt"]: | |
| created_at = datetime.fromisoformat(data["createdAt"].replace("Z", "+00:00")) |
| def _load_schema() -> dict: | ||
| """Load and return the config JSON schema.""" | ||
| with open(_CONFIG_SCHEMA_PATH) as f: | ||
| return json.load(f) |
There was a problem hiding this comment.
The current method of locating config-schema.json via a hardcoded relative path is not robust for packaging and will fail when the package is installed via pip. The schema file should be included as package data and accessed using importlib.resources.
To fix this, you should:
- Move
schema/config-schema.jsonintospecs-python/modelpack/v1/. - Update
setup.pyto include this file (I have left a separate comment there). - Update this function to load the schema as a resource. You will also need to add
from importlib import resourcesat the top of the file and remove the global_SCHEMA_DIRand_CONFIG_SCHEMA_PATHvariables.
| def _load_schema() -> dict: | |
| """Load and return the config JSON schema.""" | |
| with open(_CONFIG_SCHEMA_PATH) as f: | |
| return json.load(f) | |
| def _load_schema() -> dict: | |
| """Load and return the config JSON schema.""" | |
| import importlib.resources | |
| schema_file = importlib.resources.files("modelpack.v1").joinpath("config-schema.json") | |
| with schema_file.open(encoding="utf-8") as f: | |
| return json.load(f) |
specs-python/setup.py
Outdated
| setup( | ||
| name="modelpack", | ||
| version="0.1.0", | ||
| description="Python SDK for the CNCF ModelPack specification", | ||
| packages=find_packages(), | ||
| python_requires=">=3.10", | ||
| install_requires=[ | ||
| "jsonschema[format]>=4.20.0", | ||
| ], | ||
| extras_require={ | ||
| "dev": [ | ||
| "pytest>=7.0", | ||
| "ruff>=0.4.0", | ||
| ], | ||
| }, | ||
| ) |
There was a problem hiding this comment.
To fix the packaging issue I mentioned in validator.py, you need to tell setuptools to include config-schema.json in the package. After moving the file to specs-python/modelpack/v1/, you can use the package_data argument in setup().
| setup( | |
| name="modelpack", | |
| version="0.1.0", | |
| description="Python SDK for the CNCF ModelPack specification", | |
| packages=find_packages(), | |
| python_requires=">=3.10", | |
| install_requires=[ | |
| "jsonschema[format]>=4.20.0", | |
| ], | |
| extras_require={ | |
| "dev": [ | |
| "pytest>=7.0", | |
| "ruff>=0.4.0", | |
| ], | |
| }, | |
| ) | |
| setup( | |
| name="modelpack", | |
| version="0.1.0", | |
| description="Python SDK for the CNCF ModelPack specification", | |
| packages=find_packages(), | |
| package_data={"modelpack.v1": ["config-schema.json"]}, | |
| python_requires=">=3.10", | |
| install_requires=[ | |
| "jsonschema[format]>=4.20.0", | |
| ], | |
| extras_require={ | |
| "dev": [ | |
| "pytest>=7.0", | |
| "ruff>=0.4.0", | |
| ], | |
| }, | |
| ) |
1d701f6 to
1c931e5
Compare
|
Thanks for working on this — I like the direction, but I have three concerns before this feels ready to merge.
Suggestions:
|
Sir I'll remove the copied schema file and load config-schema.json from the repo root (schema/config-schema.json) as the single source of truth. I'll also fix the validator dialect to match what the schema declares. Will push the fixes shortly! |
Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
- Remove copied config-schema.json, load from repo root as single source of truth - Fix validator dialect: use Draft4Validator to match schema's draft-04 declaration - Fix timestamp serialization to use 'Z' suffix for UTC, matching Go's RFC 3339 - Fix FileMetadata.mtime to always serialize (matches Go's non-pointer time.Time) - Migrate from setup.py to pyproject.toml - Fix import sorting (ruff) Closes modelpack#138 Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
1c931e5 to
d177ffe
Compare
Thanks @aftersnow! I'm happy to collaborate with @rishi-jat on this. I see the key difference: #184 uses schema-driven auto-generation (datamodel-code-generator → Pydantic), while #175 uses hand-written dataclasses with a schema validator. The auto generation approach is better for keeping Python types in sync with the canonical schema long-term. I can update #175 to adopt the schema driven generation from #184 while keeping the additions from my PR the validator, test suite (64 tests), and pyproject.toml packaging. That way we get the best of both: auto generated types that stay in sync + validation tooling + comprehensive tests. @rishi-jat happy to coordinate on this let me know how you'd like to split the work or if you'd prefer we merge our efforts into one PR. |
|
@pradhyum6144 As per the current direction, please update this PR to align with the schema-driven approach and avoid maintaining separate handwritten types. |
Replace hand-written dataclasses with auto-generated Pydantic models from schema/config-schema.json using datamodel-code-generator. This keeps Python types in sync with the canonical schema automatically. - Add tools/generate_python_models.py for type generation - Add Makefile target: make generate-python-models - Replace config.py (hand-written) with models.py (auto-generated) - Update tests to use Pydantic model_validate/model_dump API - Add pydantic>=2 dependency to pyproject.toml - Add specs-python/README.md with usage and regeneration docs - All 73 tests pass Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
Summary
Adds a Python SDK for the CNCF ModelPack specification, addressing issue #138. Uses dataclasses mirroring Go structs in specs-go/v1/ with JSON serialization matching Go's camelCase tags, omitempty behavior, and RFC 3339 timestamp format exactly. Loads config-schema.json from the repo root as the single source of truth (no copy), validates using Draft4Validator to match the schema's draft-04 declaration. Uses pyproject.toml for modern packaging.
Test plan
pytest tests/ -vruff checkandruff formatCloses #138