Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem? Please describe.
The current codeanalyzer outputs analysis results as pretty-printed JSON by default, which creates unnecessarily large files for complex codebases. When analyzing large Python projects, the JSON output can become several megabytes or larger, making it slow to transfer, store, and process. Additionally, there's no option to choose different output formats optimized for different use cases.
Describe the solution you'd like
Add a --format flag that supports two output formats:
- JSON (default): Compact JSON without indentation or extra whitespace, for smaller files that remain plain text and can still be inspected with standard tools
- MessagePack: Ultra-compressed binary format using MessagePack + gzip compression
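To illustrate the difference between the current pretty-printed output and the proposed compact JSON default, here is a minimal sketch using only the Python standard library. The `analysis` dictionary is a hypothetical stand-in; the real codeanalyzer schema is not shown in this issue.

```python
import json

# Hypothetical analysis result; the real codeanalyzer schema may differ.
analysis = {
    "symbols": [
        {"name": f"func_{i}", "kind": "function", "imports": ["os", "sys"]}
        for i in range(100)
    ]
}

# Pretty-printed output: indentation and newlines on every level.
pretty = json.dumps(analysis, indent=4)

# Compact output: no indentation, no space after ',' or ':'.
compact = json.dumps(analysis, separators=(",", ":"))

# Both encodings decode to exactly the same data.
assert json.loads(pretty) == json.loads(compact) == analysis

print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars")
```

Passing `separators=(",", ":")` is what removes the default space after each comma and colon; omitting `indent` alone still leaves those spaces in place.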
The feature should:
- Use --format json for compact JSON output (no indentation/whitespace)
- Use --format msgpack for maximum compression
- Display compression ratios when saving MessagePack files
- Support both stdout output (JSON only) and file output
- Maintain full round-trip compatibility for all data types
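The flag handling described in this list could be sketched with `argparse` as follows. This is only an illustration under assumptions: the `-i` and `-o` flags are taken from the usage examples in this issue, while the `dest` names, help strings, and the choice to reject `--format msgpack` without `-o` are hypothetical and not the tool's confirmed behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only -i, -o and --format appear in this issue.
    parser = argparse.ArgumentParser(prog="codeanalyzer")
    parser.add_argument("-i", dest="input", required=True,
                        help="path to the code to analyze")
    parser.add_argument("-o", dest="output", default=None,
                        help="output directory; omit to write to stdout")
    parser.add_argument("--format", choices=["json", "msgpack"], default="json",
                        help="output format (default: json)")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # MessagePack output is binary, so stdout is limited to JSON here.
    if args.format == "msgpack" and args.output is None:
        parser.error("--format msgpack requires an output directory (-o)")
    return args
```

With this sketch, `parse_args(["-i", "/path/to/code"])` falls back to JSON, and an unsupported value such as `--format yaml` is rejected by `choices` before any analysis would run.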
Example usage:
# Compact JSON to stdout (default)
codeanalyzer -i /path/to/code
# Compact JSON to file
codeanalyzer -i /path/to/code -o ./output --format json
# Maximum compression to file
codeanalyzer -i /path/to/code -o ./output --format msgpack
Describe alternatives you've considered
- External compression: Users could manually gzip JSON files, but this requires extra steps and doesn't achieve the same compression ratios as MessagePack
- Database storage: Could store results in SQLite, but this adds complexity and isn't as portable
- Multiple file formats: Considered protobuf, parquet, and other formats, but MessagePack provides the best balance of compression, speed, and simplicity
- Configuration files: Could use config files instead of CLI flags, but command-line options are more convenient for scripting
Additional context
- MessagePack is widely supported across programming languages, making the output format useful for downstream tools
- The compression is particularly effective for code analysis data due to repetitive patterns (function names, types, imports)
- Large codebases can see 80-90% size reduction compared to pretty-printed JSON
- Round-trip serialization ensures no data loss when loading compressed files back into the tool
- This enables efficient storage and transfer of analysis results for CI/CD pipelines and distributed analysis workflows
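The compression-ratio reporting and round-trip check mentioned above can be sketched as follows. To keep the example dependency-free, it gzips compact JSON from the standard library as a stand-in for the proposed MessagePack + gzip encoding, so the numbers it prints are not representative of the real feature. The `compression_ratio` helper and the sample data are hypothetical.

```python
import gzip
import json


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return the size reduction as a percentage of the original size."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (1 - compressed_size / original_size) * 100


# Hypothetical, repetitive analysis data (function names, types, imports).
analysis = {
    "functions": [
        {"name": f"handler_{i}", "return_type": "Optional[str]",
         "imports": ["typing", "os.path"]}
        for i in range(500)
    ]
}

# Baseline: the pretty-printed JSON the tool currently emits.
pretty = json.dumps(analysis, indent=4).encode("utf-8")

# Stand-in for the binary format: gzip over compact JSON (not MessagePack).
packed = gzip.compress(json.dumps(analysis, separators=(",", ":")).encode("utf-8"))

# Round trip: decompress and decode, then confirm nothing was lost.
restored = json.loads(gzip.decompress(packed).decode("utf-8"))
assert restored == analysis

reduction = compression_ratio(len(pretty), len(packed))
print(f"{len(pretty)} -> {len(packed)} bytes ({reduction:.1f}% smaller)")
```

A real implementation would swap the JSON encode/decode steps for MessagePack calls and keep the same ratio calculation and equality check.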