Better, more performant output formatting #6

@rahlk

Is your feature request related to a problem? Please describe.
codeanalyzer currently writes analysis results as pretty-printed JSON by default, which creates unnecessarily large files for complex codebases. When analyzing large Python projects, the JSON output can grow to several megabytes or more, making it slow to transfer, store, and process. Additionally, there's no option to choose an output format optimized for a particular use case.

Describe the solution you'd like
Add a --format flag that supports two output formats:

  1. JSON (default): Compact JSON without whitespace for smaller file sizes while maintaining human readability
  2. MessagePack: Ultra-compressed binary format using MessagePack + gzip compression

The feature should:

  • Use --format json for compact JSON output (no indentation/whitespace)
  • Use --format msgpack for maximum compression
  • Display compression ratios when saving MessagePack files
  • Support both stdout output (JSON only) and file output
  • Maintain full round-trip compatibility for all data types
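To illustrate the first format, here is a minimal sketch of compact versus pretty-printed JSON using only the Python standard library. The helper names and the sample data are hypothetical and are not taken from codeanalyzer itself; the sketch only shows where the size difference comes from.

```python
import json


def to_pretty_json(data) -> str:
    # Pretty-printed output: indentation and newlines on every level.
    return json.dumps(data, indent=2)


def to_compact_json(data) -> str:
    # separators=(",", ":") drops the spaces json.dumps inserts by default,
    # and omitting indent keeps everything on a single line.
    return json.dumps(data, separators=(",", ":"))


# Hypothetical analysis-like payload with repetitive structure.
sample = {
    "functions": [
        {"name": f"func_{i}", "return_type": "str", "imports": ["os", "sys"]}
        for i in range(100)
    ]
}

pretty = to_pretty_json(sample)
compact = to_compact_json(sample)
print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars")
```

Both strings decode to the same data with `json.loads`, so switching the default to compact output would not change what downstream tools read.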

Example usage:

# Compact JSON to stdout (default)
codeanalyzer -i /path/to/code

# Compact JSON to file
codeanalyzer -i /path/to/code -o ./output --format json

# Maximum compression to file
codeanalyzer -i /path/to/code -o ./output --format msgpack
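One way the flag handling could look is sketched below with `argparse`. This is an assumption about the implementation, not the tool's actual code: the long option names `--input` and `--output` and the function names are hypothetical, while `-i`, `-o`, and `--format` come from the usage above. The check that rejects `--format msgpack` without `-o` reflects the requirement that stdout output is JSON only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codeanalyzer")
    # --input / --output long names are assumptions made for this sketch.
    parser.add_argument("-i", "--input", required=True, help="path to the code to analyze")
    parser.add_argument("-o", "--output", default=None, help="output directory (stdout if omitted)")
    parser.add_argument(
        "--format",
        choices=["json", "msgpack"],
        default="json",
        help="output format (default: compact JSON)",
    )
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Binary MessagePack output is not suitable for stdout, so require -o.
    if args.format == "msgpack" and args.output is None:
        parser.error("--format msgpack requires -o/--output")
    return args


args = parse_args(["-i", "/path/to/code", "-o", "./output", "--format", "msgpack"])
print(args.input, args.output, args.format)
```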

Describe alternatives you've considered

  • External compression: Users could manually gzip the JSON files, but this requires extra steps and doesn't achieve the same compression ratios as MessagePack combined with gzip
  • Database storage: Could store results in SQLite, but this adds complexity and isn't as portable
  • Multiple file formats: Considered protobuf, parquet, and other formats, but MessagePack provides the best balance of compression, speed, and simplicity
  • Configuration files: Could use config files instead of CLI flags, but command-line options are more convenient for scripting

Additional context

  • MessagePack is widely supported across programming languages, making the output format useful for downstream tools
  • The compression is particularly effective for code analysis data due to repetitive patterns (function names, types, imports)
  • Large codebases can see 80-90% size reduction compared to pretty-printed JSON
  • Round-trip serialization ensures no data loss when loading compressed files back into the tool
  • This enables efficient storage and transfer of analysis results for CI/CD pipelines and distributed analysis workflows
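The compress, report, and round-trip steps described above can be sketched as follows. To stay runnable with the standard library alone, this sketch serializes with compact JSON before gzip; it is not the proposed MessagePack implementation. With the third-party `msgpack` package installed, `msgpack.packb` and `msgpack.unpackb` could be passed as `pack` and `unpack` instead. All function names and the sample data are hypothetical.

```python
import gzip
import json
from typing import Any, Callable


def _json_pack(data: Any) -> bytes:
    # Stand-in serializer; msgpack.packb could be swapped in here.
    return json.dumps(data, separators=(",", ":")).encode("utf-8")


def _json_unpack(raw: bytes) -> Any:
    # Stand-in deserializer; msgpack.unpackb could be swapped in here.
    return json.loads(raw.decode("utf-8"))


def compress(data: Any, pack: Callable[[Any], bytes] = _json_pack) -> bytes:
    # Serialize first, then gzip the resulting bytes.
    return gzip.compress(pack(data))


def decompress(blob: bytes, unpack: Callable[[bytes], Any] = _json_unpack) -> Any:
    # Reverse order: gunzip, then deserialize.
    return unpack(gzip.decompress(blob))


def reduction_percent(original_size: int, compressed_size: int) -> float:
    # Size reduction relative to the original, as a percentage.
    return 100.0 * (1 - compressed_size / original_size)


# Hypothetical payload with the kind of repetition found in analysis output.
sample = {
    "functions": [
        {"name": f"func_{i}", "return_type": "str", "imports": ["os", "sys"]}
        for i in range(200)
    ]
}

baseline = json.dumps(sample, indent=2).encode("utf-8")  # pretty-printed JSON
blob = compress(sample)
ratio = reduction_percent(len(baseline), len(blob))
print(f"{len(baseline)} -> {len(blob)} bytes ({ratio:.1f}% smaller)")
```

The printed percentage depends entirely on the input, so the figure for this toy payload says nothing about real codebases; the point is only that the ratio can be computed and shown at save time, and that `decompress(compress(x))` returns the original data.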

Labels: enhancement (New feature or request)
