@rahlk rahlk commented Jul 11, 2025

Add output format flag with MessagePack compression support

Addresses issue #6 by adding a --format flag that lets users choose between compact JSON and compressed MessagePack output, reducing output size by 80-90% for large analysis results.
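As a rough illustration of the flag's behavior, here is a minimal sketch using the standard library's argparse. It is not the project's actual CLI code: `build_parser` is a hypothetical helper, the value name `json` is an assumption (only `--format=msgpack` appears in this PR), and the real tool's other options (`--input`, `--output`, and so on) are left out.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the CLI; models only the --format option."""
    parser = argparse.ArgumentParser(prog="codeanalyzer")
    parser.add_argument(
        "--format",
        choices=["json", "msgpack"],  # "json" as a value name is an assumption
        default="json",               # default stays JSON, so existing calls are unaffected
        help="output format for the analysis results",
    )
    return parser


# Omitting the flag keeps the default; passing it selects MessagePack.
print(build_parser().parse_args([]).format)
print(build_parser().parse_args(["--format=msgpack"]).format)
```

With `choices`, argparse rejects an unknown value with a usage error and a non-zero exit code, which is one simple way to handle invalid format options.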

Motivation and Context

Large Python codebases generate analysis results that can be several megabytes in size when output as pretty-printed JSON. This creates problems for:

  • Storage efficiency: Large files consume unnecessary disk space
  • Transfer speed: Slow uploads/downloads in CI/CD pipelines
  • Processing performance: Large JSON files are slower to parse
  • Network bandwidth: Inefficient for distributed analysis workflows

This change addresses these issues by providing format options optimized for different use cases while maintaining full data fidelity.

How Has This Been Tested?

  • Real codebase testing: Tested on large Python projects (1000+ files)
  • Format validation: Both JSON and MessagePack outputs validated
  • Round-trip testing: Serialization → deserialization preserves all data
  • Compression verification: Confirmed 80-90% size reduction on real data
  • Error scenarios: Invalid format options handled gracefully
  • Backward compatibility: Existing workflows continue unchanged
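The round-trip and size checks above could be sketched roughly as follows. This is an illustration under assumptions, not the project's test suite: `encode`, `decode`, `percent_of_pretty_json`, and the `sample` data are hypothetical, and measuring against pretty-printed JSON as the baseline is a guess at what the size figure is relative to. The MessagePack branch needs the third-party `msgpack` package and is skipped when it is not installed.

```python
import json

try:
    import msgpack  # third-party: pip install msgpack
except ImportError:  # keep the sketch runnable without it
    msgpack = None


def encode(data, fmt):
    """Serialize JSON-compatible data in the requested format (hypothetical helper)."""
    if fmt == "json":
        # Compact JSON: no indentation, no spaces after separators.
        return json.dumps(data, separators=(",", ":")).encode("utf-8")
    if fmt == "msgpack":
        if msgpack is None:
            raise RuntimeError("msgpack is not installed")
        return msgpack.packb(data, use_bin_type=True)
    raise ValueError(f"unsupported format: {fmt!r}")


def decode(payload, fmt):
    """Inverse of encode (hypothetical helper)."""
    if fmt == "json":
        return json.loads(payload.decode("utf-8"))
    if fmt == "msgpack":
        if msgpack is None:
            raise RuntimeError("msgpack is not installed")
        return msgpack.unpackb(payload, raw=False)
    raise ValueError(f"unsupported format: {fmt!r}")


def percent_of_pretty_json(data, fmt):
    """Encoded size as a percentage of pretty-printed JSON (assumed baseline)."""
    baseline = len(json.dumps(data, indent=4).encode("utf-8"))
    return 100.0 * len(encode(data, fmt)) / baseline


# Made-up sample shaped loosely like an analysis result.
sample = {
    "symbol_table": {
        f"pkg/module_{i}.py": {"functions": ["f", "g"], "line_count": i}
        for i in range(50)
    }
}

formats = ["json"] + (["msgpack"] if msgpack is not None else [])
for fmt in formats:
    # Round trip: deserializing the serialized payload must give back the input.
    assert decode(encode(sample, fmt), fmt) == sample
    print(f"{fmt}: {percent_of_pretty_json(sample, fmt):.1f}% of pretty-printed JSON")
```

The equality check only holds for JSON-compatible data (string keys, lists rather than tuples), which matches what a JSON output would contain anyway.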

Example compression results on a real project:

❯ uv run codeanalyzer --input $PWD -vvv --output /tmp --format=msgpack --keep-cache
Building symbol table ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 21/21 files 0:00:04 0:00:00
[07/11/25 00:41:57] INFO     ✅ Symbol table generation complete.
                    INFO     Analysis saved to /tmp/analysis.msgpack
                    INFO     Compression ratio: 9.2% of JSON 

Breaking Changes

None. This is a fully backward-compatible addition:

  • Default behavior remains unchanged (compact JSON output)
  • All existing CLI commands work exactly as before
  • No changes to programmatic APIs

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

Additional context

N/A

@rahlk rahlk linked an issue Jul 11, 2025 that may be closed by this pull request
@rahlk rahlk self-assigned this Jul 11, 2025
@rahlk rahlk added the enhancement New feature or request label Jul 11, 2025
@rahlk rahlk merged commit 34b7596 into main Jul 11, 2025
Development

Successfully merging this pull request may close these issues.

Better more performant output formatting
