🚀 Latest: We have extended support to more languages and released a new tool: Deblank.
This tool addresses the "Hidden Cost of Readability" in LLM processing by providing bidirectional transformation between human-readable (formatted) code and token-efficient (unformatted) code for LLM consumption.
When processing code through Large Language Models (LLMs), formatting elements such as indentation, spaces, and newlines significantly increase token consumption while offering little benefit to state-of-the-art (SOTA) models. This tool allows you to:
- Convert formatted code to unformatted code for efficient LLM processing (token reduction of 22-42%)
- Convert unformatted code back to formatted code for human readability
The transformation preserves complete program semantics while only removing formatting elements that don't affect execution.
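For Python, one way to spot-check the semantic-preservation claim on your own files is to compare abstract syntax trees before and after a transformation. The following is a minimal sketch using only the standard library `ast` module. `same_semantics` is a hypothetical helper, not part of this repository, and it covers Python only; C-family files would need a separate parser.

```python
import ast


def same_semantics(original: str, transformed: str) -> bool:
    """Return True if two Python sources parse to identical ASTs.

    ast.dump() omits line/column attributes by default, and comments never
    reach the AST, so differences in spacing, blank lines, or comments do
    not affect the comparison. Unparseable input counts as a mismatch.
    """
    try:
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(transformed))
    except SyntaxError:
        return False


if __name__ == "__main__":
    formatted = (
        "def add(a, b):\n"
        "    # sum two numbers\n"
        "    return a + b\n"
        "\n"
        "\n"
        "print(add(1, 2))\n"
    )
    compact = "def add(a,b):\n return a+b\nprint(add(1,2))\n"
    print(same_semantics(formatted, compact))  # True: only layout differs
```

Because Python indentation is significant, a compact Python version can shrink indentation but cannot remove it; the check above still accepts any indentation width.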
Requirements:
- Python 3.10+
- Uncrustify (for C-family languages: C++, Java, C#)
- YAPF (for Python)
Install Uncrustify for your platform:
- Ubuntu/Debian: sudo apt-get install uncrustify
- macOS (Homebrew): brew install uncrustify
- Windows (Chocolatey): choco install uncrustify, or download the binary from https://sourceforge.net/projects/uncrustify/

Then install the Python dependencies from the repository root:
cd The-hidden-cost
pip install -r requirements.txt

The main interface is the format_manager.py script:
python format_manager.py [input_file] [output_file] [direction] [--config-dir CONFIG_DIR]

Arguments:
- input_file: Path to the source code file
- output_file: Path where the transformed code will be saved
- direction: Processing direction, one of:
  - format: Convert unformatted code to formatted code (for human readability)
  - unformat: Convert formatted code to unformatted code (for LLM efficiency)
- --config-dir: Directory containing configuration files (default: "cfg")
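To process many files, one option is to drive the CLI from a Python script. This is a minimal sketch, assuming format_manager.py sits in the current working directory. `build_command` and `unformatted_path` are hypothetical helpers, and the `.unformatted` file naming simply mirrors the usage examples; it is not enforced by the tool.

```python
import shlex
import sys
from pathlib import Path

# Directions accepted by format_manager.py, per the argument list.
VALID_DIRECTIONS = ("format", "unformat")


def build_command(input_file, output_file, direction, config_dir="cfg"):
    """Return the argv list for one format_manager.py call (hypothetical helper)."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {VALID_DIRECTIONS}, got {direction!r}")
    return [
        sys.executable, "format_manager.py",
        str(input_file), str(output_file), direction,
        "--config-dir", str(config_dir),
    ]


def unformatted_path(path):
    """Map e.g. MyCode.java -> MyCode.unformatted.java (naming is an assumption)."""
    p = Path(path)
    return p.with_name(f"{p.stem}.unformatted{p.suffix}")


if __name__ == "__main__":
    src = Path("MyCode.java")
    cmd = build_command(src, unformatted_path(src), "unformat")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building an argv list rather than a shell string avoids quoting problems with paths that contain spaces.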
Examples:
python format_manager.py MyCode.java MyCode.unformatted.java unformat
python format_manager.py solution.unformatted.cpp solution.formatted.cpp format

Supported languages:
- Java
- C++
- C#
- Python
The tool uses language-specific formatters, with configurations stored in the cfg directory:
- C-family languages use Uncrustify with custom configuration files.
  - For C++, the configuration is based on rindeal/uncrustify-c-cpp.cfg, which provides widely adopted style rules.
- Python uses YAPF with a custom configuration.
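The per-language dispatch described above might look roughly like the following. This is a minimal sketch of the format direction only, assuming both formatters are on PATH; how the repository implements unformat is not shown. `formatter_argv` and the config filenames (`cpp.cfg`, `java.cfg`, `cs.cfg`, `yapf.style`) are hypothetical placeholders, not the actual names in cfg. Uncrustify's `-c`, `-l`, and `-f` options and YAPF's `--style` option are real; with `-f` and no `-o`, Uncrustify writes to stdout, as YAPF does by default.

```python
from pathlib import Path

# Uncrustify language codes (-l CPP/JAVA/CS) are real; the config
# filenames are hypothetical placeholders for whatever lives in cfg/.
UNCRUSTIFY = {
    ".cpp": ("CPP", "cpp.cfg"),
    ".cc": ("CPP", "cpp.cfg"),
    ".java": ("JAVA", "java.cfg"),
    ".cs": ("CS", "cs.cfg"),
}
YAPF_STYLE = "yapf.style"  # hypothetical style file name


def formatter_argv(src, config_dir="cfg"):
    """Pick a formatter by file extension; both commands print the result to stdout."""
    src, config_dir = Path(src), Path(config_dir)
    suffix = src.suffix.lower()
    if suffix in UNCRUSTIFY:
        lang, cfg_name = UNCRUSTIFY[suffix]
        # -c: config file, -l: force language, -f: single input file (output to stdout)
        return ["uncrustify", "-c", str(config_dir / cfg_name), "-l", lang, "-f", str(src)]
    if suffix == ".py":
        # --style accepts a path to a style file; formatted code goes to stdout
        return ["yapf", f"--style={config_dir / YAPF_STYLE}", str(src)]
    raise ValueError(f"unsupported file type: {suffix or src.name}")
```

A caller could then capture the formatted text with `subprocess.run(formatter_argv("Main.java"), capture_output=True, text=True, check=True).stdout`.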
Performance:
- AST preservation: 100% semantic equivalence verified across the McEval dataset
- Average transformation speed: 76ms per code sample
- Token reduction: 22-42% for input code (language dependent)
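The reduction range above is the project's reported figure. To get a rough, tokenizer-agnostic estimate on your own files, compare a formatted file with its unformatted counterpart (for example, the output of the unformat direction). This is a minimal sketch. The regex-based `proxy_token_count` is an assumption standing in for a real LLM tokenizer, so its percentages will not match what an API-based model actually bills.

```python
import re

# Crude proxy: each identifier/number run, each punctuation character, and
# each whitespace run counts as one "token". Real BPE tokenizers differ.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]|\s+")


def proxy_token_count(code: str) -> int:
    return len(_TOKEN_RE.findall(code))


def token_reduction(formatted: str, unformatted: str) -> float:
    """Fraction of proxy tokens saved by the unformatted version."""
    before = proxy_token_count(formatted)
    if before == 0:
        raise ValueError("formatted code is empty")
    return 1 - proxy_token_count(unformatted) / before


if __name__ == "__main__":
    formatted = (
        "public class Hello {\n"
        "    public static void main(String[] args) {\n"
        "        System.out.println(\"hi\");\n"
        "    }\n"
        "}\n"
    )
    unformatted = 'public class Hello{public static void main(String[] args){System.out.println("hi");}}'
    print(f"proxy token reduction: {token_reduction(formatted, unformatted):.0%}")
```

Counting each whitespace run as a single token understates what many BPE tokenizers spend on deep indentation, so treat the result as a lower-bound sanity check rather than a measurement.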
Benefits:
- 💰 Reduced token consumption for API-based LLMs (direct cost savings)
- ⚡ Faster processing times
- 🎯 Improved inference efficiency without compromising model performance