🔄 Code Format Transformer

📰 News

Important

🚀 Latest: We have extended support to more languages and released a new tool: Deblank.

This tool addresses the "Hidden Cost of Readability" in LLM processing by providing bidirectional transformation between human-readable (formatted) code and token-efficient (unformatted) code for LLM consumption.

📋 Overview

When code is processed by Large Language Models (LLMs), formatting elements such as indentation, spaces, and newlines substantially increase token consumption while offering little benefit to state-of-the-art (SOTA) models. This tool allows you to:

Convert formatted code to unformatted code for efficient LLM processing (22-42% fewer tokens, depending on the language)
Convert unformatted code back to formatted code for human readability
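Token reduction is the relative drop in token count, 1 - tokens_after / tokens_before. The sketch below computes it with a crude regex tokenizer that treats each whitespace run, word, and punctuation character as one token. This tokenizer is an assumption standing in for a real LLM tokenizer (the README does not say which one produced the 22-42% figures), and the unformatted C++ string is written by hand for illustration rather than produced by the tool.

```python
import re

# Crude stand-in for an LLM tokenizer (an assumption): every whitespace run,
# identifier/number, and single punctuation character counts as one token.
_TOKEN_RE = re.compile(r"\s+|\w+|[^\w\s]")


def approx_token_count(code: str) -> int:
    """Approximate the number of tokens in a code string."""
    return len(_TOKEN_RE.findall(code))


def token_reduction(formatted: str, unformatted: str) -> float:
    """Percentage of tokens saved by the unformatted version."""
    before = approx_token_count(formatted)
    after = approx_token_count(unformatted)
    return 100.0 * (before - after) / before


formatted = """int add(int a, int b) {
    return a + b;
}
"""
# Hand-written compact equivalent (illustrative, not tool output).
unformatted = "int add(int a,int b){return a+b;}"

print(f"{token_reduction(formatted, unformatted):.1f}% fewer tokens")
```

With this proxy the formatted snippet counts 27 tokens and the compact one 20; real BPE tokenizers will give different absolute numbers.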

The transformation preserves complete program semantics, removing only formatting elements that do not affect execution.

⚙️ Installation

Prerequisites

Python 3.10+
Uncrustify (for C-family languages: C++, Java, C#)
YAPF (for Python)

Installing Uncrustify

On Ubuntu/Debian:

sudo apt-get install uncrustify

On macOS:

brew install uncrustify

On Windows:

# Using Chocolatey
choco install uncrustify

# Or download the binary from https://sourceforge.net/projects/uncrustify/

Installing the tool (from a local clone of the repository):

cd The-hidden-cost
pip install -r requirements.txt
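To confirm the prerequisites listed above are in place, a small check like the following can be run after installation. It is a convenience sketch, not part of the repository: it only looks for an `uncrustify` executable on PATH and for an importable `yapf` package.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether each prerequisite listed in this README is available."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        # Uncrustify must be on PATH for the C-family languages.
        "uncrustify": shutil.which("uncrustify") is not None,
        # YAPF is a Python package; find_spec checks without importing it.
        "yapf": importlib.util.find_spec("yapf") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:14} {'OK' if ok else 'MISSING'}")
```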

💻 Usage

The main entry point is the format_manager.py script:

python format_manager.py input_file output_file direction [--config-dir CONFIG_DIR]

Parameters:

input_file: Path to the source code file
output_file: Path where the transformed code will be saved
direction: Transformation direction, one of:
- format: Convert unformatted code to formatted code (for human readability)
- unformat: Convert formatted code to unformatted code (for LLM efficiency)
--config-dir: Directory containing configuration files (default: "cfg")
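The parameters above map naturally onto Python's `argparse`. The sketch below is a hypothetical reconstruction of that interface for illustration only; the real format_manager.py may declare its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI, not the actual script.
    parser = argparse.ArgumentParser(
        description="Convert between formatted and unformatted source code.")
    parser.add_argument("input_file", help="path to the source code file")
    parser.add_argument("output_file",
                        help="path where the transformed code will be saved")
    parser.add_argument("direction", choices=["format", "unformat"],
                        help="format: for humans; unformat: for LLMs")
    parser.add_argument("--config-dir", default="cfg",
                        help="directory containing configuration files")
    return parser


args = build_parser().parse_args(
    ["MyCode.java", "MyCode.unformatted.java", "unformat"])
print(args.direction, args.config_dir)  # unformat cfg
```

Because `direction` uses `choices`, a typo such as `unformatted` is rejected with a usage error instead of being passed through.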

Examples:

Convert formatted Java code to unformatted code for LLM processing:

python format_manager.py MyCode.java MyCode.unformatted.java unformat

Convert unformatted C++ code back to formatted code for human readability:

python format_manager.py solution.unformatted.cpp solution.formatted.cpp format
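Since the script processes one file per call, transforming a whole directory means looping over files. The sketch below shells out to format_manager.py once per file and derives output names following the pattern in the two examples above. The helpers `output_path`, `build_command`, and `run_batch` are hypothetical and assume the repository root is the working directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def output_path(src: Path, direction: str) -> Path:
    """Derive an output name like the examples (hypothetical helper)."""
    stem = src.stem  # "solution.unformatted.cpp" -> "solution.unformatted"
    if stem.endswith(".unformatted"):
        stem = stem[: -len(".unformatted")]
    tag = "unformatted" if direction == "unformat" else "formatted"
    return src.with_name(f"{stem}.{tag}{src.suffix}")


def build_command(src: Path, direction: str, config_dir: str = "cfg") -> List[str]:
    """Build one format_manager.py invocation for a single file."""
    if direction not in ("format", "unformat"):
        raise ValueError(f"unknown direction: {direction}")
    return [sys.executable, "format_manager.py", str(src),
            str(output_path(src, direction)), direction,
            "--config-dir", config_dir]


def run_batch(files: Iterable[Path], direction: str) -> None:
    """Run format_manager.py once per file; stops at the first failure."""
    for src in files:
        subprocess.run(build_command(src, direction), check=True)
```

For example, `run_batch(Path("src").glob("*.java"), "unformat")` would write `Foo.unformatted.java` next to each `Foo.java` (the `src` directory is a placeholder).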

🌐 Supported Languages

Java
C++
C#
Python
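A tool like this has to pick a formatter per input file, most simply from the file extension. The table below is an assumption: the README lists the languages but not the extensions it recognises. The second field is the language name Uncrustify accepts through its `-l` option; `None` marks files handled by YAPF instead.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical extension table (not taken from the repository).
LANGUAGES = {
    ".java": ("Java", "JAVA"),
    ".cpp": ("C++", "CPP"),
    ".cc": ("C++", "CPP"),
    ".cxx": ("C++", "CPP"),
    ".hpp": ("C++", "CPP"),
    ".cs": ("C#", "CS"),
    ".py": ("Python", None),  # formatted with YAPF, not Uncrustify
}


def detect_language(path: str) -> Tuple[str, Optional[str]]:
    """Return (language name, Uncrustify -l value or None) for a file path."""
    suffix = Path(path).suffix.lower()
    try:
        return LANGUAGES[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {path}") from None


print(detect_language("solution.unformatted.cpp"))  # ('C++', 'CPP')
```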

🛠️ Configuration

The tool uses language-specific formatters with configurations stored in the cfg directory:

C-family languages use Uncrustify with custom configuration files
- For C++, the configuration is based on rindeal/uncrustify-c-cpp.cfg, which provides widely adopted style rules.
Python uses YAPF with custom configuration
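The two formatter back ends are driven differently: Uncrustify is a command-line program that takes a config file (`-c`), input (`-f`), output (`-o`), and language (`-l`), while YAPF can be called in-process through `yapf.yapflib.yapf_api.FormatCode`, which returns the reformatted source and a changed flag. The sketch below shows both calls; the config file names under cfg are placeholders, since the README does not name them.

```python
import subprocess
from pathlib import Path
from typing import List


def uncrustify_command(src: Path, dst: Path, lang: str, cfg_file: Path) -> List[str]:
    """Uncrustify call: -c config, -f input, -o output, -l language, -q quiet."""
    return ["uncrustify", "-c", str(cfg_file), "-f", str(src),
            "-o", str(dst), "-l", lang, "-q"]


def run_uncrustify(src: Path, dst: Path, lang: str, cfg_file: Path) -> None:
    """Run Uncrustify and raise CalledProcessError if it fails."""
    subprocess.run(uncrustify_command(src, dst, lang, cfg_file), check=True)


def format_python(source: str, style_config: str = "pep8") -> str:
    """Reformat Python source with YAPF.

    style_config may be a predefined style name or a path to a YAPF config file.
    """
    # Imported lazily so the Uncrustify helpers work without YAPF installed.
    from yapf.yapflib.yapf_api import FormatCode
    formatted, _changed = FormatCode(source, style_config=style_config)
    return formatted
```

For example, `run_uncrustify(Path("solution.unformatted.cpp"), Path("solution.formatted.cpp"), "CPP", Path("cfg/cpp.cfg"))`, where `cfg/cpp.cfg` is a hypothetical file name.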

📊 Performance

AST preservation: 100% semantic equivalence verified across the McEval dataset
Average transformation speed: 76 ms per code sample
Token reduction: 22-42% for input code (language dependent)
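For Python, one way to check that a transformation preserves semantics is to compare abstract syntax trees: `ast.dump` omits line and column attributes by default, and comments never reach the AST, so pure formatting differences compare equal. This is an illustrative check, not necessarily the procedure used for the McEval verification, and the C-family languages would need their own parsers.

```python
import ast


def same_ast(original: str, transformed: str) -> bool:
    """True if both sources parse to identical ASTs (formatting ignored)."""
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(transformed))


formatted = '''
def add(a, b):
    # sum two numbers
    return a + b


result = add(1, 2)
'''
# Python keeps indentation significant, so the compact form still indents.
compact = "def add(a,b):\n return a+b\nresult=add(1,2)\n"

print(same_ast(formatted, compact))                     # True
print(same_ast(formatted, compact.replace("+", "-")))   # False
```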

✨ Benefits for LLM Applications

💰 Reduced token consumption for API-based LLMs (direct cost savings)
⚡ Faster processing times
🎯 Improved inference efficiency without compromising model performance
