A comprehensive toolkit for quantizing large language models to GGUF format with support for multiple acceleration backends (CUDA, Metal, CPU).
## Features

| Feature | Status | Description |
|---------|--------|-------------|
| 🖥️ Bare Metal | ✅ | Native installation without Docker |
| 🔧 Auto Setup | ✅ | Automatic environment detection and configuration |
| 🎯 Multi-Backend | ✅ | CUDA, Metal (Apple Silicon), and CPU support |
| 📦 Conda Ready | ✅ | Complete conda environment with all dependencies |
| ⚡ Quick Scripts | ✅ | Convenient scripts for common tasks |
| Perplexity | ✅ | Automated quality testing of quantized models |
| Validation | ✅ | Environment health checks and troubleshooting |
## Prerequisites

| Requirement | Minimum Version | Notes |
|-------------|-----------------|-------|
| Conda | Latest | Miniconda or Anaconda |
| Python | 3.11+ | Installed via conda |
| Git | 2.0+ | For repository operations |
| CMake | 3.14+ | For building llama.cpp |
### GPU Support (Optional)

| Platform | Requirements | Acceleration |
|----------|--------------|--------------|
| NVIDIA | CUDA 11.8+ | ✅ CUDA acceleration |
| Apple Silicon | macOS + M1/M2/M3 | ✅ Metal acceleration |
| Others | Any CPU | ✅ Optimized CPU processing |
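Unsure which backend applies to your machine? The selection logic mirrors the table above; here is a minimal standalone sketch (not part of this toolkit) of how such detection typically works:

```python
import platform
import shutil

def detect_backend() -> str:
    """Pick the acceleration backend this machine likely supports."""
    if shutil.which("nvidia-smi"):  # NVIDIA driver tools present
        return "CUDA"
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "Metal"  # Apple Silicon
    return "CPU"  # portable fallback, always available

print(detect_backend())
```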
## Quick Setup

### Option 1: Automated Setup (Recommended)

```bash
# Clone the repository
git clone https://github.com/Vikhrmodels/quantization-utils.git
cd quantization-utils

# Run the automated setup script
chmod +x scripts/setup.sh
./scripts/setup.sh
```
### Option 2: Manual Setup

```bash
# Create conda environment (OS-specific)
# For Linux:
conda env create -f environment-linux.yml

# For macOS:
conda env create -f environment-macos.yml

# Generic (fallback):
conda env create -f environment.yml

# Activate environment
conda activate quantization-utils

# Run setup to install llama.cpp and prepare directories
python setup.py

# Add to PATH (if needed)
export PATH="$HOME/.local/bin:$PATH"
```
## Validation

Verify your installation:

```bash
# Check environment health
./scripts/validate.sh

# Quick test
conda activate quantization-utils
cd GGUF
python -c "from shared import validate_environment; validate_environment()"
```
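The real checks live in `GGUF/shared.py`; purely as an illustration of the kind of health checks an environment validator performs (this is a hypothetical sketch, not the toolkit's actual code), they boil down to version and tool lookups:

```python
# Hypothetical illustration only -- not the implementation in GGUF/shared.py.
import shutil
import sys

def check_environment() -> list:
    """Collect human-readable problems; an empty list means healthy."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ required")
    for tool in ("git", "cmake"):  # build prerequisites for llama.cpp
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems

print(check_environment())
```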
## Usage Examples

### Basic Model Quantization

```bash
# Activate environment
conda activate quantization-utils

# Quantize a model with default settings
./scripts/quantize.sh microsoft/DialoGPT-medium

# Custom quantization levels
./scripts/quantize.sh Vikhrmodels/Vikhr-Gemma-2B-instruct -q Q4_K_M,Q5_K_M,Q8_0

# Force re-quantization
./scripts/quantize.sh microsoft/DialoGPT-medium --force
```
### Advanced Pipeline Usage

```bash
cd GGUF

# Full pipeline with all quantization levels
python pipeline.py --model_id microsoft/DialoGPT-medium

# Specific quantization levels only
python pipeline.py --model_id microsoft/DialoGPT-medium -q Q4_K_M -q Q8_0

# With perplexity testing
python pipeline.py --model_id microsoft/DialoGPT-medium --perplexity

# For gated models (requires HF token)
python pipeline.py --model_id meta-llama/Llama-2-7b-hf --hf_token $HF_TOKEN
```
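When choosing quantization levels, output file size scales roughly with bits per weight. A back-of-envelope sketch (the bits-per-weight figures in the comment are approximate community estimates, not values from this repo):

```python
def approx_gguf_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Back-of-envelope GGUF size: parameter count times bits per weight."""
    # 1e9 params and 1e9 bytes-per-GB cancel, leaving params_B * bpw / 8
    return n_params_billion * bits_per_weight / 8

# Rough guides: Q8_0 stores ~8.5 bits/weight, Q4_K_M ~4.8 (approximate)
print(f"7B @ Q4_K_M: ~{approx_gguf_size_gb(7, 4.8):.1f} GB")
```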
### Perplexity Testing

```bash
# Test all quantized versions
./scripts/perplexity.sh microsoft/DialoGPT-medium

# Force recalculation
./scripts/perplexity.sh microsoft/DialoGPT-medium --force
```
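For interpreting the results: perplexity is the exponential of the average per-token negative log-likelihood, so lower is better, and a quantized model whose perplexity stays close to the full-precision baseline has lost little quality. A minimal illustration of the formula (not the script's implementation):

```python
import math

def perplexity(token_nlls: list) -> float:
    """exp(mean negative log-likelihood per token); lower is better."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that is certain of every token (NLL 0) has perplexity 1.0
print(perplexity([0.0, 0.0, 0.0]))  # 1.0
```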