llama.cpp submodule update from b6153 to b7868 #1

Open

metaspartan wants to merge 5 commits into main from dev

Conversation

metaspartan commented Jan 7, 2026

Adds support for flash attention type in context params and updates related logic in llama.py. Refactors deprecated sampling methods to improve error messaging. Updates llama_cpp.py with new constants, fields, and function signatures for API consistency and new features. Bumps version to 0.3.17.
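
For reviewers skimming the flash-attention change, a minimal sketch of how the legacy bool could be mapped onto the new enum while staying backward compatible; the enum names and values mirror upstream's AUTO/DISABLED/ENABLED convention and are an assumption here, not an excerpt of the diff:

```python
# Hypothetical sketch: map the legacy flash_attn bool onto the new
# flash_attn_type enum while keeping backward compatibility.
# The enum values below follow upstream llama.cpp's convention (assumed).
LLAMA_FLASH_ATTN_TYPE_AUTO = -1
LLAMA_FLASH_ATTN_TYPE_DISABLED = 0
LLAMA_FLASH_ATTN_TYPE_ENABLED = 1


def resolve_flash_attn_type(flash_attn=None, flash_attn_type=None):
    """Prefer an explicit enum value; fall back to the legacy bool."""
    if flash_attn_type is not None:
        return flash_attn_type
    if flash_attn is None:
        return LLAMA_FLASH_ATTN_TYPE_AUTO
    return (
        LLAMA_FLASH_ATTN_TYPE_ENABLED
        if flash_attn
        else LLAMA_FLASH_ATTN_TYPE_DISABLED
    )
```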


Note

Medium Risk
Moderate risk because it syncs Python bindings with upstream llama.cpp C API changes (struct layouts, removed symbols, and flash-attention configuration), which can break runtime loading or behavior if the packaged native library and bindings get out of sync.

Overview
Bumps the package to 0.3.18 and updates the vendored llama.cpp integration, including a build fix that sets a default LLAMA_INSTALL_VERSION for the mtmd sub-build.

Updates the Python API and ctypes bindings to match upstream C API changes: replaces the context flash_attn bool with a flash_attn_type enum (while keeping Python backward compatibility), adds new llama.cpp constants/struct fields and helper functions (e.g., llama_n_ctx_seq, llama_max_tensor_buft_overrides, adapter metadata + aLoRA helpers, llama_log_get, llama_memory_breakdown_print, llama_model_is_hybrid), and removes deprecated KV-cache (llama_kv_self_*) and llama_sampler_init_softmax bindings.
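
For context, a hedged sketch of what ctypes declarations for two of the new helpers might look like; the signatures shown are assumed from upstream naming conventions, not copied from the PR:

```python
import ctypes

# Hypothetical sketch of ctypes declarations for two of the new helpers.
# Signatures are assumed to follow upstream llama.cpp conventions:
#   size_t   llama_max_tensor_buft_overrides(void);
#   uint32_t llama_n_ctx_seq(const struct llama_context * ctx);
lib = ctypes.CDLL("libllama.so")  # library name/path is platform dependent

llama_context_p = ctypes.c_void_p  # opaque context handle for this sketch

lib.llama_max_tensor_buft_overrides.argtypes = []
lib.llama_max_tensor_buft_overrides.restype = ctypes.c_size_t

lib.llama_n_ctx_seq.argtypes = [llama_context_p]
lib.llama_n_ctx_seq.restype = ctypes.c_uint32
```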

Also switches internal code paths to use the new memory wrappers (e.g., embeddings KV-cache clearing) and includes mostly formatting/typing cleanups in sampler/chat-format code.
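
As an illustration of that migration, clearing a context's KV cache through the memory wrappers could look roughly like this; llama_get_memory / llama_memory_clear follow the upstream API names, but the exact Python-side signatures are assumed:

```python
import llama_cpp  # the package's low-level ctypes module


def clear_context_memory(ctx) -> None:
    """Clear a context's KV cache via the new memory wrappers.

    Replaces the removed llama_kv_self_clear(ctx) call; signatures
    here are assumed, not quoted from the diff.
    """
    mem = llama_cpp.llama_get_memory(ctx)
    llama_cpp.llama_memory_clear(mem, True)  # True: also clear the data buffers
```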

Written by Cursor Bugbot for commit ca37242.


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36f7b221ef


Upgrades vendor/llama.cpp to commit 3bcc990, introducing new features such as the adaptive probability-based sampler (`llama_sampler_init_adaptive_p`), direct I/O support (`use_direct_io` in llama_model_params), GLM 4.7 Flash support, Qwen3 Next model support, and self-speculative decoding. Updates Python bindings and chat format handlers to support these features and improve code formatting and clarity.
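
If the new model-params field is surfaced the same way existing ones are, enabling direct I/O from the low-level bindings might look like the sketch below; the field name use_direct_io comes from the commit description, while its exposure on the params struct and the load call shown are assumptions:

```python
import llama_cpp

# Hypothetical sketch: toggle the new direct I/O flag on the raw model
# params struct before loading a model.
params = llama_cpp.llama_model_default_params()
params.use_direct_io = True  # field name per the upstream change; assumed C bool
model = llama_cpp.llama_model_load_from_file(b"/path/to/model.gguf", params)
```
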
metaspartan changed the title from "llama.cpp submodule update from b6153 to b7652" to "llama.cpp submodule update from b6153 to b7868" on Jan 29, 2026
Added Python bindings for new llama.cpp APIs: llama_max_tensor_buft_overrides, llama_model_is_hybrid, adapter metadata functions, aLoRA invocation token functions, llama_memory_breakdown_print, and llama_log_get. Updated changelog to reflect new features, fixes, and deprecations in version 0.3.18. Commented out llama_sampler_init_adaptive_p binding as it requires a library rebuild.
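
One way such a binding can stay in the file without breaking older packaged libraries is to guard it on symbol availability; a hedged sketch follows (the sampler's argument list is unknown here and deliberately left out):

```python
import ctypes

lib = ctypes.CDLL("libllama.so")  # library name/path is platform dependent

# Hypothetical sketch: only bind llama_sampler_init_adaptive_p when the
# packaged native library actually exports it, so the module keeps
# importing against older builds that predate the sampler.
HAS_ADAPTIVE_P_SAMPLER = hasattr(lib, "llama_sampler_init_adaptive_p")

if HAS_ADAPTIVE_P_SAMPLER:
    # Return type is a llama_sampler pointer; the argument list is left
    # unspecified because it depends on the upstream signature.
    lib.llama_sampler_init_adaptive_p.restype = ctypes.c_void_p
```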