
LCORE-1285: update llama stack to 0.5.2 #1112

Closed
jrobertboos wants to merge 127 commits into lightspeed-core:main from jrobertboos:lcore-1285

Conversation

@jrobertboos
Contributor

@jrobertboos jrobertboos commented Feb 6, 2026

Description

Updated Llama Stack to 0.5.2 to enable the network configuration on providers, so that TLS and proxy support can be added.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: N/A
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features
    • Choose a RAG strategy: Inline RAG or Tool RAG with BYOK and OKP support; improved document/context retrieval and scoring.
    • Jinja2-powered, cached prompt templates for inference; MCP tool support including file-based auth.
  • Chores
    • Upgraded core framework and tooling versions; bumped public release to 0.4.2.
    • Switched to an inline vector store option for local embeddings.
  • Bug Fixes
    • More robust streaming and response handling (safer parts parsing, clearer long-query error paths).
  • Tests
    • Expanded e2e and unit coverage for RAG, MCP auth, prompt templating, and size limits.
  • Documentation
    • New and updated guides (RAG, BYOK/OKP, developer contributing, config).

@coderabbitai
Contributor

coderabbitai bot commented Feb 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8322af0c-ad67-483e-8fc4-551c29e313ea

📥 Commits

Reviewing files that changed from the base of the PR and between 403e6c3 and 5007f5b.

⛔ Files ignored due to path filters (4)
  • docs/config.png is excluded by !**/*.png
  • docs/config.svg is excluded by !**/*.svg
  • docs/quota_scheduler.svg is excluded by !**/*.svg
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (134)
  • .tekton/lightspeed-stack-pull-request.yaml
  • .tekton/lightspeed-stack-push.yaml
  • AGENTS.md
  • CLAUDE.md
  • Makefile
  • README.md
  • docker-compose.yaml
  • docs/byok_guide.md
  • docs/config.html
  • docs/config.json
  • docs/config.md
  • docs/config.puml
  • docs/demos/lcore/contributing_guidelines.html
  • docs/demos/lcore/contributing_guidelines.md
  • docs/openapi.json
  • docs/quota_scheduler.puml
  • docs/rag_guide.md
  • docs/splunk.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • examples/lightspeed-stack-byok-rag.yaml
  • examples/run.yaml
  • lightspeed-stack.yaml
  • pyproject.toml
  • requirements-build.txt
  • requirements.hashes.source.txt
  • requirements.hashes.wheel.txt
  • requirements.overrides.txt
  • run.yaml
  • scripts/generate_openapi_schema.py
  • scripts/konflux_requirements.sh
  • src/app/endpoints/a2a.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/tools.py
  • src/app/main.py
  • src/authentication/jwk_token.py
  • src/authorization/middleware.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/log.py
  • src/metrics/__init__.py
  • src/models/config.py
  • src/models/requests.py
  • src/models/responses.py
  • src/models/rlsapi/requests.py
  • src/observability/README.md
  • src/quota/quota_exceed_error.py
  • src/quota/sql.py
  • src/utils/common.py
  • src/utils/connection_decorator.py
  • src/utils/conversations.py
  • src/utils/mcp_oauth_probe.py
  • src/utils/query.py
  • src/utils/responses.py
  • src/utils/types.py
  • src/utils/vector_search.py
  • src/version.py
  • test.containerfile
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-mcp-file-auth.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-rbac.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-mcp-file-auth.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-rbac.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/environment.py
  • tests/e2e/features/info.feature
  • tests/e2e/features/mcp.feature
  • tests/e2e/features/mcp_file_auth.feature
  • tests/e2e/features/query.feature
  • tests/e2e/features/rlsapi_v1.feature
  • tests/e2e/features/rlsapi_v1_errors.feature
  • tests/e2e/features/steps/conversation.py
  • tests/e2e/features/steps/info.py
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/steps/rlsapi_v1.py
  • tests/e2e/features/streaming_query.feature
  • tests/e2e/mock_mcp_server/server.py
  • tests/e2e/secrets/mcp-token
  • tests/e2e/test_list.txt
  • tests/e2e/utils/llama_stack_shields.py
  • tests/integration/conftest.py
  • tests/integration/endpoints/test_health_integration.py
  • tests/integration/endpoints/test_info_integration.py
  • tests/integration/endpoints/test_model_list.py
  • tests/integration/endpoints/test_query_integration.py
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/integration/endpoints/test_root_endpoint.py
  • tests/integration/endpoints/test_streaming_query_integration.py
  • tests/integration/endpoints/test_tools_integration.py
  • tests/integration/test_rh_identity_integration.py
  • tests/integration/test_version.py
  • tests/unit/__init__.py
  • tests/unit/a2a_storage/test_storage_factory.py
  • tests/unit/app/endpoints/test_a2a.py
  • tests/unit/app/endpoints/test_conversations.py
  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_metrics.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/app/endpoints/test_tools.py
  • tests/unit/app/test_database.py
  • tests/unit/app/test_main_middleware.py
  • tests/unit/app/test_routers.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/conftest.py
  • tests/unit/models/config/test_authentication_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_llama_stack_configuration.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/models/config/test_tls_configuration.py
  • tests/unit/models/responses/test_rag_chunk.py
  • tests/unit/models/responses/test_successful_responses.py
  • tests/unit/models/rlsapi/test_requests.py
  • tests/unit/observability/formats/test_rlsapi.py
  • tests/unit/observability/test_splunk.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/test_log.py
  • tests/unit/utils/test_common.py
  • tests/unit/utils/test_mcp_headers.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_types.py
  • tests/unit/utils/test_vector_search.py

Walkthrough

Introduces RAG strategy support (BYOK and OKP), upgrades llama-stack to 0.5.2, adds Jinja2 prompt templating and caching, changes vector-search and RAG context construction (build_rag_context), updates configuration models to Rag/Okp, expands MCP auth probing, and broad test/doc additions and updates.
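The walkthrough mentions sandboxed Jinja2 prompt templating with caching. The PR's actual implementation is not reproduced here, but the general pattern looks roughly like this (the template text and function names are illustrative, not taken from the PR):

```python
from functools import lru_cache

from jinja2.sandbox import SandboxedEnvironment

# A sandboxed environment blocks access to unsafe attributes
# (e.g. __class__) when rendering templates that may contain
# untrusted input.
_env = SandboxedEnvironment(autoescape=False)


@lru_cache(maxsize=32)
def get_template(template_text: str):
    """Compile a prompt template once and reuse it across requests."""
    return _env.from_string(template_text)


def render_prompt(template_text: str, **context) -> str:
    return get_template(template_text).render(**context)


prompt = render_prompt(
    "Answer using only the context below.\n{{ context }}\nQ: {{ question }}",
    context="RAG chunks here",
    question="What changed in 0.5.2?",
)
```

Because `lru_cache` keys on the template string, repeated requests with the same system prompt skip the Jinja2 compile step entirely.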

Changes

  • Core config & deps (pyproject.toml, requirements*.txt, requirements.hashes.*, Makefile): bumped llama-stack packages to 0.5.2, added jinja2, refreshed many build/lock hashes, and updated some Makefile help text.
  • Constants & version (src/constants.py, src/version.py): updated MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION to 0.5.2; added RAG/OKP/SOLR-related constants, a score multiplier, and a new env var key.
  • Configuration models & generation (src/models/config.py, src/configuration.py, src/llama_stack_configuration.py, examples/*, docs/*): replaced Solr-centric config with Rag/Okp and ByokRag entries (score_multiplier, rag_id), added Rag/Okp models and validation, and updated config generation/enrichment (enrich_solr) along with examples and docs.
  • Vector search & RAG plumbing (src/utils/vector_search.py, src/utils/responses.py, src/utils/types.py, src/utils/query.py): major refactor; added build_rag_context, the RAGContext type, resolve_vector_store_ids, prepare_tools changes, inline/tool RAG flows, BYOK/OKP handling, weighted scoring, safer part.type access, and serialization fixes for MCP tool authorization.
  • Endpoints: inference, query, streaming, tools, a2a, rlsapi_v1 (src/app/endpoints/rlsapi_v1.py, query.py, streaming_query.py, tools.py, a2a.py): added Jinja2 sandboxed prompt templating/caching, enhanced error mapping, async default-model discovery, inline_rag_context propagation, the moderation_input flow, MCP auth checks (check_mcp_auth), UTC timestamp aliasing, and TurnSummary inline_rag_documents exposure.
  • MCP auth & logging (src/utils/mcp_oauth_probe.py, src/log.py): new check_mcp_auth and probe_mcp helpers for MCP OAuth probing; logging handler selection can be disabled via an environment variable.
  • Persistence / quota SQL (src/quota/sql.py): switched token-consumption SQL to an upsert (ON CONFLICT) for both the SQLite and Postgres variants.
  • Metrics (src/metrics/__init__.py): the llm_calls_failures_total metric now includes provider and model labels.
  • Tests & e2e (tests/*, tests/e2e/*, tests/unit/*): extensive additions and updates for RAG, OKP, MCP file auth, rlsapi_v1, and streaming long-query handling; many tests adapted to new models, types, and behavior; added e2e secrets and mocks.
  • Docs & examples (AGENTS.md, README.md, docs/rag_guide.md, docs/byok_guide.md, docs/*): large documentation additions and revisions describing RAG strategies (Inline vs Tool), BYOK/OKP configuration, run.yaml enrichment examples, and contributing guidelines.
  • CI / Tekton / packaging (.tekton/*.yaml, test.containerfile, scripts/*): updated Tekton task bundle digests and prefetch package lists; parameterized the RHOAI index in scripts.
  • Misc & typing (numerous small files under src/utils/* and tests): type-hint modernizations (PEP 585, collections.abc), minor formatter/lint config changes, and many small import/signature adjustments.
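The quota change above replaces a read-then-write sequence with a single atomic upsert. A minimal SQLite sketch of the ON CONFLICT pattern (the table and column names here are hypothetical, not the PR's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE token_usage (subject TEXT PRIMARY KEY, tokens INTEGER NOT NULL)"
)


def consume_tokens(subject: str, tokens: int) -> None:
    # Insert a new row, or add to the existing counter if the subject
    # already has one; one atomic statement instead of a racy
    # SELECT followed by INSERT or UPDATE.
    conn.execute(
        """
        INSERT INTO token_usage (subject, tokens) VALUES (?, ?)
        ON CONFLICT (subject) DO UPDATE SET tokens = tokens + excluded.tokens
        """,
        (subject, tokens),
    )


consume_tokens("user-1", 100)
consume_tokens("user-1", 50)
total = conn.execute(
    "SELECT tokens FROM token_usage WHERE subject = ?", ("user-1",)
).fetchone()[0]
```

The same `ON CONFLICT ... DO UPDATE` syntax works in PostgreSQL 9.5+ and SQLite 3.24+, which is presumably why the PR can use one shape for both variants.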

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Lightspeed as Lightspeed Core
  participant Llama as Llama Stack
  participant BYOK as BYOK Vector Stores
  participant OKP as OKP Provider
  participant MCP as MCP Server

  Client->>Lightspeed: POST /query (question, optional rag/tool ids, headers)
  Lightspeed->>Lightspeed: build inline_rag_context (concurrently)
  Lightspeed->>BYOK: fetch BYOK chunks (if BYOK inline)
  Lightspeed->>OKP: fetch OKP chunks (if OKP inline)
  BYOK-->>Lightspeed: BYOK rag_chunks + referenced_documents
  OKP-->>Lightspeed: OKP rag_chunks + referenced_documents
  Lightspeed->>Lightspeed: merge rag_chunks, apply score_multiplier, build context_text
  Lightspeed->>Llama: prepare ResponsesApiParams (context_text / tools / MCP headers)
  Lightspeed->>MCP: check_mcp_auth (if MCP tools present)
  Llama-->>Lightspeed: Responses API stream / final response
  Lightspeed->>Lightspeed: run moderation using moderation_input
  Lightspeed-->>Client: aggregated response with referenced_documents and request_id

Notes: rectangles denote components; BYOK and OKP fetches run in parallel; MCP auth is checked before including MCP tools.
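The parallel BYOK/OKP fetch and score-weighted merge shown in the diagram can be sketched with asyncio. The fetchers, chunk shape, and multiplier values below are illustrative stand-ins; the PR's real helpers such as build_rag_context are not reproduced here:

```python
import asyncio


async def fetch_byok_chunks(question: str) -> list[dict]:
    # Stand-in for a BYOK vector-store query.
    return [{"text": "byok chunk", "score": 0.8, "source": "byok"}]


async def fetch_okp_chunks(question: str) -> list[dict]:
    # Stand-in for an OKP provider query.
    return [{"text": "okp chunk", "score": 0.9, "source": "okp"}]


async def build_context(question: str, multipliers: dict[str, float]) -> str:
    # Both retrievals run concurrently, mirroring the sequence diagram.
    byok, okp = await asyncio.gather(
        fetch_byok_chunks(question), fetch_okp_chunks(question)
    )
    chunks = byok + okp
    # Apply a per-source score multiplier, then rank highest first.
    for chunk in chunks:
        chunk["score"] *= multipliers.get(chunk["source"], 1.0)
    chunks.sort(key=lambda c: c["score"], reverse=True)
    return "\n".join(c["text"] for c in chunks)


context = asyncio.run(build_context("q", {"byok": 2.0, "okp": 1.0}))
```

With a BYOK multiplier of 2.0, the BYOK chunk (0.8 × 2.0 = 1.6) outranks the OKP chunk (0.9) in the assembled context text.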

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • are-ces


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 31-33: Update the mismatched dependency for llama-stack-api:
replace "llama-stack-api==0.5.0" with the latest published version
"llama-stack-api==0.4.3" (or align all three to a consistent, released version)
so installations won't fail; locate the dependency entry for llama-stack-api in
the pyproject.toml dependency list and change the version string accordingly.

@jrobertboos jrobertboos marked this pull request as draft February 9, 2026 20:13
jrobertboos and others added 26 commits February 9, 2026 15:14
…points to verify 401 responses with WWW-Authenticate when MCP OAuth is required. Add mock fixtures for Llama Stack client interactions in each test file.
…es out in query, streaming_query, and tools endpoints. Each test verifies that a 401 status is returned without a WWW-Authenticate header upon timeout.
We do not want to accept unbounded amounts of input.
Use base 2 numbers because they are cool and nerdy.
The project configures pytest asyncio mode to auto so it is unnecessary
to mark individual tests as async.
Need to call extract_token_usage() in order to increment the metrics counter
Get the provider and model in order to pass that to _record_inference_failure.
Add model and provider labels to the Counter.
Signed-off-by: red-hat-konflux-kflux-prd-rh02 <190377777+red-hat-konflux-kflux-prd-rh02[bot]@users.noreply.github.com>
[LCORE-1393] chore: add agents.md and symlinked claude.md
Signed-off-by: Major Hayden <major@redhat.com>
Signed-off-by: Major Hayden <major@redhat.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 31-33: The pyproject dependency list pins "llama-stack-api==0.5.2"
which does not exist on PyPI; update the package spec to a valid version (e.g.,
change "llama-stack-api==0.5.2" to "llama-stack-api==0.5.1") and decide whether
to align "llama-stack==0.5.2" and "llama-stack-client==0.5.2" to 0.5.1 for
compatibility—ensure all three package entries ("llama-stack",
"llama-stack-client", "llama-stack-api") reference existing matching versions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c97ce73-2d75-4d10-94cb-06ad82568e92

📥 Commits

Reviewing files that changed from the base of the PR and between 60efb35 and 74eb52c.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • pyproject.toml
  • src/constants.py
  • tests/e2e/features/info.feature
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/constants.py

Comment on lines +31 to +33
"llama-stack==0.5.2",
"llama-stack-client==0.5.2",
"llama-stack-api==0.5.2",


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

llama-stack-api 0.5.2 PyPI package availability

💡 Result:

As of March 6, 2026, llama-stack-api==0.5.2 is not available on PyPI. The PyPI release history for llama-stack-api shows the latest published version as 0.5.1 (Feb 19, 2026), with earlier releases like 0.5.0, 0.4.5, etc., and no 0.5.2 entry. [1]

What you can do

  • Install the latest available: pip install llama-stack-api==0.5.1 (or omit the version to get latest). [1]
  • If you specifically need 0.5.2, it likely hasn’t been published to PyPI yet (you’d need to install from source/tag instead). [1]

Sources
[1] PyPI project page / release history for llama-stack-api (shows latest as 0.5.1, no 0.5.2) (pypi.org)

Fix version pinning: llama-stack-api==0.5.2 does not exist on PyPI.

llama-stack-api==0.5.2 is not available on PyPI. The latest published version is 0.5.1 (Feb 19, 2026). Update line 33 to use llama-stack-api==0.5.1 or check if the other llama-stack packages (llama-stack and llama-stack-client) should also be downgraded to 0.5.1 for compatibility.


@asimurka asimurka changed the title from "LCORE-1285: update llama stack to 0.5.0" to "LCORE-1285: update llama stack to 0.5.2" on Mar 9, 2026

@asimurka asimurka left a comment


Please squash the commits into a single one and rebase on the latest changes.

tisnik and others added 10 commits March 9, 2026 09:55
…ing-for-0.4.2-release

LCORE-1215: Preparing for 0.4.2 release
…linter-issue

LCORE-1433: fixed linter issue + enable new linter rule
…linter-issue

LCORE-1434: fixed linter issue
…api; adjust constants and tests accordingly

Updated `test.containerfile` to rhoai-3.4

Update base image in test.containerfile to use upstream Red Hat UBI

fixed type error

addressed comments
- updated from 0.5.0 -> 0.5.2

fixed mypy
@jrobertboos jrobertboos closed this Mar 9, 2026
@jrobertboos jrobertboos deleted the lcore-1285 branch March 9, 2026 14:10