Skip to content

LCORE-780: REST API endpoint to return info about selected RAG#810

Merged
tisnik merged 4 commits intolightspeed-core:mainfrom
tisnik:lcore-780-info-about-selected-rag
Nov 19, 2025
Merged

LCORE-780: REST API endpoint to return info about selected RAG#810
tisnik merged 4 commits intolightspeed-core:mainfrom
tisnik:lcore-780-info-about-selected-rag

Conversation

@tisnik
Copy link
Copy Markdown
Contributor

@tisnik tisnik commented Nov 19, 2025

Description

LCORE-780: REST API endpoint to return info about selected RAG

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #LCORE-780

Summary by CodeRabbit

  • New Features

    • Added endpoint to retrieve detailed info for individual RAG instances by ID (id, name, creation/last-active times, usage, expiry, status) with authorization and improved error handling.
    • Introduced a standardized response schema for RAG details and enhanced retrieval logging.
  • Tests

    • Added unit tests covering successful retrieval and multiple error scenarios (not-found, connection errors, configuration issues).

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 19, 2025

Warning

Rate limit exceeded

@tisnik has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 47 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c085b10 and c49dca8.

📒 Files selected for processing (4)
  • docs/openapi.json (2 hunks)
  • docs/openapi.md (2 hunks)
  • docs/output.md (2 hunks)
  • tests/unit/app/endpoints/test_rags.py (2 hunks)

Walkthrough

A new GET endpoint /rags/{rag_id} was added to retrieve a single RAG's details with authorization. A RAGInfoResponse model was introduced. The handler initializes configuration and a Llama Stack client, calls vector_stores.retrieve(rag_id), maps result to RAGInfoResponse, and implements specific error handling and logging.

Changes

Cohort / File(s) Summary
Response Model Definition
src/models/responses.py
Added RAGInfoResponse model with fields: id, name, created_at, last_active_at, usage_bytes, expires_at, object, status.
RAG Retrieval Endpoint
src/app/endpoints/rags.py
Added get_rag_endpoint_handler (GET /rags/{rag_id}) and rag_responses OpenAPI variants. Handler loads config, initializes Llama Stack client, calls vector_stores.retrieve(rag_id), maps result to RAGInfoResponse, logs activity, and handles HTTPException, APIConnectionError, and general exceptions (returning 500 with details).
Tests
tests/unit/app/endpoints/test_rags.py
Added async tests for the new handler: config-not-loaded, 404 propagation, APIConnectionError with detailed response handling, and successful retrieval asserting RAGInfoResponse fields.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Endpoint as GET /rags/{rag_id}
    participant Auth as Auth Dependency
    participant Config as Configuration Loader
    participant Llama as Llama Stack Client
    participant Response as RAGInfoResponse

    Client->>Endpoint: GET /rags/{rag_id} + auth
    Endpoint->>Auth: validate (Depends)
    Auth-->>Endpoint: AuthTuple
    Endpoint->>Config: load()
    Config-->>Endpoint: config
    Endpoint->>Llama: init client (config)
    Endpoint->>Llama: vector_stores.retrieve(rag_id)
    alt success
        Llama-->>Endpoint: rag_info
        Endpoint->>Response: map rag_info -> RAGInfoResponse
        Response-->>Client: 200 OK + body
    else APIConnectionError
        Llama-->>Endpoint: APIConnectionError
        Endpoint-->>Client: 500 Internal Server Error (detailed)
    else other exception
        Llama-->>Endpoint: Exception
        Endpoint-->>Client: 500 Internal Server Error (error + cause)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify RAGInfoResponse field mapping matches actual vector_stores.retrieve response.
  • Confirm APIConnectionError handling and returned payload shape follow project conventions.
  • Check auth dependency injection and rag_responses OpenAPI definitions.
  • Review logging for potential sensitive data exposure.

Possibly related PRs

Suggested reviewers

  • asamal4
  • VladimirKadlec
  • are-ces

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a REST API endpoint to retrieve information about a selected RAG, which aligns with the changeset additions in rags.py, responses.py, and tests.
Docstring Coverage ✅ Passed Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
src/app/endpoints/rags.py (1)

132-135: Consider extracting common configuration loading logic.

The configuration loading and validation pattern (lines 132-135) is duplicated from rags_endpoint_handler (lines 70-74). Similarly, the error handling patterns for Llama Stack connection (lines 156-164) duplicate lines 87-95.

Consider extracting this into a helper function to reduce duplication:

async def get_llama_stack_client() -> AsyncLlamaStackClient:
    """Get initialized Llama Stack client with configuration validation."""
    check_configuration_loaded(configuration)
    llama_stack_configuration = configuration.llama_stack_configuration
    logger.info("Llama stack config: %s", llama_stack_configuration)
    return AsyncLlamaStackClientHolder().get_client()

Then both endpoints can simply call:

try:
    client = await get_llama_stack_client()
    # ... endpoint-specific logic
except APIConnectionError as e:
    # ... common error handling
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7042ac9 and 07c482d.

📒 Files selected for processing (2)
  • src/app/endpoints/rags.py (3 hunks)
  • src/models/responses.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/rags.py (6)
src/models/responses.py (2)
  • RAGListResponse (133-144)
  • RAGInfoResponse (90-130)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/config.py (1)
  • Action (329-377)
src/utils/endpoints.py (1)
  • check_configuration_loaded (112-124)
src/configuration.py (2)
  • configuration (73-77)
  • llama_stack_configuration (87-91)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)

Comment on lines +34 to +41
rag_responses: dict[int | str, dict[str, Any]] = {
200: {},
404: {"response": "RAG with given id not found"},
500: {
"response": "Unable to retrieve list of RAGs",
"cause": "Connection to Llama Stack is broken",
},
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

404 response declared but not implemented.

The rag_responses dict declares a 404 response for when a RAG is not found, but the endpoint implementation (lines 137-174) doesn't have any code path that would raise a 404 error. All exceptions are currently treated as 500 errors.

Verify how the Llama Stack client behaves when retrieving a non-existent RAG:


🏁 Script executed:

#!/bin/bash
# Search for how vector_stores.retrieve handles not-found cases
rg -nP -C5 'vector_stores\.retrieve' --type=py

# Look for exception types from llama_stack_client
rg -nP -C3 'class.*Exception|class.*Error' --type=py -g 'llama_stack*'

# Check if there are existing patterns for handling 404s
rg -nP -C3 'HTTP_404_NOT_FOUND|status\.HTTP_404' --type=py

Length of output: 13258


🏁 Script executed:

#!/bin/bash
# Get full context of get_rag_endpoint_handler function
rg -nP -A40 'async def get_rag_endpoint_handler' --type=py src/app/endpoints/rags.py

# Look for any except clauses in the handler
rg -nP -A5 'except.*:' --type=py src/app/endpoints/rags.py | head -50

# Search for llama_stack imports and exceptions
rg -nP 'from.*llama_stack|import.*llama_stack' --type=py | head -20

# Check for NotFoundError or similar exceptions from llama_stack_client
rg -nP 'NotFound|not.found' --type=py -i | grep -i 'exception\|error\|raise' | head -20

Length of output: 4252



🏁 Script executed:

#!/bin/bash
# Find rags.py file
fd -t f 'rags\.py' --type f

# Once found, get full endpoint handler
fd -t f 'rags\.py' -x cat {} | head -200

Length of output: 6471


404 response declared but not implemented.

The rag_responses dict declares a 404 response for when a RAG is not found, but the endpoint implementation (lines 136-174) catches all exceptions generically and returns 500 errors. There's no separate handling for NotFoundError from the llama_stack_client library.

Compare to conversations.py which properly catches NotFoundError and raises HTTP_404_NOT_FOUND. The get_rag_endpoint_handler should:

  1. Import NotFoundError from llama_stack_client
  2. Add an explicit except NotFoundError handler before the generic Exception handler
  3. Raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, ...) with the appropriate detail

Currently, if vector_stores.retrieve(rag_id) raises NotFoundError for a non-existent RAG, it will be caught by the generic except Exception block and returned as a 500 error, not 404.

🤖 Prompt for AI Agents
In src/app/endpoints/rags.py around lines 34 and 136-174, the rag_responses dict
declares a 404 but the handler catches all exceptions and returns 500; import
NotFoundError from llama_stack_client, add an explicit except NotFoundError
block before the generic Exception handler in get_rag_endpoint_handler, and
inside that block raise HTTPException(status_code=status.HTTP_404_NOT_FOUND,
detail="RAG with given id not found") (or similar message consistent with
rag_responses) so missing RAGs return 404 instead of 500.

Comment on lines +154 to +155
except HTTPException:
raise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify the purpose of catching HTTPException.

The code catches and re-raises HTTPException, but it's unclear what would raise it within the try block. The check_configuration_loaded() call at line 132 is outside the try block, and client.vector_stores.retrieve() likely raises llama_stack_client exceptions.

If this is defensive coding for future changes, consider adding a comment. Otherwise, this block can be removed to simplify the code.

Verify what exceptions vector_stores.retrieve() can raise:


🏁 Script executed:

#!/bin/bash
# Find the vector_stores.retrieve implementation or documentation
ast-grep --pattern $'class VectorStores {
  $$$
  retrieve($$$) {
    $$$
  }
  $$$
}'

# Search for exception handling patterns with vector_stores
rg -nP -C5 'vector_stores\.(retrieve|list)' --type=py

Length of output: 14524



---

🏁 Script executed:

```shell
#!/bin/bash
# Get full context of the get_rag_info() function including all exception handlers
sed -n '130,160p' src/app/endpoints/rags.py

# Also get the full list_rags() function for comparison
sed -n '70,90p' src/app/endpoints/rags.py

Length of output: 1898



---

🏁 Script executed:

```shell
#!/bin/bash
# Find AsyncLlamaStackClientHolder implementation
rg -n "class AsyncLlamaStackClientHolder" --type=py -A 10

# Also search for .get_client() method
rg -n "def get_client" --type=py -B 2 -A 5

Length of output: 1203


Remove the unreachable HTTPException handler at lines 154-155.

The except HTTPException: raise block is dead code. The try block only calls get_client() (raises RuntimeError) and vector_stores.retrieve() (raises APIConnectionError), neither of which produces HTTPException. The HTTPException is only created within the APIConnectionError handler itself, which is outside the try block scope. The list_rags() function confirms this pattern works without the HTTPException handler.

🤖 Prompt for AI Agents
In src/app/endpoints/rags.py around lines 154-155, remove the unreachable
"except HTTPException: raise" handler; the try block cannot raise HTTPException
so delete those two lines and adjust surrounding indentation/flow accordingly so
the RuntimeError and APIConnectionError handlers remain intact and behavior is
unchanged.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_rags.py (1)

176-197: Align test name and description with the generic “unable to retrieve info about RAG” path.

This test is exercising the catch‑all except Exception branch, which returns HTTP 500 with "Unable to retrieve info about RAG" in the response. The current function name still says _unable_to_retrieve_list, and the docstring mentions a connection failure, which can be confusing for the single‑RAG endpoint.

Consider renaming and adjusting the docstring:

-@pytest.mark.asyncio
-async def test_rag_info_endpoint_unable_to_retrieve_list(mocker: MockerFixture) -> None:
-    """Test that /rags/{rag_id} endpoint raises HTTP 500 if Llama Stack connection fails."""
+@pytest.mark.asyncio
+async def test_rag_info_endpoint_unable_to_retrieve_info(mocker: MockerFixture) -> None:
+    """Test that /rags/{rag_id} endpoint raises HTTP 500 on unexpected retrieval errors."""

The assertions below already match the handler’s actual behavior, so only naming/documentation needs tightening.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07c482d and 37d8c0c.

📒 Files selected for processing (1)
  • tests/unit/app/endpoints/test_rags.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/app/endpoints/test_rags.py (2)
src/app/endpoints/rags.py (1)
  • get_rag_endpoint_handler (110-174)
src/client.py (1)
  • get_client (49-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (2)
tests/unit/app/endpoints/test_rags.py (2)

10-13: Import of get_rag_endpoint_handler is correct and consistent.

The new import cleanly wires the tests to the single-RAG handler and keeps them colocated with existing /rags endpoint tests. No issues here.


115-239: Solid coverage of all major /rags/{rag_id} handler branches.

These tests nicely exercise configuration-not-loaded, 404 passthrough, connection error, generic failure, and the successful retrieval case, including field-by-field assertions on the RAGInfoResponse. This should give good regression coverage for get_rag_endpoint_handler.

@tisnik tisnik merged commit 6bc8850 into lightspeed-core:main Nov 19, 2025
21 of 23 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Feb 19, 2026
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant