Merged
154 changes: 154 additions & 0 deletions docs/openapi.json
@@ -348,6 +348,58 @@
}
}
},
"/v1/rags/{rag_id}": {
"get": {
"tags": [
"rags"
],
"summary": "Get Rag Endpoint Handler",
"description": "Retrieve a single RAG by its unique ID.\n\nRaises:\n HTTPException:\n - 404 if RAG with the given ID is not found,\n - 500 if unable to connect to Llama Stack,\n - 500 for any unexpected retrieval errors.\n\nReturns:\n RAGResponse: A single RAG's details",
"operationId": "get_rag_endpoint_handler_v1_rags__rag_id__get",
"parameters": [
{
"name": "rag_id",
"in": "path",
"required": true,
"schema": {
"type": "string",
"title": "Rag Id"
}
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/RAGInfoResponse"
}
}
}
},
"404": {
"response": "RAG with given id not found",
"description": "Not Found"
},
"500": {
"response": "Unable to retrieve list of RAGs",
"cause": "Connection to Llama Stack is broken",
"description": "Internal Server Error"
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/v1/query": {
"post": {
"tags": [
@@ -3918,6 +3970,108 @@
"title": "RAGChunk",
"description": "Model representing a RAG chunk used in the response."
},
"RAGInfoResponse": {
"properties": {
"id": {
"type": "string",
"title": "Id",
"description": "Vector DB unique ID",
"examples": [
"vs_00000000_0000_0000"
]
},
"name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Name",
"description": "Human readable vector DB name",
"examples": [
"Faiss Store with Knowledge base"
]
},
"created_at": {
"type": "integer",
"title": "Created At",
"description": "When the vector store was created, represented as Unix time",
"examples": [
1763391371
]
},
"last_active_at": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"title": "Last Active At",
"description": "When the vector store was last active, represented as Unix time",
"examples": [
1763391371
]
},
"usage_bytes": {
"type": "integer",
"title": "Usage Bytes",
"description": "Storage byte(s) used by this vector DB",
"examples": [
0
]
},
"expires_at": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"title": "Expires At",
"description": "When the vector store expires, represented as Unix time",
"examples": [
1763391371
]
},
"object": {
"type": "string",
"title": "Object",
"description": "Object type",
"examples": [
"vector_store"
]
},
"status": {
"type": "string",
"title": "Status",
"description": "Vector DB status",
"examples": [
"completed"
]
}
},
"type": "object",
"required": [
"id",
"name",
"created_at",
"last_active_at",
"usage_bytes",
"expires_at",
"object",
"status"
],
"title": "RAGInfoResponse",
"description": "Model representing a response with information about RAG DB."
},
"RAGListResponse": {
"properties": {
"rags": {
50 changes: 50 additions & 0 deletions docs/openapi.md
@@ -220,6 +220,38 @@ Returns:
|-------------|-------------|-----------|
| 200 | Successful Response | [RAGListResponse](#raglistresponse) |
| 500 | Connection to Llama Stack is broken | |
## GET `/v1/rags/{rag_id}`

> **Get Rag Endpoint Handler**

Retrieve a single RAG by its unique ID.

Raises:
HTTPException:
- 404 if RAG with the given ID is not found,
- 500 if unable to connect to Llama Stack,
- 500 for any unexpected retrieval errors.

Returns:
    RAGInfoResponse: A single RAG's details



### 🔗 Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| rag_id | string | True | Unique ID of the RAG to retrieve |


### ✅ Responses

| Status Code | Description | Component |
|-------------|-------------|-----------|
| 200 | Successful Response | [RAGInfoResponse](#raginforesponse) |
| 404 | Not Found | |
| 500 | Internal Server Error | |
| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |
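A successful call against this endpoint can be sketched as follows. This is a hedged illustration: the host, port, and token are placeholders, and the response body is built from the example values in the `RAGInfoResponse` schema, not from a real deployment.

```python
import json

# Hypothetical request (e.g. with curl):
#   curl -H "Authorization: Bearer $TOKEN" \
#        http://localhost:8080/v1/rags/vs_00000000_0000_0000
#
# A 200 body shaped like RAGInfoResponse, using the schema's example values:
body = json.loads("""
{
  "id": "vs_00000000_0000_0000",
  "name": "Faiss Store with Knowledge base",
  "created_at": 1763391371,
  "last_active_at": 1763391371,
  "usage_bytes": 0,
  "expires_at": 1763391371,
  "object": "vector_store",
  "status": "completed"
}
""")
print(body["id"], body["status"])  # vs_00000000_0000_0000 completed
```

A 404 or 500 body instead carries a `detail` object with `response` and `cause` fields, as raised by the handler.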
## POST `/v1/query`

> **Query Endpoint Handler**
@@ -1688,6 +1720,24 @@ Model representing a RAG chunk used in the response.
| score | | Relevance score |


## RAGInfoResponse


Model representing a response with information about RAG DB.


| Field | Type | Description |
|-------|------|-------------|
| id | string | Vector DB unique ID |
| name | string \| null | Human readable vector DB name |
| created_at | integer | When the vector store was created, represented as Unix time |
| last_active_at | integer \| null | When the vector store was last active, represented as Unix time |
| usage_bytes | integer | Storage bytes used by this vector DB |
| expires_at | integer \| null | When the vector store expires, represented as Unix time |
| object | string | Object type |
| status | string | Vector DB status |
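The shape of this model can be mirrored with a plain dataclass. This is an illustrative stand-in only: the real `RAGInfoResponse` is a Pydantic model in `models/responses.py`, and in the actual schema every field is required (the nullable ones may simply be `null`).

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for RAGInfoResponse; types mirror the table above.
@dataclass
class RAGInfo:
    id: str
    created_at: int
    usage_bytes: int
    object: str
    status: str
    name: Optional[str] = None
    last_active_at: Optional[int] = None
    expires_at: Optional[int] = None

info = RAGInfo(
    id="vs_00000000_0000_0000",
    created_at=1763391371,
    usage_bytes=0,
    object="vector_store",
    status="completed",
)
print(info.status)  # completed
```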


## RAGListResponse


50 changes: 50 additions & 0 deletions docs/output.md
@@ -220,6 +220,38 @@ Returns:
|-------------|-------------|-----------|
| 200 | Successful Response | [RAGListResponse](#raglistresponse) |
| 500 | Connection to Llama Stack is broken | |
## GET `/v1/rags/{rag_id}`

> **Get Rag Endpoint Handler**

Retrieve a single RAG by its unique ID.

Raises:
HTTPException:
- 404 if RAG with the given ID is not found,
- 500 if unable to connect to Llama Stack,
- 500 for any unexpected retrieval errors.

Returns:
    RAGInfoResponse: A single RAG's details



### 🔗 Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| rag_id | string | True | Unique ID of the RAG to retrieve |


### ✅ Responses

| Status Code | Description | Component |
|-------------|-------------|-----------|
| 200 | Successful Response | [RAGInfoResponse](#raginforesponse) |
| 404 | Not Found | |
| 500 | Internal Server Error | |
| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |
## POST `/v1/query`

> **Query Endpoint Handler**
@@ -1688,6 +1720,24 @@ Model representing a RAG chunk used in the response.
| score | | Relevance score |


## RAGInfoResponse


Model representing a response with information about RAG DB.


| Field | Type | Description |
|-------|------|-------------|
| id | string | Vector DB unique ID |
| name | string \| null | Human readable vector DB name |
| created_at | integer | When the vector store was created, represented as Unix time |
| last_active_at | integer \| null | When the vector store was last active, represented as Unix time |
| usage_bytes | integer | Storage bytes used by this vector DB |
| expires_at | integer \| null | When the vector store expires, represented as Unix time |
| object | string | Object type |
| status | string | Vector DB status |


## RAGListResponse


80 changes: 79 additions & 1 deletion src/app/endpoints/rags.py
@@ -13,7 +13,7 @@
from client import AsyncLlamaStackClientHolder
from configuration import configuration
from models.config import Action
from models.responses import RAGListResponse
from models.responses import RAGListResponse, RAGInfoResponse
from utils.endpoints import check_configuration_loaded

logger = logging.getLogger(__name__)
@@ -31,6 +31,15 @@
500: {"description": "Connection to Llama Stack is broken"},
}

rag_responses: dict[int | str, dict[str, Any]] = {
200: {},
404: {"response": "RAG with given id not found"},
500: {
"response": "Unable to retrieve list of RAGs",
"cause": "Connection to Llama Stack is broken",
},
}
Comment on lines +34 to +41
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

404 response declared but not implemented.

The rag_responses dict declares a 404 response for when a RAG is not found, but the endpoint implementation (lines 137-174) doesn't have any code path that would raise a 404 error. All exceptions are currently treated as 500 errors.

Verify how the Llama Stack client behaves when retrieving a non-existent RAG:


🏁 Script executed:

#!/bin/bash
# Search for how vector_stores.retrieve handles not-found cases
rg -nP -C5 'vector_stores\.retrieve' --type=py

# Look for exception types from llama_stack_client
rg -nP -C3 'class.*Exception|class.*Error' --type=py -g 'llama_stack*'

# Check if there are existing patterns for handling 404s
rg -nP -C3 'HTTP_404_NOT_FOUND|status\.HTTP_404' --type=py

Length of output: 13258


🏁 Script executed:

#!/bin/bash
# Get full context of get_rag_endpoint_handler function
rg -nP -A40 'async def get_rag_endpoint_handler' --type=py src/app/endpoints/rags.py

# Look for any except clauses in the handler
rg -nP -A5 'except.*:' --type=py src/app/endpoints/rags.py | head -50

# Search for llama_stack imports and exceptions
rg -nP 'from.*llama_stack|import.*llama_stack' --type=py | head -20

# Check for NotFoundError or similar exceptions from llama_stack_client
rg -nP 'NotFound|not.found' --type=py -i | grep -i 'exception\|error\|raise' | head -20

Length of output: 4252



🏁 Script executed:

#!/bin/bash
# Find rags.py file
fd -t f 'rags\.py' --type f

# Once found, get full endpoint handler
fd -t f 'rags\.py' -x cat {} | head -200

Length of output: 6471


404 response declared but not implemented.

The rag_responses dict declares a 404 response for when a RAG is not found, but the endpoint implementation (lines 136-174) catches all exceptions generically and returns 500 errors. There's no separate handling for NotFoundError from the llama_stack_client library.

Compare to conversations.py which properly catches NotFoundError and raises HTTP_404_NOT_FOUND. The get_rag_endpoint_handler should:

  1. Import NotFoundError from llama_stack_client
  2. Add an explicit except NotFoundError handler before the generic Exception handler
  3. Raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, ...) with the appropriate detail

Currently, if vector_stores.retrieve(rag_id) raises NotFoundError for a non-existent RAG, it will be caught by the generic except Exception block and returned as a 500 error, not 404.

🤖 Prompt for AI Agents
In src/app/endpoints/rags.py around lines 34 and 136-174, the rag_responses dict
declares a 404 but the handler catches all exceptions and returns 500; import
NotFoundError from llama_stack_client, add an explicit except NotFoundError
block before the generic Exception handler in get_rag_endpoint_handler, and
inside that block raise HTTPException(status_code=status.HTTP_404_NOT_FOUND,
detail="RAG with given id not found") (or similar message consistent with
rag_responses) so missing RAGs return 404 instead of 500.
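The fix the review asks for hinges on except-clause ordering: the specific handler must come before the generic one. The sketch below exercises that ordering; `NotFoundError`, `APIConnectionError`, and `HTTPException` are stand-ins so it runs without FastAPI or llama_stack_client installed — in the real handler they come from those libraries.

```python
import asyncio

# Stand-ins for the real exception types.
class NotFoundError(Exception): ...
class APIConnectionError(Exception): ...

class HTTPException(Exception):
    def __init__(self, status_code: int, detail: dict):
        super().__init__(status_code)
        self.status_code = status_code
        self.detail = detail

async def get_rag(rag_id: str, retrieve) -> dict:
    try:
        return await retrieve(rag_id)
    except NotFoundError as e:
        # Specific handler first: without it, a missing RAG falls through
        # to the generic Exception handler and surfaces as a 500.
        raise HTTPException(404, {"response": "RAG with given id not found",
                                  "cause": str(e)}) from e
    except APIConnectionError as e:
        raise HTTPException(500, {"response": "Unable to connect to Llama Stack",
                                  "cause": str(e)}) from e
    except Exception as e:
        raise HTTPException(500, {"response": "Unable to retrieve info about RAG",
                                  "cause": str(e)}) from e

async def missing(rag_id: str):
    # Simulates vector_stores.retrieve() for a non-existent RAG.
    raise NotFoundError(f"no vector store {rag_id}")

try:
    asyncio.run(get_rag("vs_unknown", missing))
except HTTPException as exc:
    status_seen = exc.status_code
print(status_seen)  # 404
```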



@router.get("/rags", responses=rags_responses)
@authorize(Action.LIST_RAGS)
@@ -94,3 +103,72 @@ async def rags_endpoint_handler(
"cause": str(e),
},
) from e


@router.get("/rags/{rag_id}", responses=rag_responses)
@authorize(Action.GET_RAG)
async def get_rag_endpoint_handler(
request: Request,
rag_id: str,
auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
) -> RAGInfoResponse:
"""Retrieve a single RAG by its unique ID.

Raises:
HTTPException:
- 404 if RAG with the given ID is not found,
- 500 if unable to connect to Llama Stack,
- 500 for any unexpected retrieval errors.

Returns:
RAGInfoResponse: A single RAG's details
"""
# Used only by the middleware
_ = auth

# Nothing interesting in the request
_ = request

check_configuration_loaded(configuration)

llama_stack_configuration = configuration.llama_stack_configuration
logger.info("Llama stack config: %s", llama_stack_configuration)

try:
# try to get Llama Stack client
client = AsyncLlamaStackClientHolder().get_client()
# retrieve info about RAG
rag_info = await client.vector_stores.retrieve(rag_id)
return RAGInfoResponse(
id=rag_info.id,
name=rag_info.name,
created_at=rag_info.created_at,
last_active_at=rag_info.last_active_at,
expires_at=rag_info.expires_at,
object=rag_info.object,
status=rag_info.status,
usage_bytes=rag_info.usage_bytes,
)

# connection to Llama Stack server
except HTTPException:
raise
Comment on lines +154 to +155
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify the purpose of catching HTTPException.

The code catches and re-raises HTTPException, but it's unclear what would raise it within the try block. The check_configuration_loaded() call at line 132 is outside the try block, and client.vector_stores.retrieve() likely raises llama_stack_client exceptions.

If this is defensive coding for future changes, consider adding a comment. Otherwise, this block can be removed to simplify the code.

Verify what exceptions vector_stores.retrieve() can raise:


🏁 Script executed:

#!/bin/bash
# Find the vector_stores.retrieve implementation or documentation
ast-grep --pattern $'class VectorStores {
  $$$
  retrieve($$$) {
    $$$
  }
  $$$
}'

# Search for exception handling patterns with vector_stores
rg -nP -C5 'vector_stores\.(retrieve|list)' --type=py

Length of output: 14524



---

🏁 Script executed:

```shell
#!/bin/bash
# Get full context of the get_rag_info() function including all exception handlers
sed -n '130,160p' src/app/endpoints/rags.py

# Also get the full list_rags() function for comparison
sed -n '70,90p' src/app/endpoints/rags.py
```

Length of output: 1898



---

🏁 Script executed:

```shell
#!/bin/bash
# Find AsyncLlamaStackClientHolder implementation
rg -n "class AsyncLlamaStackClientHolder" --type=py -A 10

# Also search for .get_client() method
rg -n "def get_client" --type=py -B 2 -A 5

Length of output: 1203


Remove the unreachable HTTPException handler at lines 154-155.

The except HTTPException: raise block is dead code. The try block only calls get_client() (raises RuntimeError) and vector_stores.retrieve() (raises APIConnectionError), neither of which produces HTTPException. The HTTPException is only created within the APIConnectionError handler itself, which is outside the try block scope. The list_rags() function confirms this pattern works without the HTTPException handler.

🤖 Prompt for AI Agents
In src/app/endpoints/rags.py around lines 154-155, remove the unreachable
"except HTTPException: raise" handler; the try block cannot raise HTTPException
so delete those two lines and adjust surrounding indentation/flow accordingly so
the RuntimeError and APIConnectionError handlers remain intact and behavior is
unchanged.

except APIConnectionError as e:
logger.error("Unable to connect to Llama Stack: %s", e)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail={
"response": "Unable to connect to Llama Stack",
"cause": str(e),
},
) from e
    # any other exception that can occur during RAG info retrieval
except Exception as e:
logger.error("Unable to retrieve info about RAG: %s", e)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail={
"response": "Unable to retrieve info about RAG",
"cause": str(e),
},
) from e