Conversation
✅ Deploy Preview for redpanda-docs-preview ready!
📝 Walkthrough
This PR updates Iceberg and AWS Glue documentation with environment-specific configuration guidance. The main documentation file (
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 inconclusive)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/reference/partials/properties/object-storage-properties.adoc (1)
1522-1578: ⚠️ Potential issue | 🔴 Critical
Critical: Auto-generated file edited directly.
This file is auto-generated and should not be edited manually. Line 1 explicitly states: "This content is autogenerated. Do not edit manually. To override descriptions, use the doc-tools CLI with the --overrides option."
Any changes made directly to this file will be overwritten the next time the properties are regenerated. Based on learnings, files in `/modules/reference/partials/properties/` must never be edited directly.
To properly add the `redpanda-cloud` tags to the `cloud_storage_credentials_source` property:
- Use the doc-tools CLI with the appropriate configuration/overrides to add these tags
- Regenerate the properties file
- Alternatively, if tags are needed for conditional includes, verify if they should be added in the source data or through the generation tooling
Based on learnings: "Never directly edit files in `/modules/reference/partials/properties/` - they are auto-generated and will be overwritten"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/reference/partials/properties/object-storage-properties.adoc` around lines 1522 - 1578, The auto-generated property block for cloud_storage_credentials_source was edited directly (you added the redpanda-cloud tag) which will be overwritten; revert manual edits and instead add the redpanda-cloud tag in the source/overrides used by the generator: update the property definition for cloud_storage_credentials_source in the generator input (or create an overrides file) and run the doc-tools CLI with the --overrides option to regenerate the object-storage properties so the redpanda-cloud conditional tags are applied; if conditional tagging belongs in the generation tooling, add the tag there and re-run the generation pipeline rather than editing the generated object-storage-properties.adoc file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc`:
- Around line 192-202: The multiline shell command starting with "rpk cluster
config set" contains an inline comment ("# Glue requires Redpanda Iceberg tables
to be manually deleted") on a line that ends with a backslash, which breaks bash
continuation; remove the inline comment from the continued lines and place
explanatory comments on their own lines before or after the command, and ensure
each continued line ends with a backslash followed only by the argument (e.g.,
adjust the line containing "iceberg_delete=false" to remove the "# ..." comment
and move that text into a separate comment line outside the backslash-continued
command).
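A minimal sketch of the fix described in the inline comment above, assuming a Bash-compatible shell. The `rpk` function here is a hypothetical stand-in that only echoes its arguments so the snippet runs without a cluster, and `example-bucket` is a placeholder rather than a value from this PR.

```shell
# Hypothetical stand-in for the real rpk binary: echoes its arguments.
# Remove this function to run the command against an actual cluster.
rpk() { printf '%s\n' "$*"; }

configure_glue() {
  # Glue requires Redpanda Iceberg tables to be manually deleted.
  # The comment sits on its own line, outside the continued command,
  # so every continued line ends with a backslash and nothing after it.
  rpk cluster config set \
    iceberg_delete=false \
    iceberg_rest_catalog_base_location=s3://example-bucket/iceberg
}

configure_glue
```

With the comment moved out of the continued lines, the shell passes every argument to a single `rpk` invocation instead of ending the command at the commented line.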
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 903fad3a-6b53-4f7c-9b76-ea89517b2157
📒 Files selected for processing (3)
modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
modules/reference/partials/properties/cluster-properties.adoc
modules/reference/partials/properties/object-storage-properties.adoc
| For clusters created before March 2026, you must run `rpk byoc apply` to provision the Glue IAM policy before enabling Iceberg. This is a one-time operation that updates the broker role with the necessary Glue permissions. | ||
| endif::[] | ||
|
|
||
| ifndef::env-cloud[] |
@simon0191 Is this correct -- Cloud users won't have to do anything special for IAM, so the lines that follow this should display for Self-managed only?
| - `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket. | ||
| - `<glue-access-key>`: The AWS access key ID for your Glue service account. | ||
| - `<glue-secret-key-name>`: The name of the secret that stores the AWS secret access key for your Glue service account. To reference a secret in a cluster property, for example `iceberg_rest_catalog_aws_secret_key`, you must first xref:manage:iceberg/use-iceberg-catalogs.adoc#store-a-secret-for-rest-catalog-authentication[store the secret value]. | ||
| - `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`. For BYOVPC clusters, use the name of the bucket you created as a customer-managed resource. |
@simon0191 Could you confirm that this is OK to add here, and should we update the table in this doc as well?
|
@simon0191 somewhat related, our docs currently say BYOC is a prereq, is it worth now specifying BYOVPC too? |
I don't see why this won't work in BYOVPC, we have some large customers on AWS that could use this feature on BYOVPC. |
|
Awesome to see this coming in. And it should really be tested in BYOVPC but shouldn't really matter since it's just IAM role oriented eh @david-yu ? |
fe3e192 to d719306
| * config_ref:iceberg_rest_catalog_aws_access_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_access_key`] | ||
| * config_ref:iceberg_rest_catalog_aws_secret_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_secret_key`], added as a secret value (see the <<update-cluster-configuration,next section>> for details) | ||
| * config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] | ||
| * Allow Redpanda to use the same `cloud_storage_*` credential properties already configured for S3. If you do not configure the overrides listed below, Redpanda uses the same credentials for both S3 and AWS Glue. This is the recommended approach, especially in BYOC deployments where the broker's IAM role already includes the necessary Glue permissions. |
@kbatuigas these properties are not part of the user experience in RP Cloud. We should not be talking about them by name
"Rely on the cluster's existing AWS credentials for accessing Glue" (or something like that) is better
| * config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] | ||
| * Allow Redpanda to use the same `cloud_storage_*` credential properties already configured for S3. If you do not configure the overrides listed below, Redpanda uses the same credentials for both S3 and AWS Glue. This is the recommended approach, especially in BYOC deployments where the broker's IAM role already includes the necessary Glue permissions. | ||
| * If you want to configure authentication to AWS Glue separately from authentication to S3, there are equivalent credential configuration properties named `iceberg_rest_catalog_aws_*` that override the object storage credentials. These properties only apply to REST catalog authentication, and never to S3 authentication: | ||
| ** config_ref:iceberg_rest_catalog_credentials_source,true,properties/cluster-properties[`iceberg_rest_catalog_credentials_source`] overrides config_ref:cloud_storage_credentials_source,true,properties/cluster-properties[`cloud_storage_credentials_source`]. To use the broker's IAM role, set the property to `aws_instance_metadata`. To use static credentials, set to `config_file`. |
@kbatuigas this doesn't track for me. This seems to duplicate the above (the approach that relies on the cluster's existing IAM role), so the config would already be set to that. It seems only the static credentials approach is the advanced option here right now? @simon0191 @wdberkeley can you weigh in here?
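For reference while this thread is open, here is a small sketch of the fallback the diff text describes: the `iceberg_rest_catalog_credentials_source` override wins when set, otherwise the `cloud_storage_credentials_source` value applies. The function name is hypothetical and this only illustrates the documented behavior; it is not how Redpanda resolves the setting internally.

```shell
# Hypothetical helper illustrating the fallback described in the diff.
# $1: value of cloud_storage_credentials_source
# $2: value of iceberg_rest_catalog_credentials_source (empty if unset)
effective_glue_credentials_source() {
  # Use the REST catalog override when it is non-empty,
  # otherwise fall back to the object storage setting.
  printf '%s\n' "${2:-$1}"
}

# No override configured: Glue uses the same source as S3.
effective_glue_credentials_source aws_instance_metadata ""

# Override configured: static credentials apply to Glue only.
effective_glue_credentials_source aws_instance_metadata config_file
```

Under this reading, setting the override to `aws_instance_metadata` when the object storage setting already has that value changes nothing, which is the duplication raised above.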
| # Because Tiered Storage does not support the use of distinct buckets for Iceberg, | ||
| # always place iceberg_rest_catalog_base_location in the same S3 bucket as cloud_storage_bucket | ||
| # Because Tiered Storage does not support the use of distinct buckets for Iceberg, | ||
| # always place iceberg_rest_catalog_base_location in the same S3 bucket as cloud_storage_bucket |
@kbatuigas nobody knows what 'cloud_storage_bucket' means in Cloud. We need a pointer to docs with a screenshot of where they find the cluster bucket in the Cloud UI
@mattschumpert This is for the self-managed example, but I updated the descriptions in the Cloud version of the doc: https://deploy-preview-1633--redpanda-docs-preview.netlify.app/redpanda-cloud/manage/iceberg/iceberg-topics-aws-glue/#update-cluster-configuration If it turns out that the bucket name only shows up in the UI after Iceberg is enabled, I'm not sure that pointing them to the UI is helpful in this specific step.
| - `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property. | ||
| - `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket. | ||
| * `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property. | ||
| * `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. This must be the same bucket used for Tiered Storage (your `cloud_storage_bucket`). You cannot specify a different bucket for Iceberg data. |
Same comment as above. It's "the bucket provided (in BYOVPC) or created (BYOC) during cluster creation, which can be found HERE in the UI/API"
@mattschumpert This is for the Self-managed version of the doc. The cloud version looks like this. I'll explain in the placeholder description where the bucket name can be found
@simon0191 Is the bucket name only exposed in the UI after Iceberg is enabled? Both for BYOC and BYOVPC?
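As a side note on the `<glue-region>` requirement in the diff above, the region match could be checked with a sketch like the following. It assumes the Glue endpoint has the shape `https://glue.<region>.amazonaws.com/iceberg`; the function names and the example region are hypothetical.

```shell
# Hypothetical helper: extract the region from a Glue endpoint URL,
# assuming the shape https://glue.<region>.amazonaws.com/iceberg.
endpoint_region() {
  # Strip the "https://glue." prefix, then everything from ".amazonaws.com" on.
  region="${1#https://glue.}"
  printf '%s\n' "${region%%.amazonaws.com*}"
}

# Hypothetical helper: succeed only if the endpoint region equals the
# configured region (cloud_storage_region or iceberg_rest_catalog_aws_region).
regions_match() {
  [ "$(endpoint_region "$1")" = "$2" ]
}

if regions_match "https://glue.us-east-1.amazonaws.com/iceberg" "us-east-1"; then
  echo "regions match"
else
  echo "region mismatch"
fi
```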
| * `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property. | ||
| * `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. This must be the same bucket used for Tiered Storage (your `cloud_storage_bucket`). You cannot specify a different bucket for Iceberg data. | ||
| + | ||
| `<warehouse-path>` is a name you choose (such as `iceberg`) as the logical name for the warehouse represented by all Redpanda Iceberg topic data in the cluster. |
@wdberkeley should we also clarify with a tip, 'This path only affects the S3 folder used for table storage and not the catalog name used by Glue'? To select the catalog, you should configure the namespace.
@kbatuigas I assume we added the custom namespace here too in Cloud docs, à la https://deploy-preview-1622--redpanda-docs-preview.netlify.app/current/manage/iceberg/iceberg-topics-aws-glue/#update-cluster-configuration
I think it could be confusing, as this name really isn't that important (just an implementation detail Glue forces them to set).
@simon0191 @wdberkeley is iceberg_default_catalog_namespace available yet in Cloud?
@mattschumpert It seems AWS also uses the "warehouse" terminology (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html), so I'm not sure if we should change it to something else like <subdirectory>. (I think the bucket-name and warehouse-path description later says to use a subfolder and not the root.)
| * `<cluster-storage-bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<cluster-storage-bucket-name>/iceberg`. | ||
| ** Bucket name: For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`. For BYOVPC clusters, use the name of the bucket you created as a customer-managed resource. | ||
| + | ||
| This must be the same bucket used for Tiered Storage. You cannot specify a different bucket for Iceberg data. |
(now that we have Cloud Topics and are moving terminology a bit): 'For the cluster's Cloud Storage'
@Feediver1 re-arranged Tiered Storage docs a bit I think to talk about 'Cloud Storage'
@mattschumpert Hm, looks like for Cloud Topics we say "enable cloud storage" and then "configure object storage" https://docs.redpanda.com/current/develop/manage-topics/cloud-topics/#prerequisites.
Our Tiered Storage doc still says set cloud_storage_mode=true to enable Tiered Storage
The Glue prereqs say "object storage configured for your cluster" so for consistency, it might just be easier to also say "This must be the same bucket used for your cluster's object storage"
|
|
||
| ifdef::env-cloud[] | ||
| For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`, where `<cluster-id>` is the ID of your Redpanda cluster. | ||
| For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`, where `<cluster-id>` is the ID of your Redpanda cluster. For BYOVPC clusters, the bucket name is the name you chose when you created the Tiered Storage bucket as a customer-managed resource. |
Though maybe we should move to 'Cloud Storage bucket'
Updated to "object storage bucket" for consistency
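To make the BYOC naming rule in the diff above concrete, a rough sketch of how the base location could be assembled from a cluster ID follows. The function name and the example cluster ID are hypothetical; the `redpanda-cloud-storage-<cluster-id>` pattern and the `iceberg` path come from the diff text. For BYOVPC clusters the bucket name is customer-chosen, so this helper would not apply.

```shell
# Hypothetical helper: build the Iceberg base location for a BYOC cluster.
# $1: cluster ID
# $2: warehouse path (defaults to "iceberg", the example used in the doc)
byoc_base_location() {
  printf 's3://redpanda-cloud-storage-%s/%s\n' "$1" "${2:-iceberg}"
}

# "example-cluster-id" is a placeholder, not a real cluster ID.
byoc_base_location example-cluster-id
```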
| | Cloud provider | Bucket or container name | Iceberg table location | ||
|
|
||
| | AWS | ||
| | `redpanda-cloud-storage-<cluster-id>` |
|
|
||
| |=== | ||
|
|
||
| For BYOVPC clusters, the bucket name is the name you chose when you created the Tiered Storage bucket as a customer-managed resource. |
Cloud storage? (Unless the UI says Tiered Storage, then maybe not?)
Our UI just displays "Bucket name" but I think I could change this to something more generic like cloud/object storage
| |=== | ||
|
|
||
|
|
||
| // tag::redpanda-cloud[] |
@simon0191 is this correct? We now also expose cloud_storage_credentials_source in Cloud? Or must Cloud users set iceberg_rest_catalog_credentials_source=aws_instance_metadata in this scenario?
Description
Resolves https://redpandadata.atlassian.net/browse/DOC-1920
Resolves https://redpandadata.atlassian.net/browse/DOC-2064
Review deadline:
Page previews
Cloud:
AWS Glue doc
Object Storage Properties > cloud_storage_credentials_source
Self-managed:
AWS Glue doc
Checks