
Iceberg Cloud docs: BYOVPC, IAM auth#1633

Open
kbatuigas wants to merge 14 commits into main from DOC-1920-iceberg-glue-iam-role-support-in-cloud

Conversation

@kbatuigas
Contributor

@kbatuigas kbatuigas commented Mar 25, 2026

Description

  • Add BYOVPC cluster support to Iceberg prerequisites
  • Add IAM role authentication option for AWS Glue (alternative to static keys)
  • Clarify that Redpanda does not manage the catalog service
  • Clarify S3 base location requirements and bucket naming
  • Consolidate duplicated bucket name/table location info into a shared partial
  • Make credential source properties visible on Cloud reference pages

Resolves https://redpandadata.atlassian.net/browse/DOC-1920
Resolves https://redpandadata.atlassian.net/browse/DOC-2064
Review deadline:

Page previews

Cloud:
AWS Glue doc
Object Storage Properties > cloud_storage_credentials_source
Self-managed:
AWS Glue doc

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

@kbatuigas kbatuigas requested a review from a team as a code owner March 25, 2026 23:04
@netlify

netlify bot commented Mar 25, 2026

Deploy Preview for redpanda-docs-preview ready!

Name Link
🔨 Latest commit 6a16d0b
🔍 Latest deploy log https://app.netlify.com/projects/redpanda-docs-preview/deploys/69cda70e9451ad00089d905b
😎 Deploy Preview https://deploy-preview-1633--redpanda-docs-preview.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78d4af12-782c-4126-9132-2b89406759bc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR updates Iceberg and AWS Glue documentation with environment-specific configuration guidance. The main documentation file (iceberg-topics-aws-glue.adoc) adds BYOC-environment setup instructions for IAM access and separates cloud versus non-cloud credential configuration flows using conditional blocks. The credential configuration section is restructured from a fixed list to an either/or approach (reuse existing cloud_storage_* credentials or configure separate REST catalog credentials). Two property reference files (cluster-properties.adoc and object-storage-properties.adoc) receive AsciiDoc tag markers to properly scope content for Redpanda Cloud conditional includes.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Suggested reviewers

  • paulohtb6
  • simon0191
  • wdberkeley
  • Feediver1
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description includes a clear summary of changes, resolves JIRA tickets, provides page previews with actual URLs, but the checks section is incomplete with no category selected. | Complete the Checks section by selecting at least one applicable category (New feature, Content gap, Support Follow-up, or Small fix) to clarify the nature of these documentation updates. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title references BYOC and IAM auth, which are core to the documentation updates about new IAM role authentication for AWS Glue in Redpanda Cloud. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch DOC-1920-iceberg-glue-iam-role-support-in-cloud

Comment @coderabbitai help to get the list of available commands and usage tips.

@kbatuigas kbatuigas requested a review from simon0191 March 25, 2026 23:05
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/reference/partials/properties/object-storage-properties.adoc (1)

1522-1578: ⚠️ Potential issue | 🔴 Critical

Critical: Auto-generated file edited directly.

This file is auto-generated and should not be edited manually. Line 1 explicitly states: "This content is autogenerated. Do not edit manually. To override descriptions, use the doc-tools CLI with the --overrides option."

Any changes made directly to this file will be overwritten the next time the properties are regenerated. Based on learnings, files in /modules/reference/partials/properties/ must never be edited directly.

To properly add the redpanda-cloud tags to the cloud_storage_credentials_source property:

  1. Use the doc-tools CLI with the appropriate configuration/overrides to add these tags
  2. Regenerate the properties file
  3. Alternatively, if tags are needed for conditional includes, verify if they should be added in the source data or through the generation tooling

Based on learnings: "Never directly edit files in /modules/reference/partials/properties/ - they are auto-generated and will be overwritten"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/reference/partials/properties/object-storage-properties.adoc` around
lines 1522 - 1578, The auto-generated property block for
cloud_storage_credentials_source was edited directly (you added the
redpanda-cloud tag) which will be overwritten; revert manual edits and instead
add the redpanda-cloud tag in the source/overrides used by the generator: update
the property definition for cloud_storage_credentials_source in the generator
input (or create an overrides file) and run the doc-tools CLI with the
--overrides option to regenerate the object-storage properties so the
redpanda-cloud conditional tags are applied; if conditional tagging belongs in
the generation tooling, add the tag there and re-run the generation pipeline
rather than editing the generated object-storage-properties.adoc file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc`:
- Around line 192-202: The multiline shell command starting with "rpk cluster
config set" contains an inline comment ("# Glue requires Redpanda Iceberg tables
to be manually deleted") on a line that ends with a backslash, which breaks bash
continuation; remove the inline comment from the continued lines and place
explanatory comments on their own lines before or after the command, and ensure
each continued line ends with a backslash followed only by the argument (e.g.,
adjust the line containing "iceberg_delete=false" to remove the "# ..." comment
and move that text into a separate comment line outside the backslash-continued
command).
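
One way to keep a comment next to each argument without relying on backslash continuation is to build the argument list one line at a time. The following is a minimal sketch, not the PR's actual example: only `iceberg_delete=false` and its comment come from the review finding, the base location value is a placeholder, and the sketch assumes that `rpk cluster config set` accepts several `key=value` pairs in one call, as the command under review does. It prints the assembled command for review instead of running it.

```shell
#!/bin/sh
# Sketch: add one key=value pair per line so each can carry its own comment.
# No line ends in a backslash, so a comment can never break a continuation.

# Glue requires Redpanda Iceberg tables to be manually deleted
set -- iceberg_delete=false
# Base location must be an S3 URI (placeholder bucket name)
set -- "$@" "iceberg_rest_catalog_base_location=s3://<bucket-name>/iceberg"

# Dry run: print the assembled command instead of executing it.
cmd="rpk cluster config set $*"
echo "$cmd"
```

To run the command for real, the last two lines could be replaced with `rpk cluster config set "$@"`.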


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 903fad3a-6b53-4f7c-9b76-ea89517b2157

📥 Commits

Reviewing files that changed from the base of the PR and between 1b84d6e and edb7168.

📒 Files selected for processing (3)
  • modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
  • modules/reference/partials/properties/cluster-properties.adoc
  • modules/reference/partials/properties/object-storage-properties.adoc

For clusters created before March 2026, you must run `rpk byoc apply` to provision the Glue IAM policy before enabling Iceberg. This is a one-time operation that updates the broker role with the necessary Glue permissions.
endif::[]

ifndef::env-cloud[]
Contributor Author

@simon0191 Is this correct -- Cloud users won't have to do anything special for IAM, so the lines that follow this should display for Self-managed only?

- `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket.
- `<glue-access-key>`: The AWS access key ID for your Glue service account.
- `<glue-secret-key-name>`: The name of the secret that stores the AWS secret access key for your Glue service account. To reference a secret in a cluster property, for example `iceberg_rest_catalog_aws_secret_key`, you must first xref:manage:iceberg/use-iceberg-catalogs.adoc#store-a-secret-for-rest-catalog-authentication[store the secret value].
- `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`. For BYOVPC clusters, use the name of the bucket you created as a customer-managed resource.
Contributor Author

@simon0191 Could you confirm that this is OK to add here, and should we update the table in this doc as well?

@kbatuigas
Contributor Author

@simon0191 somewhat related, our docs currently say BYOC is a prereq, is it worth now specifying BYOVPC too?

@david-yu
Contributor

> @simon0191 somewhat related, our docs currently say BYOC is a prereq, is it worth now specifying BYOVPC too?

I don't see why this wouldn't work in BYOVPC. We have some large customers on AWS that could use this feature on BYOVPC.

@mattschumpert

Awesome to see this coming in. It should really be tested in BYOVPC, but it shouldn't really matter since it's just IAM role oriented, eh @david-yu?

@kbatuigas kbatuigas force-pushed the DOC-1920-iceberg-glue-iam-role-support-in-cloud branch from fe3e192 to d719306 Compare April 1, 2026 02:41
@kbatuigas kbatuigas changed the title from "IAM support for Glue in Cloud" to "Iceberg Cloud docs: BYOVPC, IAM auth" Apr 1, 2026
@kbatuigas kbatuigas requested a review from wdberkeley April 1, 2026 20:28
* config_ref:iceberg_rest_catalog_aws_access_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_access_key`]
* config_ref:iceberg_rest_catalog_aws_secret_key,true,properties/cluster-properties[`iceberg_rest_catalog_aws_secret_key`], added as a secret value (see the <<update-cluster-configuration,next section>> for details)
* config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`]
* Allow Redpanda to use the same `cloud_storage_*` credential properties already configured for S3. If you do not configure the overrides listed below, Redpanda uses the same credentials for both S3 and AWS Glue. This is the recommended approach, especially in BYOC deployments where the broker's IAM role already includes the necessary Glue permissions.

@kbatuigas these properties are not part of the user experience in RP Cloud. We should not be talking about them by name.


"Rely on the cluster's existing AWS credentials for accessing Glue" (or something like that) is better

Contributor Author

* config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`]
* Allow Redpanda to use the same `cloud_storage_*` credential properties already configured for S3. If you do not configure the overrides listed below, Redpanda uses the same credentials for both S3 and AWS Glue. This is the recommended approach, especially in BYOC deployments where the broker's IAM role already includes the necessary Glue permissions.
* If you want to configure authentication to AWS Glue separately from authentication to S3, there are equivalent credential configuration properties named `iceberg_rest_catalog_aws_*` that override the object storage credentials. These properties only apply to REST catalog authentication, and never to S3 authentication:
** config_ref:iceberg_rest_catalog_credentials_source,true,properties/cluster-properties[`iceberg_rest_catalog_credentials_source`] overrides config_ref:cloud_storage_credentials_source,true,properties/cluster-properties[`cloud_storage_credentials_source`]. To use the broker's IAM role, set the property to `aws_instance_metadata`. To use static credentials, set to `config_file`.
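
As an illustration of the two values named in the line above, the following sketch maps a chosen authentication approach to a value for `iceberg_rest_catalog_credentials_source`. The `auth_mode` variable and its `iam-role` and `static-keys` labels are hypothetical names made up for this example; only the property name and the values `aws_instance_metadata` and `config_file` come from the text. The command is printed rather than run.

```shell
#!/bin/sh
# Hypothetical switch for this sketch: "iam-role" or "static-keys".
auth_mode="iam-role"

case "$auth_mode" in
  # Use the broker's IAM role
  iam-role)    credentials_source="aws_instance_metadata" ;;
  # Use static credentials (access key and secret key)
  static-keys) credentials_source="config_file" ;;
  *) echo "unknown auth mode: $auth_mode" >&2; exit 1 ;;
esac

# Dry run: print the command instead of executing it.
echo "rpk cluster config set iceberg_rest_catalog_credentials_source $credentials_source"
```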

@kbatuigas this doesn't track for me. This seems to duplicate the above (the approach that relies on the cluster's existing IAM role), so the configuration would already be set to that. It seems only the static credentials approach is the advanced option here right now? @simon0191 @wdberkeley can you weigh in here?

Contributor Author

# Because Tiered Storage does not support the use of distinct buckets for Iceberg,
# always place iceberg_rest_catalog_base_location in the same S3 bucket as cloud_storage_bucket
# Because Tiered Storage does not support the use of distinct buckets for Iceberg,
# always place iceberg_rest_catalog_base_location in the same S3 bucket as cloud_storage_bucket
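
A quick pre-flight check for the same-bucket rule in these comments could look like the sketch below. The bucket name and base location are placeholder values, not settings from this PR; the check simply extracts the bucket from the S3 URI and compares it with the value of `cloud_storage_bucket`.

```shell
#!/bin/sh
# Placeholder values; substitute your own cluster settings.
cloud_storage_bucket="example-bucket"
base_location="s3://example-bucket/iceberg"

# Strip the scheme, then everything from the first slash on, to get the bucket.
without_scheme="${base_location#s3://}"
location_bucket="${without_scheme%%/*}"

if [ "$location_bucket" = "$cloud_storage_bucket" ]; then
  result="same bucket"
else
  result="different bucket"
fi
echo "$result"
```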

@kbatuigas nobody knows what 'cloud_storage_bucket' means in Cloud. We need a pointer to docs with a screenshot of where they find the cluster bucket in the Cloud UI.

Contributor Author

@mattschumpert This is for the self-managed example, but I updated the descriptions in the Cloud version of the doc: https://deploy-preview-1633--redpanda-docs-preview.netlify.app/redpanda-cloud/manage/iceberg/iceberg-topics-aws-glue/#update-cluster-configuration. Can we confirm whether the bucket name only shows up in the UI after Iceberg is enabled? If so, I'm not sure that pointing them to the UI is helpful in this specific step.

- `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
- `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket.
* `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
* `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. This must be the same bucket used for Tiered Storage (your `cloud_storage_bucket`). You cannot specify a different bucket for Iceberg data.

Same comment as above. It's "the bucket provided (in BYOVPC) or created (BYOC) during cluster creation, which can be found HERE in the UI/API"

Contributor Author

@mattschumpert This is for the Self-managed version of the doc. The cloud version looks like this. I'll explain in the placeholder description where the bucket name can be found

Contributor Author

@simon0191 Is the bucket name only exposed in the UI after Iceberg is enabled? Both for BYOC and BYOVPC?

* `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in either your config_ref:cloud_storage_region,true,properties/cluster-properties[`cloud_storage_region`] or config_ref:iceberg_rest_catalog_aws_region,true,properties/cluster-properties[`iceberg_rest_catalog_aws_region`] property.
* `<bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<bucket-name>/iceberg`. This must be the same bucket used for Tiered Storage (your `cloud_storage_bucket`). You cannot specify a different bucket for Iceberg data.
+
`<warehouse-path>` is a name you choose (such as `iceberg`) as the logical name for the warehouse represented by all Redpanda Iceberg topic data in the cluster.
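
The region requirement in the `<glue-region>` item can be sanity-checked before applying the configuration. The sketch below assumes the Glue endpoint has the form `https://glue.<glue-region>.amazonaws.com/iceberg`; both the endpoint and the configured region are example values, and `configured_region` stands in for whichever of `cloud_storage_region` or `iceberg_rest_catalog_aws_region` applies.

```shell
#!/bin/sh
# Example values only; assumes an endpoint shaped like
# https://glue.<glue-region>.amazonaws.com/iceberg
endpoint="https://glue.us-east-1.amazonaws.com/iceberg"
configured_region="us-east-1"

# Drop the "https://glue." prefix, then everything from ".amazonaws.com" on.
host="${endpoint#https://glue.}"
endpoint_region="${host%%.amazonaws.com*}"

if [ "$endpoint_region" = "$configured_region" ]; then
  region_check="match"
else
  region_check="mismatch"
fi
echo "endpoint region: $endpoint_region ($region_check)"
```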

@wdberkeley should we also clarify with a tip: "This path only affects the S3 folder used for table storage and not the catalog name used by Glue. To select the catalog, configure the namespace."

@kbatuigas I assume we added the custom namespace here too in Cloud docs, as in https://deploy-preview-1622--redpanda-docs-preview.netlify.app/current/manage/iceberg/iceberg-topics-aws-glue/#update-cluster-configuration

I think it could be confusing, as this name really isn't that important (it's just an implementation detail Glue forces them to set).

Contributor Author

@simon0191 @wdberkeley is iceberg_default_catalog_namespace available yet in Cloud?

@mattschumpert It seems AWS also uses the "warehouse" terminology: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html. I'm not sure if we should change it to something else like <subdirectory> (I think the bucket-name and warehouse-path description later says to use a subfolder and not the root).

* `<cluster-storage-bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<cluster-storage-bucket-name>/iceberg`.
** Bucket name: For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`. For BYOVPC clusters, use the name of the bucket you created as a customer-managed resource.
+
This must be the same bucket used for Tiered Storage. You cannot specify a different bucket for Iceberg data.
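
To make the bucket naming concrete, the following sketch composes the base location for each deployment type. The cluster ID `abc123`, the bucket `my-byovpc-bucket`, and the `deployment` switch are hypothetical values for illustration; the `redpanda-cloud-storage-<cluster-id>` pattern and the `s3://<bucket>/iceberg` form come from the text above.

```shell
#!/bin/sh
# Hypothetical inputs for this sketch.
deployment="byoc"                    # "byoc" or "byovpc"
cluster_id="abc123"                  # made-up cluster ID
customer_bucket="my-byovpc-bucket"   # made-up customer-managed bucket
warehouse_path="iceberg"

case "$deployment" in
  # BYOC: the bucket name is derived from the cluster ID
  byoc)   bucket="redpanda-cloud-storage-${cluster_id}" ;;
  # BYOVPC: the bucket you created as a customer-managed resource
  byovpc) bucket="$customer_bucket" ;;
  *) echo "unknown deployment type: $deployment" >&2; exit 1 ;;
esac

base_location="s3://${bucket}/${warehouse_path}"
echo "$base_location"
```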

(now that we have Cloud Topics and are moving terminology a bit): 'For the cluster's Cloud Storage'

@Feediver1 re-arranged Tiered Storage docs a bit I think to talk about 'Cloud Storage'

Contributor Author

@mattschumpert Hm, looks like for Cloud Topics we say "enable cloud storage" and then "configure object storage" https://docs.redpanda.com/current/develop/manage-topics/cloud-topics/#prerequisites.

Our Tiered Storage doc still says set cloud_storage_mode=true to enable Tiered Storage

The Glue prereqs say "object storage configured for your cluster" so for consistency, it might just be easier to also say "This must be the same bucket used for your cluster's object storage"


ifdef::env-cloud[]
For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`, where `<cluster-id>` is the ID of your Redpanda cluster.
For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`, where `<cluster-id>` is the ID of your Redpanda cluster. For BYOVPC clusters, the bucket name is the name you chose when you created the Tiered Storage bucket as a customer-managed resource.

YES!


Though maybe we should move to 'Cloud Storage bucket'

Contributor Author

Updated to "object storage bucket" for consistency

| Cloud provider | Bucket or container name | Iceberg table location

| AWS
| `redpanda-cloud-storage-<cluster-id>`

Great!!


|===

For BYOVPC clusters, the bucket name is the name you chose when you created the Tiered Storage bucket as a customer-managed resource.

Cloud storage? (Unless the UI says Tiered Storage, then maybe not?)

Contributor Author

Our UI just displays "Bucket name" but I think I could change this to something more generic like cloud/object storage

|===


// tag::redpanda-cloud[]
Contributor Author

@simon0191 is this correct? Do we now also expose cloud_storage_credentials_source in Cloud? Or must Cloud users set iceberg_rest_catalog_credentials_source=aws_instance_metadata in this scenario?
