Commit 243543d

Samreay authored and sungwy committed
Updating configuration docs (apache#1292)
* Updating configuration docs
* Fixing linting
1 parent bfff10d commit 243543d

1 file changed

+25
-21
lines changed

mkdocs/docs/configuration.md

Lines changed: 25 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -24,6 +24,30 @@ hide:
2424

2525
# Configuration
2626

27+
## Setting Configuration Values
28+
29+
There are three ways to pass in configuration:
30+
31+
- Using the `~/.pyiceberg.yaml` configuration file
32+
- Through environment variables
33+
- By passing in credentials through the CLI or the Python API
34+
35+
The configuration file is recommended since that's the easiest way to manage the credentials.
36+
37+
To change the path searched for the `.pyiceberg.yaml`, you can overwrite the `PYICEBERG_HOME` environment variable.
38+
39+
Another option is through environment variables:
40+
41+
```sh
42+
export PYICEBERG_CATALOG__DEFAULT__URI=thrift://localhost:9083
43+
export PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID=username
44+
export PYICEBERG_CATALOG__DEFAULT__S3__SECRET_ACCESS_KEY=password
45+
```
46+
47+
The environment variable picked up by Iceberg starts with `PYICEBERG_` and then follows the yaml structure below, where a double underscore `__` represents a nested field, and the underscore `_` is converted into a dash `-`.
48+
49+
For example, `PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID`, sets `s3.access-key-id` on the `default` catalog.
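
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as the docs describe it, not PyIceberg's actual parsing code; the helper name `env_var_to_config_path` is hypothetical, and lowercasing the segments is an assumption based on the example.

```python
from typing import List


def env_var_to_config_path(name: str, prefix: str = "PYICEBERG_") -> List[str]:
    """Hypothetical helper: map an environment variable name to a nested YAML path.

    Follows the rule described in the docs: strip the prefix, split nested
    fields on a double underscore, and turn single underscores into dashes.
    """
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    segments = name[len(prefix):].split("__")
    # Lowercasing is an assumption inferred from the documented example.
    return [segment.lower().replace("_", "-") for segment in segments]


path = env_var_to_config_path("PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID")
# The first two segments select the catalog; the rest form the property key.
property_key = ".".join(path[2:])
print(path[0], path[1], property_key)
```

Under these assumptions, the first two segments identify the `default` entry under `catalog`, and the remaining segments join into the `s3.access-key-id` property named in the example.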
50+
2751
## Tables
2852

2953
Iceberg tables support table properties to configure table behavior.
@@ -36,7 +60,7 @@ Iceberg tables support table properties to configure table behavior.
3660
| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
3761
| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
3862
| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
39-
| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the approximate encoded size of data pages within a column chunk |
63+
| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk |
4064
| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
4165
| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
4266

@@ -161,26 +185,6 @@ Alternatively, you can also directly set the catalog implementation:
161185
| type | rest | Type of catalog, one of `rest`, `sql`, `hive`, `glue`, `dynamodb`. Defaults to `rest` |
162186
| py-catalog-impl | mypackage.mymodule.MyCatalog | Sets the catalog explicitly to an implementation, and will fail explicitly if it can't be loaded |
163187

164-
There are three ways to pass in configuration:
165-
166-
- Using the `~/.pyiceberg.yaml` configuration file
167-
- Through environment variables
168-
- By passing in credentials through the CLI or the Python API
169-
170-
The configuration file is recommended since that's the easiest way to manage the credentials.
171-
172-
Another option is through environment variables:
173-
174-
```sh
175-
export PYICEBERG_CATALOG__DEFAULT__URI=thrift://localhost:9083
176-
export PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID=username
177-
export PYICEBERG_CATALOG__DEFAULT__S3__SECRET_ACCESS_KEY=password
178-
```
179-
180-
The environment variable picked up by Iceberg starts with `PYICEBERG_` and then follows the yaml structure below, where a double underscore `__` represents a nested field, and the underscore `_` is converted into a dash `-`.
181-
182-
For example, `PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID`, sets `s3.access-key-id` on the `default` catalog.
183-
184188
### REST Catalog
185189

186190
```yaml
