Feature Request / Improvement
PyIceberg 0.7.0
The main objective of 0.7.0 is to support partitioned writes (non-exhaustive list):
- Support for merge-into / upsert: Merge into / Upsert #402
- Support partitioned appends: Support Appends with TimeTransform Partitions #784
- Support partial deletes: Support partial deletes #569
- Support for parallelizing writes: Parallel Table.append #428, Faster ingestion from Parquet #346
- Support parallelized writes: Bin-pack Writes Operation into multiple parquet files, and parallelize writing WriteTasks #444
- Support table_exists on catalog: check if table exist #406, Add table_exists method to the Catalog #507, fixed in Add table_exists method to Catalog #512
- Metadata tables: Add metadata tables #511
  - Files assigned to @Gowthami03B, PR in Add Files metadata table #614
  - Snapshots assigned to @Fokko in Add Snapshots table metadata #524
  - History assigned to @ndrluis in Add history inspect table #828
  - Metadata log entries assigned to @kevinjqliu (issue in [feat request] Add metadata_log_entries metadata table #594): Metadata Log Entries metadata table #667
  - Manifests assigned to @geruh: PR in Add manifests metadata table #717
  - Partitions assigned to @syun64 (issue in Support get partition table with filter #24): Add Partitions Metadata Table #603
  - References assigned to @geruh in Add Refs metadata table #602
  - Entries assigned to @Fokko in Add entries metadata table #551
- Manifest read/write improvements:
  - Implement rolling writes: Implement rolling manifest-writers #596, feat: add RollingManifestWriter #650
  - Caching of manifests: Implement caching of manifest-files #595
- Incremental append scan: Incremental Append Scan #533
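The bin-pack-and-parallelize item (#444) can be illustrated with a short, hypothetical sketch in plain Python. It assumes the write has already been split into record batches with known byte sizes. bin_pack, write_task and parallel_write are illustrative names, not PyIceberg API, and write_task only formats a file name instead of actually writing Parquet. The packing is a simple order-preserving greedy pass (next-fit), which may differ from what #444 implements.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bin_pack(items: Sequence[T], weight: Callable[[T], int], target_weight: int) -> List[List[T]]:
    """Greedily group items, in input order, into bins whose total weight stays
    at or below target_weight (a single oversized item gets a bin of its own)."""
    bins: List[List[T]] = []
    current: List[T] = []
    current_weight = 0
    for item in items:
        w = weight(item)
        # Close the current bin if adding this item would exceed the target.
        if current and current_weight + w > target_weight:
            bins.append(current)
            current, current_weight = [], 0
        current.append(item)
        current_weight += w
    if current:
        bins.append(current)
    return bins


def write_task(task_id: int, batch_sizes: List[int]) -> str:
    # Hypothetical stand-in for writing one Parquet data file per bin.
    return f"data-{task_id:05d}.parquet ({sum(batch_sizes)} bytes)"


def parallel_write(batch_sizes: Sequence[int], target_file_size: int, max_workers: int = 4) -> List[str]:
    tasks = bin_pack(batch_sizes, weight=lambda size: size, target_weight=target_file_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in task order even though tasks run concurrently.
        return list(pool.map(write_task, range(len(tasks)), tasks))
```

For batch sizes [40, 30, 50, 20, 90, 10] and a target of 100, bin_pack returns [[40, 30], [50, 20], [90, 10]], so three files are written in parallel. Keeping input order makes each file hold contiguous batches; a first-fit-decreasing packer would fill files more evenly at the cost of reordering rows.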
PyIceberg 0.8.0
- Table maintenance:
  - Snapshot expiration
  - Metadata rewrites
  - Compaction
  - Delete orphan files
- Catalogs:
  - Snowflake catalog: Add Snowflake catalog #687
  - Nessie catalog: Support Nessie catalog #19
  - BigLake catalog: Add support for BigLake Metastore #651
- ORC Support: ORC file format support #20
- Branch Support: Support writing to a branch #306
- Tag Support: Support creating tags #573. PR: Add Partitions Metadata Table #603
- Write with Sort Order: Support writing to a table with sort-order #271
- Support deletes with Merge-on-read: [Feat] Support Merge-on-Read mode for Deletes #1078
- Support writes to Bucket Partitioned Tables: Support writes to Bucket Partitioned Tables #1074
PyIceberg 1.0.0
Long-term goals:
- Use Griffe to detect breaking API changes: Use griffe to find breaking changes #334, detect breaking changes #394
- Implement Arrow dataset: Expose PyIceberg table as PyArrow Dataset #30
- Support table maintenance operations: Support to optimize, analyze tables and expire snapshots, remove orphan files #31
- Add View support
- Add Puffin support
- Support engine integrations
  - DuckDB
  - Daft (Iceberg Write support Eventual-Inc/Daft#1877)
  - Polars (feat(python): Add DataFrame.write_iceberg pola-rs/polars#15018)
  - Ray
- Support Commit Retries: Support intelligent commit retries #269
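Iceberg commits are optimistic: a writer prepares new metadata against the table state it last read, and the commit fails if another writer got there first. The commit-retries item (#269) can be illustrated with a minimal, hypothetical sketch. It assumes an attempt_commit callable that refreshes the table metadata and reapplies the pending change on every call, and a hypothetical CommitConflict exception; neither is PyIceberg API. Retries use capped exponential backoff with full jitter.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CommitConflict(Exception):
    """Hypothetical error: another writer committed first, so our base metadata is stale."""


def commit_with_retries(
    attempt_commit: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt_commit until it succeeds or max_attempts is reached.

    attempt_commit is assumed (hypothetically) to refresh the table metadata and
    reapply the pending change on every call, so each retry targets the latest state.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_commit()
        except CommitConflict:
            if attempt == max_attempts:
                raise  # give up and surface the last conflict
            # Capped exponential backoff with full jitter: sleep a random
            # duration in [0, min(max_delay, base_delay * 2 ** (attempt - 1))].
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable: the loop always returns or raises")
```

The sleep function is injectable so the retry policy can be tested without waiting. Jitter spreads out concurrent writers that conflicted at the same moment, so they are less likely to collide again on the next attempt.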