September 2023 – Iceberg Community News

Tags: developer, Go, PyIceberg, Python, Rust, Spark 3.5

Brian Olsen

October 4, 2023

❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!

Project updates

Iceberg Java

Release 1.4.0 is just around the corner! 🙌🏿
Added Spark 3.5 support
- Row-level implementations of merge, update, and delete have moved to Spark, and all related extensions have been dropped from Iceberg.
- You can now pass an advisory partition size, which lets you set the target size per table and have Adaptive Query Execution coalesce partitions automatically. This makes the output tunable based on runtime metrics, minimizing the number of small files that require compaction.
- This release removes support for Spark 3.1 and deprecates support for 3.2. Together with the move of the row-level implementations into Spark, this makes Spark release upgrades in Iceberg happen faster.
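As a rough illustration of the advisory partition size bullet above, the sketch below builds an `ALTER TABLE` statement that would set a per-table target. The property name `write.spark.advisory-partition-size-bytes`, the helper `advisory_size_statement`, and the table name are assumptions rather than details from this post, so check the Iceberg configuration docs for your release before relying on them.

```python
# Assumed property name; verify it against the Iceberg docs for your version.
ADVISORY_SIZE_PROPERTY = "write.spark.advisory-partition-size-bytes"


def advisory_size_statement(table: str, size_mb: int) -> str:
    """Build an ALTER TABLE statement that sets the advisory partition size."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    size_bytes = size_mb * 1024 * 1024
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('{ADVISORY_SIZE_PROPERTY}' = '{size_bytes}')"
    )


statement = advisory_size_statement("my_catalog.db.table", 128)
print(statement)
# With an existing SparkSession named `spark`, this could be run as:
# spark.sql(statement)
```

Building the statement separately from executing it keeps the example runnable without a Spark cluster.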
Added Spark support for distributed planning
- Iceberg planning utilizes manifest partition info to quickly plan queries with partition filters, especially when metadata is properly clustered.
- Distributed planning can improve performance for specific use cases because the cluster offers more parallelism than the driver's cores.
- Tests on a table with 20 million files showed significant improvements in planning times for various queries, but the cost of delivering results can be a limiting factor.
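To make the point about clustered metadata more concrete, here is a toy model of partition-based manifest pruning. It is not Iceberg's planner: `ManifestSummary` and `manifests_to_scan` are hypothetical names, and each manifest is reduced to a single integer partition range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestSummary:
    """Hypothetical stand-in for the partition bounds kept per manifest."""
    path: str
    lower: int  # smallest partition value in the manifest
    upper: int  # largest partition value in the manifest


def manifests_to_scan(manifests, value):
    """Keep only manifests whose partition range could contain `value`."""
    return [m for m in manifests if m.lower <= value <= m.upper]


# Clustered metadata: each manifest covers a narrow, disjoint range.
clustered = [
    ManifestSummary("m1.avro", 0, 9),
    ManifestSummary("m2.avro", 10, 19),
    ManifestSummary("m3.avro", 20, 29),
]
# Unclustered metadata: every manifest spans almost the whole range.
unclustered = [
    ManifestSummary("m1.avro", 0, 28),
    ManifestSummary("m2.avro", 1, 29),
    ManifestSummary("m3.avro", 0, 29),
]

print(len(manifests_to_scan(clustered, 15)))    # 1
print(len(manifests_to_scan(unclustered, 15)))  # 3
```

With clustered metadata the filter rules out two of the three manifests before any of them is opened, while unclustered metadata forces all three to be read. That remaining per-manifest work is what distributed planning spreads across the cluster.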
Push down Iceberg functions to Spark V2 filters
Iceberg can now push down system functions to reduce the amount of data read from files. For example, if we only want to retrieve data from a single bucket we can use:

    spark.sql(
        """
        SELECT * FROM my_catalog.db.table
        WHERE my_catalog.system.bucket(10, id) = 2;
        """
    )

This will also work with other Iceberg partition functions. In addition, we can take advantage of this when calling rewriteDataFiles (rewrite_data_files) and rewritePositionDeleteFiles (rewrite_position_delete_files), like so:

    spark.sql(
        """
        CALL my_catalog.system.rewrite_data_files(
            table => 'foo.bar',
            where => 'my_catalog.system.bucket(4, url) = 0')
        """
    )
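For readers wondering what a predicate such as `bucket(10, id) = 2` actually selects, the following sketch reimplements the bucket transform for integer values as the Iceberg table spec describes it: a 32-bit Murmur3 hash of the value serialized as an 8-byte little-endian long, masked to a non-negative number, then taken modulo the bucket count. The function names are illustrative and this is not Iceberg's own code. Other types (strings, decimals, and so on) are hashed from different byte representations and are not covered here.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    n = len(data)
    full = n - n % 4
    # Mix each complete 4-byte little-endian block into the hash.
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Mix the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket(num_buckets: int, value: int) -> int:
    """Bucket number for an int/long value, following the spec's formula."""
    hashed = murmur3_32(struct.pack("<q", value))
    return (hashed & 0x7FFFFFFF) % num_buckets


print(bucket(10, 34))  # 9
```

The hash for 34 comes out as 2017239379, which matches the test vector published in the spec's hash appendix, so rows with `id = 34` land in bucket 9 of 10. A filter on `bucket(10, id) = 2` lets Spark skip every file whose bucket partition value is not 2.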
Added AES GCM Stream encryption and decryption
- Support has been added for AES GCM Stream, which provides data encryption and integrity verification. This is a great step forward in the ongoing effort towards full metadata encryption.
Added strict metadata cleanup
- Strict metadata cleanup provides additional protection against table corruption by only triggering metadata cleanup operations when commits fail due to an exception that implements the CleanableFailure interface.
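The idea behind strict cleanup can be sketched in a few lines. This is a Python analogue of the pattern, not Iceberg's Java implementation: `CleanableFailure` mirrors the interface named above, while `CommitRejected`, `CommitOutcomeUnknown`, and `commit_with_strict_cleanup` are hypothetical names introduced for illustration.

```python
class CleanableFailure(Exception):
    """Python stand-in for Iceberg's Java CleanableFailure marker interface."""


class CommitRejected(CleanableFailure):
    """Hypothetical: the catalog definitively rejected the commit."""


class CommitOutcomeUnknown(Exception):
    """Hypothetical: e.g. a timeout, so the commit may have succeeded."""


def commit_with_strict_cleanup(do_commit, cleanup):
    """Run `do_commit`; call `cleanup` only on failures marked cleanable."""
    try:
        do_commit()
    except CleanableFailure:
        cleanup()
        raise
    # Any other exception propagates without cleanup, because the new
    # metadata may already be referenced by a commit that went through.


def _raise(exc):
    raise exc


removed = []
for failure in (CommitRejected("conflict"), CommitOutcomeUnknown("timeout")):
    try:
        commit_with_strict_cleanup(
            lambda: _raise(failure),
            lambda: removed.append(type(failure).__name__),
        )
    except Exception:
        pass  # the failure is still reported to the caller

print(removed)  # ['CommitRejected']
```

Only the failure that carries the marker triggers cleanup. Presumably the motivation is that deleting files after an ambiguous failure could remove metadata that a successful commit still points to.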
Added vectorized reads on delete, update, and merge plans
- Removed restrictions in Arrow and Spark 3.4 logic that only enabled delete reads.
- Enabled delete, update, and merge plans to continue with vectorized execution rather than falling back to row-based reads.

PyIceberg, Iceberg-Go, and Iceberg-Rust

Released PyIceberg 0.5.0 🎉
- Support serverless environments (including AWS Lambda)
- Support for schema evolution
- HDFS support through PyArrow
- More about PyIceberg
The Iceberg Rust client continues progressing with the addition of FileIO and the Catalog API
- The next frontier is Table interfaces
- Not sure what any of this means? Read up on FileIO or the Catalog API to get a better understanding of these APIs.
The Iceberg Go client added support for partition specs and manifest files, and is now progressing to the Table interface as well.
- What’s a partition spec?

Bergy Blogs

How to Reduce Full Table Scans during Merges in Apache Iceberg and Save Money
Iceberg REST Catalog with Hive Metastore
The Disruptive Nature of Data Lakehouses
Spark + Kyuubi + Iceberg = Lakehouse
How to work with Iceberg Format in AWS Glue
Ryan Blue: Deep Dive into CDC Series:

Ecosystem Updates

Announcing DuckDB 0.9.0 🦆
- DuckDB launched an experimental Iceberg extension that currently supports basic Iceberg table reads with little optimization
Added 🐻‍❄️ Polars integration based on PyIceberg
- Read the docs on scan_iceberg method
Trino 🐇 adds read support for refs and tags in their 427 release
- That’s right, you can read branches and tags from Trino now
- Stay tuned for creating branches and tags via SQL

Vendor Updates

Iceberg Resources

🏁 Get Started with Apache Iceberg
👩‍🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list

