❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!


Project updates

Iceberg Java

  • Release 1.4.0 is just around the corner! 🙌🏿
  • Added Spark 3.5 support
    • Row-level implementations of merge, update, and delete have moved to Spark, and all related extensions have been dropped from Iceberg.
    • Added support for passing an advisory partition size, so you can set a target size per table and have output coalesced automatically by Adaptive Query Execution. This makes the output tunable based on runtime metrics, minimizing the number of small files that require compaction.
    • This release removes support for Spark 3.1 and deprecates support for Spark 3.2. This, along with moving the row-level implementations into Spark, makes Spark release upgrades in Iceberg happen faster.
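    As a sketch, the advisory partition size can be set per table via a table property. Note the property name below is an assumption for illustration — verify the exact name in the Iceberg configuration docs for your release:

        -- Property name assumed for illustration; check the Iceberg docs
        ALTER TABLE my_catalog.db.table
        SET TBLPROPERTIES ('write.spark.advisory-partition-size' = '134217728'); -- 128 MB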
  • Added Spark support for distributed planning
    • Iceberg planning utilizes manifest partition info to quickly plan queries with partition filters, especially when metadata is properly clustered.
    • Distributed planning can enhance performance for specific use cases due to higher cluster parallelism compared to driver cores.
    • Tests on a table with 20 million files showed significant improvements in planning times across various queries, though the cost of delivering planning results back to the driver can be a limiting factor.
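    A minimal sketch of opting in from a Spark session; the configuration key and values here are assumptions — check the Iceberg Spark configuration docs for the exact names:

        -- Conf name assumed for illustration; verify in the Iceberg docs
        SET spark.sql.iceberg.planning-mode = distributed; -- local | distributed | auto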
  • Push down Iceberg functions to Spark V2 filters
    Iceberg can now push down system functions to reduce the amount of data read from files. For example, if we only want to retrieve data from a single bucket, we can use:

        spark.sql(
            """
            SELECT * FROM my_catalog.db.table
            WHERE my_catalog.system.bucket(10, id) = 2
            """
        )

    This also works with other Iceberg partition functions. In addition, we can take advantage of this when calling rewriteDataFiles (rewrite_data_files) and rewritePositionDeleteFiles (rewrite_position_delete_files), like so:

        spark.sql(
            """
            CALL my_catalog.system.rewrite_data_files(
                table => 'foo.bar',
                where => 'my_catalog.system.bucket(4, url) = 0')
            """
        )
  • Added AES GCM Stream encryption and decryption
    Support has been added for AES GCM Stream, which provides data encryption and integrity verification. This is a great step forward in the ongoing effort toward full metadata encryption.
  • Added strict metadata cleanup
    Strict metadata cleanup provides additional protection against table corruption by only triggering metadata cleanup operations when commits fail due to an exception that implements the CleanableFailure interface.
  • Added vectorized reads on delete, update, and merge plans
    • Removed restrictions in Arrow and Spark 3.4 logic that previously enabled vectorized reads only for deletes.
    • Delete, update, and merge plans can now continue with vectorized execution rather than falling back to row-based reads.
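    Vectorized Parquet reads are controlled by an existing table property, so a quick way to confirm a table is eligible is to check (or set) it explicitly:

        ALTER TABLE my_catalog.db.table
        SET TBLPROPERTIES ('read.parquet.vectorization.enabled' = 'true');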

PyIceberg, Iceberg-Go, and Iceberg-Rust

Bergy Blogs

Ecosystem Updates

Vendor Updates

Iceberg Resources

🏁 Get Started with Apache Iceberg
👩‍🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list