September 2023 - Iceberg Community News


❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!

Project updates

Iceberg Java

  • Release 1.4.0 is just around the corner! 🙌🏿

  • Added Spark 3.5 support

    • Row-level implementations of merge, update, and delete have moved to Spark, and all related extensions have been dropped from Iceberg.
    • An advisory partition size can now be passed to Spark, so you can set the target size per table and have Adaptive Query Execution coalesce partitions automatically. This makes the output tunable based on runtime metrics, which minimizes the number of small files that require compaction.
    • This release removes support for Spark 3.1 and deprecates support for Spark 3.2. Together with the move of the row-level implementations into Spark, this makes Spark release upgrades in Iceberg happen faster.
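
    As a rough sketch of setting the advisory partition size per table: the property name `write.spark.advisory-partition-size-bytes` is recalled from the 1.4.0 write configuration and should be checked against the Iceberg Spark configuration docs. The table name and the 256 MiB value are placeholders.

    ```sql
    -- Assumption: 'write.spark.advisory-partition-size-bytes' is the table property
    -- that carries the advisory partition size. Verify the name before relying on it.
    -- my_catalog.db.table is a placeholder table name.
    ALTER TABLE my_catalog.db.table SET TBLPROPERTIES (
        'write.spark.advisory-partition-size-bytes' = '268435456'  -- 256 MiB, illustrative
    );
    ```

    Adaptive Query Execution needs to be enabled in Spark for the coalescing to take effect.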
  • Added Spark support for distributed planning

    • Iceberg planning utilizes manifest partition info to quickly plan queries with partition filters, especially when metadata is properly clustered.
    • Distributed planning can improve performance for specific use cases because a cluster offers more parallelism than the driver's cores alone.
    • Tests on a table with 20 million files showed significant improvements in planning times for various queries, but the cost of delivering results can be a limiting factor.
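
    A minimal sketch of opting a table into distributed planning, assuming the table properties are named `read.data-planning-mode` and `read.delete-planning-mode` and accept `auto`, `local`, and `distributed`. Those names and values are recalled from the 1.4.0 configuration rather than taken from this newsletter, and the table name is a placeholder.

    ```sql
    -- Assumption: these property names and values match the Iceberg 1.4.0 read
    -- configuration. Verify them against the official docs before use.
    -- my_catalog.db.table is a placeholder table name.
    ALTER TABLE my_catalog.db.table SET TBLPROPERTIES (
        'read.data-planning-mode'   = 'distributed',  -- plan data files on the cluster
        'read.delete-planning-mode' = 'auto'          -- let Iceberg pick for delete files
    );
    ```

    Since delivering results has its own cost, it is worth comparing planning times for your own queries before and after changing the mode.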
  • Push down Iceberg functions to Spark V2 filters

    Iceberg can now push down system functions to reduce the amount of data read from files. For example, to retrieve data from only a single bucket, we can use:

    SELECT * FROM my_catalog.db.table WHERE my_catalog.system.bucket(10, id) = 2;

    This will also work with other Iceberg partition functions. In addition, we can take advantage of this when calling rewriteDataFiles (rewrite_data_files) and rewritePositionDeleteFiles (rewrite_position_delete_files), like so:

    CALL my_catalog.system.rewrite_data_files(
        table => '', 
        where => 'my_catalog.system.bucket(4, url) = 0')
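
    The same pattern should carry over to the other partition functions. The sketch below assumes `system.truncate(width, column)` is among the functions that get pushed down and that `id` is an integer column; both are illustrative assumptions.

    ```sql
    -- Assumption: truncate is pushed down the same way as bucket.
    -- With width 10, integer ids 20 through 29 truncate to 20.
    SELECT * FROM my_catalog.db.table WHERE my_catalog.system.truncate(10, id) = 20;
    ```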
  • Added AES GCM Stream encryption and decryption

    Support has been added for AES GCM Stream, which provides data encryption and integrity verification. This is a great step forward in the ongoing effort towards full metadata encryption.

  • Added strict metadata cleanup

    Strict metadata cleanup provides additional protection against table corruption by only triggering metadata cleanup operations when commits fail due to an exception that implements the CleanableFailure interface.

  • Added vectorized reads for delete, update, and merge plans

    • Removed restrictions in Arrow and Spark 3.4 logic that only enabled delete reads.
    • Enabled delete, update, and merge plans to continue with vectorized execution rather than falling back to row-based reads.

PyIceberg, Iceberg-Go, and Iceberg-Rust

Bergy Blogs

Ecosystem Updates

Vendor Updates

Iceberg Resources

🏁 Get Started with Apache Iceberg
👩‍🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list


Senior Software Engineer, OSS

Improve Apache Iceberg by building new capabilities for Tabular and the community

Senior Software Engineer, Product

Design services and use cloud infrastructure to build a resilient and scalable data platform

Senior UI Engineer

Design and implement Tabular’s user experience, where people will create, monitor, and manage their data platform

Developer Advocate

Build examples to solve real-world challenges, write tutorials that help developers succeed, and be a community liaison

Developer Experience Engineer

Build technical documentation and tutorials, assist in maintaining the release processes, and lower the time to dopamine (TTD) of developers using Apache Iceberg