❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!

Project updates

Iceberg Java

  • Release 1.4.2 is live! Here are some highlights from 1.4.0 to 1.4.2:
    • 1.4.2 addresses an issue that was identified in 1.4.0. Please upgrade to this latest patch release.
    • New tables default to V2 format, as discussed on the dev list.
    • Zstandard (zstd) is the default data compression algorithm on new tables.
    • Added a FileIO implementation that supports Azure Data Lake Storage Gen2.
    • Added a REST API for committing changes to multiple tables.
    • Spark-specific highlights for release 1.4.1
      • Increased the default advisory partition size for writes and allowed users to set it explicitly.
      • Added distributed planning, which dramatically improves performance in some use cases.
      • Skipped the local sort for unordered writes in Spark when using the fanout writer.
  • Added support for Flink’s Alter Table syntax.
  • Created foundational metastore view support, which paves the way for implementing view support in multiple Hive metastore-based implementations.
  • The project is now undergoing a documentation refactor. The previous refactor, back in version 0.12.1, introduced multi-versioned documentation to Iceberg and split the versioned docs source and the static site source into two separate repositories. This created a confusing and cumbersome contribution and release process for documentation. Under a new design, the Iceberg documentation will retain multi-versioning while avoiding the concerns around having all versions of the source in a single repository.

PyIceberg, Iceberg-Go, and Iceberg-Rust

Bergy Blogs

Ecosystem Updates

Vendor Updates

  • Sippy helps you avoid egress fees while incrementally migrating data from S3 to R2
  • If you’re attending AWS re:Invent, here are some Iceberg-related sessions you won’t want to miss!
    • ANT101 | How to build a platform for AI and analytics based on Apache Iceberg – Ryan Blue
    • NFX306 | Netflix’s journey to an Apache Iceberg–only data lake – Vaidy Krishnan, Rakesh Veeramacheneni, and Ashwin Kayyoor
    • STG313 | Building and optimizing a data lake on Amazon S3 – Ryan Blue, Huey Han, and Oleg Lvovitch
    • ANT308 | Build large-scale transactional data lakes with Apache Iceberg on AWS – Aditya Challa, Aneesh Chandra, Dylan Qu, Francis McGregor-Macdonald, and Nishchai JM
    • ANT328 | Accessing open table formats for superior data lake analytics – Asser Moustafa, Sreekanth Martha, and Stuti Deshpande
    • OPN309 | Building a secure and scalable transactional data lake – Sercan Karaoglu and Ankita Gavali
  • And while you’re at re:Invent, be sure to visit the following vendor booths to see what they’re doing with Iceberg:
    • AWS – Athena, EMR, Redshift, Glue, DynamoDB
    • 2604 – Snowflake
    • 1022 – Databricks
    • 1632 – Tabular
    • 1151 – Starburst
    • 1000 – Confluent
    • 1505 – ClickHouse
    • 1530 – Dremio

Iceberg Resources

🏁 Get Started with Apache Iceberg
👩‍🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list