October 2023 – Iceberg Community News

November 8, 2023

❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!

Project updates

Release 1.4.2 is live! Here are some highlights from 1.4.0 to 1.4.2:
- 1.4.2 addresses an issue that was identified in 1.4.0. Please upgrade to this latest patch release.
- New tables default to the V2 format, as discussed on the dev list.
- Zstandard (zstd) is the default data compression algorithm on new tables.
- Added a FileIO implementation that supports Azure Data Lake Storage Gen2.
- Added REST API for committing changes to multiple tables.
- Spark-specific highlights for release 1.4.1
  - Increased the default advisory partition size for writes and allowed users to set it explicitly.
  - Added distributed planning, which dramatically improves performance in some use cases.
  - Skipped the local sort for unordered writes in Spark when using the fanout writer.
Added support for Flink’s Alter Table syntax.
Created foundational view support for the metastore, which paves the way for implementing view support in multiple Hive metastore-based implementations.
The project is now undergoing a documentation refactor. The previous refactor, back in version 0.12.1, introduced multi-versioned documentation to Iceberg and split the versioned docs source and the static site source into two separate repositories. This created a confusing and cumbersome contribution and release process for documentation. Under the new design, the Iceberg documentation will retain multi-versioning while avoiding the concerns around having all versions of the source in a single repository.

The Python implementation moved from the main apache/iceberg repository to the iceberg-python repository.
PyIceberg has been integrated into Polars. You can now load Polars DataFrames directly from Iceberg tables. Learn more in this blog post.
Significant progress was made towards adding write support in Python.
- The latest WIP version of the write support can be found here. Give it a try and see if it works for you.
- Also, if you’re interested in reviewing, please check out the snapshot logic and the summary generation.
iceberg-go added support for manifests and a base table implementation.
iceberg-rust is going strong:
- The scaffolding for the REST catalog implementation went in, and the load-table logic is close.

Sippy helps you avoid egress fees while incrementally migrating data from S3 to R2
If you’re attending AWS re:Invent, here are some Iceberg-related sessions you won’t want to miss!
- ANT101 | How to build a platform for AI and analytics based on Apache Iceberg – Ryan Blue
- NFX306 | Netflix’s journey to an Apache Iceberg–only data lake – Vaidy Krishnan, Rakesh Veeramacheneni, and Ashwin Kayyoor
- STG313 | Building and optimizing a data lake on Amazon S3 – Ryan Blue, Huey Han, and Oleg Lvovitch
- ANT308 | Build large-scale transactional data lakes with Apache Iceberg on AWS – Aditya Challa, Aneesh Chandra, Dylan Qu, Francis McGregor-Macdonald, and Nishchai JM
- ANT328 | Accessing open table formats for superior data lake analytics – Asser Moustafa, Sreekanth Martha, and Stuti Deshpande
- OPN309 | Building a secure and scalable transactional data lake – Sercan Karaoglu and Ankita Gavali
And while you’re at re:Invent, be sure to visit the following vendor booths to see what they’re doing with Iceberg:
- AWS – Athena, EMR, Redshift, Glue, DynamoDB
- 2604 – Snowflake
- 1022 – Databricks
- 1632 – Tabular
- 1151 – Starburst
- 1000 – Confluent
- 1505 – ClickHouse
- 1530 – Dremio

🏁 Get Started with Apache Iceberg
👩‍🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list