❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!
Project updates
Iceberg Java
- Release 1.4.2 is live! Here are some highlights from 1.4.0 to 1.4.2:
- 1.4.2 addresses an issue that was identified in 1.4.0. Please upgrade to this latest patch release.
- New tables default to V2 format, as discussed in the dev list.
- Zstandard (zstd) is the default data compression algorithm on new tables.
- FileIO was added that supports Azure Data Lake Storage Gen 2.
- Added REST API for committing changes to multiple tables.
- Spark-specific highlights for release 1.4.1
- Increase the default advisory partition size for writes and allow users to set it explicitly.
- Added distributed planning to improve performance in some use cases dramatically.
- Skip the local sort for unordered writes in Spark when using the fanout writer.
- Added support for Flink’s Alter Table syntax.
- Created foundational metastore view support, which paves the way for implementing view support in multiple Hive metastore-based implementations.
- The project is now undergoing a documentation refactor. The previous refactor, back in version 0.12.1, introduced multi-versioned documentation to Iceberg and split the versioned docs source and the static site source into two separate repositories. This has created a confusing and cumbersome contribution and release process for documentation. Following a new design, the Iceberg documentation will retain multi-versioning, while avoiding the concerns around having all versions of the source in a single repository.
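To make the new table defaults from the highlights above concrete, here is a minimal sketch that renders a Spark SQL `CREATE TABLE` statement with the two properties spelled out explicitly. It assumes Parquet data files, so the codec is set through `write.parquet.compression-codec`; the helper name `create_table_sql`, the `NEW_DEFAULTS` dictionary, and the table name `demo.db.events` are hypothetical and only serve the illustration.

```python
# Table properties matching the new-table defaults described above.
# Assumption: data files are Parquet, so the Parquet codec property applies.
NEW_DEFAULTS = {
    "format-version": "2",
    "write.parquet.compression-codec": "zstd",
}


def create_table_sql(table, columns, overrides=None):
    """Render a Spark SQL CREATE TABLE statement with explicit Iceberg properties.

    `columns` is a list of (name, type) pairs; `overrides` replaces or adds
    table properties on top of NEW_DEFAULTS.
    """
    props = {**NEW_DEFAULTS, **(overrides or {})}
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    # Sort the properties so the generated statement is deterministic.
    tbl_props = ", ".join(f"'{k}'='{v}'" for k, v in sorted(props.items()))
    return f"CREATE TABLE {table} ({cols}) USING iceberg TBLPROPERTIES ({tbl_props})"


if __name__ == "__main__":
    # Hypothetical table that pins a different codec instead of zstd.
    print(
        create_table_sql(
            "demo.db.events",
            [("id", "bigint"), ("ts", "timestamp")],
            overrides={"write.parquet.compression-codec": "gzip"},
        )
    )
```

Setting the properties explicitly like this can be useful if some readers of a table are not yet ready for V2 metadata or zstd-compressed files, since it keeps the choice visible in the DDL rather than relying on the library default.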
PyIceberg, Iceberg-Go, and Iceberg-Rust
- The Python implementation moved from the main apache/iceberg repository to the iceberg-python repository.
- PyIceberg has been integrated into Polars. Now you can load Polars dataframes directly from Iceberg tables. Learn more in this blog post.
- Significant progress was made towards adding write support in Python.
- The latest WIP version of the write support can be found here. We encourage you to give it a try and see if it works for you.
- Also, if you’re interested in reviewing, please check out the snapshot logic and the summary generation.
- iceberg-go added support for manifests and a base table implementation.
- Iceberg-rust is going strong:
- The scaffolding for the REST catalog implementation went in, and the load-table logic is close.
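As a rough illustration of the Polars integration mentioned above, the sketch below loads an Iceberg table through PyIceberg and hands it to Polars. It assumes both `pyiceberg` and `polars` are installed and that a catalog is already configured for PyIceberg (for example in `~/.pyiceberg.yaml`); the function name `load_iceberg_as_polars` and the catalog and table names in the usage line are hypothetical.

```python
def load_iceberg_as_polars(catalog_name, table_identifier):
    """Load an Iceberg table via PyIceberg and return it as a Polars DataFrame."""
    # Imported lazily so this module can be imported without the optional
    # dependencies; both packages are required when the function is called.
    import polars as pl
    from pyiceberg.catalog import load_catalog

    # Catalog connection details are assumed to come from PyIceberg's own
    # configuration (config file or environment variables).
    catalog = load_catalog(catalog_name)
    table = catalog.load_table(table_identifier)

    # scan_iceberg builds a lazy query; collect() materializes the DataFrame.
    return pl.scan_iceberg(table).collect()
```

A call might look like `load_iceberg_as_polars("default", "db.events")`. Keeping the scan lazy until `collect()` leaves room to add filters or column selections before any data is read.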
Bergy Blogs
- Apache Iceberg optimization: Solving the small files problem in Amazon EMR
- Building a Feature Store with Apache Iceberg on AWS
- Partner Integration: Apache Iceberg + StarRocks
- 🚀 Exciting Times in the Big Data World: Apache Iceberg Takes Center Stage! 🚀
- Build a Transactional Data Lake with Apache Iceberg
- Lakehouse coverage at Current 2023
- NPLG 10.5.23: A New Way to Monetize Open Source (MotherDuck)
- Calculating Daily/Monthly Active Users with Spark & Iceberg
- Simplifying Complex Data Merging: Combining Data Sources into a Single Table
- The Art of Efficient Data Lake Organization
- State of data catalogs 2023: The battle for your metadata
Ecosystem Updates
- Distributed Materialized Views: How Airbnb’s Riverbed Processes 2.4 Billion Daily Events
- Trino Gateway has arrived.
- Trino adds support for CREATE OR REPLACE TABLE syntax
- More details can be found in the following issue discussion.
Vendor Updates
- Sippy helps you avoid egress fees while incrementally migrating data from S3 to R2
- If you’re attending AWS re:Invent, here are some Iceberg-related sessions you won’t want to miss!
- ANT101 | How to build a platform for AI and analytics based on Apache Iceberg – Ryan Blue
- NFX306 | Netflix’s journey to an Apache Iceberg–only data lake – Vaidy Krishnan, Rakesh Veeramacheneni, and Ashwin Kayyoor
- STG313 | Building and optimizing a data lake on Amazon S3 – Ryan Blue, Huey Han, and Oleg Lvovitch
- ANT308 | Build large-scale transactional data lakes with Apache Iceberg on AWS – Aditya Challa, Aneesh Chandra, Dylan Qu, Francis McGregor-Macdonald, and Nishchai JM
- ANT328 | Accessing open table formats for superior data lake analytics – Asser Moustafa, Sreekanth Martha, and Stuti Deshpande
- OPN309 | Building a secure and scalable transactional data lake – Sercan Karaoglu and Ankita Gavali
- And while you’re at re:Invent, be sure to visit the following vendor booths to see what they’re doing with Iceberg:
- AWS – Athena, EMR, Redshift, Glue, DynamoDB
- 2604 – Snowflake
- 1022 – Databricks
- 1632 – Tabular
- 1151 – Starburst
- 1000 – Confluent
- 1505 – ClickHouse
- 1530 – Dremio
Iceberg Resources
🏁 Get Started with Apache Iceberg
👩🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list