Apache Iceberg? Spread the word by giving it a star on the apache/iceberg repo!
Project updates
Iceberg Java
- Release 1.4.2 is live! Here are some highlights from 1.4.0 to 1.4.2
- 1.4.2 addresses an issue that was identified in 1.4.0. Please upgrade to this latest patch release.
- New tables default to V2 format, as discussed in the dev list.
- Zstandard (zstd) is the default data compression algorithm on new tables.
- FileIO was added that supports Azure Data Lake Storage Gen 2.
- Added REST API for committing changes to multiple tables.
- Spark-specific highlights for release 1.4.1
- Increased the default advisory partition size for writes and allowed users to set it explicitly.
- Added distributed planning to improve performance in some use cases dramatically.
- Skipped the local sort for unordered writes in Spark when using the fanout writer.
- Added support for Flink’s Alter Table syntax.
- Created foundational view support for the metastore, which paves the way for implementing view support across multiple Hive metastore-based implementations.
- The project is now undergoing a documentation refactor. The previous refactor, back in version 0.12.1, introduced multi-versioned documentation to Iceberg and split the versioned docs source and the static site source into two separate repositories. This has created a confusing and cumbersome contribution and release process for documentation. Following a new design, the Iceberg documentation will retain multi-versioning, while avoiding the concerns around having all versions of the source in a single repository.
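For readers who want to pin the new table defaults from the 1.4 highlights rather than rely on them, here is a minimal sketch that renders a Spark SQL `CREATE TABLE` statement with both settings spelled out. It assumes the `format-version` and `write.parquet.compression-codec` table properties; the helper `create_table_sql`, the table name `demo.db.events`, and its columns are hypothetical and only there for illustration.

```python
def create_table_sql(table: str, columns: dict[str, str], properties: dict[str, str]) -> str:
    """Render a Spark SQL CREATE TABLE statement for an Iceberg table (illustrative helper)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    # Sort the properties so the rendered statement is deterministic.
    props = ", ".join(f"'{key}'='{value}'" for key, value in sorted(properties.items()))
    return f"CREATE TABLE {table} ({cols}) USING iceberg TBLPROPERTIES ({props})"


# Assumption: these two table properties mirror the new 1.4 defaults
# (V2 format, zstd compression for Parquet data files).
EXPLICIT_DEFAULTS = {
    "format-version": "2",
    "write.parquet.compression-codec": "zstd",
}

# Hypothetical table name and schema.
sql = create_table_sql(
    "demo.db.events",
    {"id": "bigint", "payload": "string"},
    EXPLICIT_DEFAULTS,
)
print(sql)
```

Changing the values in `EXPLICIT_DEFAULTS` is one way to opt a new table out of the defaults, for example if downstream readers do not yet support V2 tables or zstd.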
PyIceberg, Iceberg-Go, and Iceberg-Rust
- The Python implementation moved from the main apache/iceberg repository to the iceberg-python repository.
- PyIceberg has been integrated into Polars. Now you can load Polars dataframes directly from Iceberg tables. Learn more in this blog post.
- Significant progress was made towards adding write support in Python.
- The latest WIP version of the write support can be found here. We encourage you to give it a try and see if it works for you.
- Also, if you’re interested in reviewing, please check out the snapshot logic and the summary generation.
- Iceberg-go added support for manifests and a base table implementation.
- Iceberg-rust is going strong.
- The scaffolding for the REST catalog implementation went in, and the load-table logic is close.
Bergy Blogs
- Apache Iceberg optimization: Solving the small files problem in Amazon EMR
- Building a Feature Store with Apache Iceberg on AWS
- Partner Integration: Apache Iceberg + StarRocks
- Exciting Times in the Big Data World: Apache Iceberg Takes Center Stage!
- Build a Transactional Data Lake with Apache Iceberg
- Lakehouse coverage at Current 2023
- NPLG 10.5.23: A New Way to Monetize Open Source (MotherDuck)
- Calculating Daily/Monthly Active Users with Spark & Iceberg
- Simplifying Complex Data Merging: Combining Data Sources into a Single Table
- The Art of Efficient Data Lake Organization
- State of data catalogs 2023: The battle for your metadata
Ecosystem Updates
- Distributed Materialized Views: How Airbnb’s Riverbed Processes 2.4 Billion Daily Events
- Trino Gateway has arrived.
- Trino adds support for CREATE OR REPLACE TABLE syntax
- More details can be found in the following issue discussion.
Vendor Updates
- Sippy helps you avoid egress fees while incrementally migrating data from S3 to R2
- If you’re attending AWS re:Invent, here are some Iceberg-related sessions you won’t want to miss!
- ANT101 | How to build a platform for AI and analytics based on Apache Iceberg – Ryan Blue
- NFX306 | Netflix’s journey to an Apache Iceberg–only data lake – Vaidy Krishnan, Rakesh Veeramacheneni, and Ashwin Kayyoor
- STG313 | Building and optimizing a data lake on Amazon S3 – Ryan Blue, Huey Han, and Oleg Lvovitch
- ANT308 | Build large-scale transactional data lakes with Apache Iceberg on AWS – Aditya Challa, Aneesh Chandra, Dylan Qu, Francis McGregor-Macdonald, and Nishchai JM
- ANT328 | Accessing open table formats for superior data lake analytics – Asser Moustafa, Sreekanth Martha, and Stuti Deshpande
- OPN309 | Building a secure and scalable transactional data lake – Sercan Karaoglu and Ankita Gavali
- And while you’re at re:Invent, be sure to visit the following vendor booths to see what they’re doing with Iceberg:
- AWS – Athena, EMR, Redshift, Glue, DynamoDB
- 2604 – Snowflake
- 1022 – Databricks
- 1632 – Tabular
- 1151 – Starburst
- 1000 – Confluent
- 1505 – ClickHouse
- 1530 – Dremio
Iceberg Resources
Get Started with Apache Iceberg
Learn more about Apache Iceberg on the official Apache site
Watch and subscribe to the Iceberg YouTube Channel
Read up on some community blog posts
Contribute to Iceberg
SELECT * FROM you JOIN iceberg_community
Subscribe to the Apache Iceberg mailing list