A Tabular newsletter revisiting last month in Iceberg
❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!
Project updates
Iceberg Java
- Performance improvements for delete files through the removal of outdated logic in DeleteFilter and an unnecessary synchronization step in BitmapPositionDeleteIndex.
- Added column stats filtering, giving users the ability to limit the column stats brought back in the ScanTask returned from planning, paving a path to many new efficiencies.
- Added several more metrics to Spark UI from Iceberg scan metrics.
- The Java 1.4.3 release (milestone) is in progress with work on a potential MERGE consistency bug.
PyIceberg, iceberg-go, and iceberg-rust
- Work on PyIceberg continues with some new table metadata updates added recently.
- Scaffolding for the REST catalog and the load-table API has been added to iceberg-rust.
Iceberg at re:Invent
- Lots of exciting Iceberg news from re:Invent 2023, but let’s kick off with this quote from Andy Warfield, AWS VP and Distinguished Engineer: “In almost every single conversation that I had with customers that were building data lakes on S3 we talked about Iceberg. Customers were bringing it up.” You can catch that and the rest of the great discussion in this video.
- Redshift also now supports incremental refresh for materialized views on Apache Iceberg tables.
- Starburst announces streaming ingest into Iceberg tables and automatic data optimization of Iceberg tables.
- Tabular releases their all-new Apache Iceberg Cookbook, with more than 30 recipes to help you get started and dig deeper into Iceberg.
‘bergy Blogs
- Getting Started with Apache Spark, Apache Kafka and Apache Iceberg – Joe Stein
- Streaming SQL in Data Mesh – Netflix Technology Blog
- A Paradigm Shift to More Affordable Data Stacks – Julien
- Iceberg and Hudi, the ACID test – Ryan Blue
- Insights into using Iceberg on Databricks – Alex Merced
Ecosystem Updates
- Trino adds support for the register_table and unregister_table procedures with the REST catalog.
- Amazon Redshift announces general availability of support for Apache Iceberg.
- Why the Modern Datalake is Being Built Privately.
- Interoperability Trend in Open Table Formats Effect On Enterprise Data Architectures.
Vendor Updates
- AWS opens new CDC path with support for DynamoDB incremental export to update Iceberg tables.
- Improved Amazon EMR Serverless support in EMR Studio.
- The Battle for Your Metadata: Check out Tom N.’s latest comparison of Starburst, Databricks, and Snowflake.
- Update on Iceberg support in Snowflake… Now in public preview.
- Role-Based Access: Get some helpful advice on RBAC from Randy Pitcher.
- Microsoft announces CDC to bring automatic mirroring to Microsoft Fabric.
Iceberg Resources
🏁 Get Started with Apache Iceberg
👩🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list