Iceberg updates
- Flink: 1.17 support was added, 1.14 removed (Liwei Li)
- Iceberg Java 1.2.0 release is out (Jack Ye)
- Added View version and parser (Amogh)
- Improved bit density in object storage layout (Prashant)
- Added initial support for Spark 3.4
Apache Iceberg 1.2.1 was released on April 11th, 2023. The 1.2.1 release is a patch release to address various issues identified in the prior release. Here is an overview:
- Core
  - REST: fix previous locations for refs-only load #7284
  - Parse snapshot-id as long in remove-statistics update #7235
- Spark
  - Broadcast table instead of file IO in rewrite manifests #7263
  - Revert “Spark: Add "Iceberg" prefix to SparkTable name string for SparkUI” #7273
- AWS
  - Performance improvements for S3 when using the Apache HTTP client #7262
  - S3 Credentials provider support in DefaultAwsClientFactory #7066
PyIceberg updates
- Wrapping up the 0.4.0 release, which will bring:
  - Support for converting a query into a Ray dataset (thanks Rushan!)
  - A revamp of the documentation page (thanks Luigi!)
  - The ability to limit the number of rows returned by a query (thanks Daniel!)
  - Evaluation of file metrics to speed up queries (thanks Fokko!)
  - The ability to convert an Arrow schema to Iceberg, which fixes AWS Athena issues (thanks Rushan!)
  - Support for positional deletes (thanks Fokko!)
More information can be found on the project site, and the package is available on PyPI.
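For readers wondering how evaluating metrics can speed up a query, here is a minimal sketch of the general idea: per-file lower and upper bounds let a reader skip files that cannot contain matching rows. It is not PyIceberg's implementation. The `FileStats` class, the `may_contain_greater_than` and `prune` helpers, and the file names are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file column statistics (not a PyIceberg class)."""
    path: str
    lower: int  # smallest value of the column in this file
    upper: int  # largest value of the column in this file


def may_contain_greater_than(stats: FileStats, literal: int) -> bool:
    """Return False only when no row in the file can satisfy `col > literal`."""
    return stats.upper > literal


def prune(files: list[FileStats], literal: int) -> list[FileStats]:
    """Keep only the files that might hold rows matching `col > literal`."""
    return [f for f in files if may_contain_greater_than(f, literal)]


# Three illustrative data files with non-overlapping value ranges.
files = [
    FileStats("a.parquet", lower=0, upper=99),
    FileStats("b.parquet", lower=100, upper=199),
    FileStats("c.parquet", lower=200, upper=299),
]

# For the predicate `col > 150`, a.parquet can be skipped without being opened.
selected = [f.path for f in prune(files, 150)]
print(selected)
```

The check is deliberately conservative: a file that is kept may still contain no matching rows, but a file that is skipped never does, so query results are unchanged.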
Iceberg in the industry
Blogs from the community
- Mayur Choubey – How to create a unified data lake with Tabular in 5 mins
- Mayur Choubey – Building Serverless Data Pipelines with AWS Lambda, PyIceberg, and Tabular
- Mayur Choubey – The Power of Three: Using Apache Iceberg, Databricks, and Tabular for Data Engineering
- Mayur Choubey – Auto Optimizing Apache Iceberg tables with Tabular: Best practices from a DBA standpoint – Part 1
- Kostas Pappas – Migrating to Iceberg for a more efficient Data Lake
- Mike Shakhomirov – Introduction to Apache Iceberg Tables
- Waitingfor{code} – Table file formats – Z-Order compaction: Apache Iceberg
- Trino – Just the right time date predicates with Iceberg
- Sree Vaddi – Quickstart Iceberg with Spark and Docker Compose
- Cloudera – Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs
- Starburst – Improving performance with Iceberg sorted tables
Iceberg in the news
- CXOtoday: Fivetran Supports the Automation of the Modern Data Lake on Amazon S3
- Breaking Latest News: how the open approach to hybrid data is changing
- TFIR: Data Lakehouse: The Wave Of The Future
Keep up to date on all things Iceberg
- Watch for new blog posts added to the Blogs page
- See the community Contribute guide to learn how to start contributing to Iceberg
- Join the Apache Iceberg workspace on Slack using the invite link
- Subscribe to the Apache Iceberg mailing list