- Iceberg updates
- PyIceberg updates
- Iceberg in the industry
- Blogs from the community
- Iceberg in the news
- Keep up to date on all things Iceberg
Iceberg updates
Apache Iceberg 1.3 was released
Spark now supports the Iceberg UUID type by treating it as a string type, which keeps it compatible with tables created using Trino. Other noteworthy updates in this release:
- Flink version 1.17 support was added
- Spark version 3.4 support was added
- Added a procedure to rewrite position delete files (Thanks, Szehon!)
- Mitigated FileIO closing problems (Thanks, Eduard!)
- JDK 17 support was added
- Support was dropped for Flink version 1.14 and Spark version 2.4
PyIceberg updates
It’s an exciting time for PyIceberg. We’re wrapping up the PyIceberg 0.4.0 release, which brings:
- Support for converting Parquet schemas into Iceberg schemas.
- Support for reading data using FSSpec.
- Support for fetching a limited number of rows to quickly peek into a dataset.
- Improved performance with PyArrow>=12.0.0.
- Improved query performance through filter pushdown using column range metadata.
- Ability to use SQL-style filters, for example row_filter='passengers >= 3'.
- SigV4 support for the REST catalog.
- A complete makeover of the docs site.
- And many bugs have been fixed!
More information can be found on the project site, and the package is available on PyPI.
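To make the filter pushdown item above more concrete, here is a minimal, self-contained sketch of the general idea: if each data file records the minimum and maximum value of a column, a filter such as passengers >= 3 can rule out whole files before any rows are read. This is an illustration only and not PyIceberg's implementation. The DataFile class, the scan function, and the sample file names are hypothetical; in PyIceberg itself the filter is passed as row_filter, as shown in the list above.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file with column range metadata."""
    path: str
    min_passengers: int  # smallest `passengers` value stored in the file
    max_passengers: int  # largest `passengers` value stored in the file
    rows: List[int]      # the `passengers` values themselves


def may_match(data_file: DataFile, threshold: int) -> bool:
    """A file can contain rows with passengers >= threshold only if
    its recorded maximum reaches the threshold."""
    return data_file.max_passengers >= threshold


def scan(
    files: Sequence[DataFile],
    threshold: int,
    limit: Optional[int] = None,
) -> Tuple[List[int], List[str]]:
    """Return (matching values, paths of files that were actually opened)."""
    matches: List[int] = []
    opened: List[str] = []
    if limit is not None and limit <= 0:
        return matches, opened
    for data_file in files:
        if not may_match(data_file, threshold):
            continue  # pruned from metadata alone, rows are never read
        opened.append(data_file.path)
        for value in data_file.rows:
            if value >= threshold:
                matches.append(value)
                if limit is not None and len(matches) >= limit:
                    return matches, opened  # stop early once the limit is met
    return matches, opened


# Hypothetical sample data: three files with different value ranges.
files = [
    DataFile("a.parquet", 1, 2, [1, 2, 2, 1]),
    DataFile("b.parquet", 1, 4, [1, 4, 3, 2]),
    DataFile("c.parquet", 3, 6, [3, 6, 5]),
]

matches, opened = scan(files, threshold=3)
print(matches)  # [4, 3, 3, 6, 5]
print(opened)   # ['b.parquet', 'c.parquet']
```

In this sketch a.parquet is never opened because its maximum value is 2. Passing limit=2 stops after the first two matches, which mirrors the "peek into a dataset" use case from the list above.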
Iceberg in the industry
- IBM supports Iceberg as part of its WatsonX AI initiative
- Starburst launches a new Iceberg page
- Starburst adds support for Tabular
Blogs from the community
- Tabular – Securing the Data Lake – Part II
- Starburst – How to migrate your Hive tables to Apache Iceberg
- Starburst – Tutorial: Using Starburst Galaxy’s materialized views with Apache Iceberg
- Tabular – Tutorial: Using Trino and Iceberg for data warehousing
- Anuj Syal – Top 5 New Data Engineering Technologies to Learn in 2023
- Marin Aglić – Learning Apache Iceberg — Storing the Catalog to Postgres
- Amazon – Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes
Iceberg in the news
- Oracle: Oracle Autonomous Data Warehouse Breaks Through the Limitations of Data Management
- The Register: Trino and dbt open source data tools snuggle closer with integrated SaaS
- CXOtoday: Cloudera Recognized as a Leader in 2023 GigaOm Radar for Data Lakes and Lakehouses
- datanami: The Semantic Layer Architecture: Where Business Intelligence is Truly Heading
- datanami: HPE Brings Analytics Together on its Data Fabric
Keep up to date on all things Iceberg
- Watch for new blog posts added to the Blogs page
- See the community Contribute guide to learn how to start contributing to Iceberg
- Join the Apache Iceberg workspace on Slack using the invite link
- Subscribe to the Apache Iceberg mailing list