- Iceberg Updates
- PyIceberg updates
- Iceberg in the industry
- Blogs from the community
- Iceberg in the news
Iceberg Updates
Apache Iceberg 1.2.0 was released on March 20th, 2023. The 1.2.0 release adds a variety of new features and bug fixes. Here is an overview:
Core
- Added AES GCM encryption stream spec (#5432)
- Added support for Delta Lake to Iceberg table conversion (#6449, #6880)
- Added support for the `position_deletes` metadata table (#6365, #6716)
- Added support for a scan and commit metrics reporter that is pluggable through the catalog (#6404, #6246, #6410)
- Added support for branch commits for all operations (#4926, #5010)
- Added `FileIO` support for ORC readers and writers (#6293)
- Updated all actions to leverage bulk delete whenever possible (#6682)
- Updated the snapshot ID definition in the Puffin spec to support statistics file reuse (#6272)
- Added human-readable metrics information to the `files` metadata table (#5376)
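In the Iceberg table spec, each row of a position delete file identifies one deleted row by `file_path` (the data file) and `pos` (the row's 0-based position in that file). The new `position_deletes` metadata table surfaces these entries. The following stdlib-only sketch shows what such an entry means at read time. It illustrates the idea and is not Iceberg's reader implementation. `PositionDelete`, `live_rows` and the file paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical model of one position delete entry."""
    file_path: str  # data file that contains the deleted row
    pos: int        # 0-based position of the deleted row in that file


def live_rows(data_file: str, rows: Iterable[dict],
              deletes: Iterable[PositionDelete]) -> Iterator[dict]:
    """Yield the rows of `data_file` that no position delete targets."""
    # Keep only the deleted positions that point at this data file.
    deleted = {d.pos for d in deletes if d.file_path == data_file}
    for pos, row in enumerate(rows):
        if pos not in deleted:
            yield row


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
deletes = [
    PositionDelete("s3://bucket/data/a.parquet", 1),  # hypothetical paths
    PositionDelete("s3://bucket/data/b.parquet", 0),
]
print(list(live_rows("s3://bucket/data/a.parquet", rows, deletes)))
```

The delete that targets `b.parquet` is ignored when reading `a.parquet`, because position deletes only apply to the exact file they name.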
Spark
- Added time range query support for the changelog table (#6350)
- Added a changelog view procedure for v1 tables (#6012)
- Added support for storage partition joins to improve read and write performance (#6371)
- Updated default Arrow environment settings to improve read performance (#6550)
- Added aggregate pushdown support for `min`, `max` and `count` to improve read performance (#6622)
- Updated default distribution mode settings to improve write performance (#6828, #6838)
- Updated DELETE to perform a metadata-only update whenever possible to improve write performance (#6899)
- Improved predicate pushdown support for write operations (#6633)
- Added support for reading a branch or tag through the table identifier and `VERSION AS OF` (a.k.a. `FOR SYSTEM_VERSION AS OF`) SQL syntax (#6717, #6575)
- Added support for writing to a branch through the table identifier or through write-audit-publish (WAP) workflow settings (#6965, #7050)
- Added DDL SQL extensions to create, replace and drop a branch or tag (#6638, #6637, #6752, #6807)
- Added UDFs for the `years`, `months`, `days` and `hours` transforms (#6207, #6261, #6300, #6339)
- Added partition-related stats to the `add_files` procedure result (#6797)
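The Iceberg table spec defines the `years`, `months`, `days` and `hours` transforms as the number of whole years, months, days or hours since the Unix epoch (1970-01-01 00:00). The sketch below reproduces those semantics in plain Python so that partition values can be checked by hand. It is an illustration of the spec, not the Spark UDF implementation. It assumes naive datetimes and ignores time zones.

```python
from datetime import datetime, timedelta

# Assumption: naive datetimes; time zones are not handled in this sketch.
EPOCH = datetime(1970, 1, 1)


def years(ts: datetime) -> int:
    """Whole years since 1970."""
    return ts.year - 1970


def months(ts: datetime) -> int:
    """Whole months since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


def days(ts: datetime) -> int:
    """Days since 1970-01-01; timedelta.days rounds toward negative infinity."""
    return (ts - EPOCH).days


def hours(ts: datetime) -> int:
    """Hours since 1970-01-01 00:00; floor division also handles pre-epoch values."""
    return (ts - EPOCH) // timedelta(hours=1)


ts = datetime(2023, 3, 20, 15, 30)
print(years(ts), months(ts), days(ts), hours(ts))
```

For a pre-epoch value such as 1969-12-31 23:00, all four functions return -1. This is why the sketch relies on floor semantics rather than truncation toward zero.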
Flink
- Added support for metadata tables (#6222)
- Added support for read options in Flink source (#5967)
- Added support for reading and writing Avro `GenericRecord` (#6557, #6584)
- Added support for reading a branch or tag and writing to a branch (#6660, #5029)
- Added throttling support for streaming read (#6299)
- Added support for multiple sinks for the same table in the same job (#6528)
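The branch and tag items for Spark and Flink build on Iceberg snapshot references. A tag is a named pointer to one snapshot. A branch is a named pointer that advances as new snapshots are committed to it. The following toy, in-memory Python model illustrates that behavior. `Table`, `Ref`, `commit`, `create_tag`, `create_branch` and `resolve` are hypothetical names, not the Iceberg Java or PyIceberg API. Real refs also carry retention settings, which are omitted here.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional, Union


@dataclass
class Ref:
    snapshot_id: int
    kind: str  # "branch" or "tag"


@dataclass
class Table:
    """Hypothetical toy table: snapshot parent links plus named refs."""
    parents: Dict[int, Optional[int]] = field(default_factory=dict)
    refs: Dict[str, Ref] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def commit(self, branch: str = "main") -> int:
        """Create a snapshot on top of `branch` and advance the branch to it."""
        ref = self.refs.get(branch)
        if ref is not None and ref.kind != "branch":
            raise ValueError(f"{branch!r} is a tag; tags cannot be committed to")
        new_id = next(self._ids)
        self.parents[new_id] = ref.snapshot_id if ref else None
        self.refs[branch] = Ref(new_id, "branch")
        return new_id

    def create_tag(self, name: str, snapshot_id: int) -> None:
        """A tag pins one snapshot and never moves."""
        self.refs[name] = Ref(snapshot_id, "tag")

    def create_branch(self, name: str, from_ref: str = "main") -> None:
        """A new branch starts at the current snapshot of `from_ref`."""
        self.refs[name] = Ref(self.refs[from_ref].snapshot_id, "branch")

    def resolve(self, version: Union[str, int]) -> int:
        """Resolve a VERSION AS OF-style argument: a ref name or a snapshot ID."""
        if isinstance(version, str):
            return self.refs[version].snapshot_id
        if version not in self.parents:
            raise KeyError(version)
        return version


t = Table()
s1 = t.commit()            # first snapshot on main
t.create_tag("v1", s1)     # pin it
t.create_branch("audit")   # branch off main
s2 = t.commit("audit")     # only the audit branch advances
print(t.resolve("main"), t.resolve("audit"), t.resolve("v1"))
```

Writing to a separate branch while `main` stays unchanged is the idea behind the write-audit-publish (WAP) workflow mentioned in the Spark notes.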
Vendor Integrations
- Added Snowflake catalog integration (#6428)
- Added AWS SigV4 authentication support for the REST catalog (#6951)
- Added support for AWS S3 remote signing (#6169, #6835, #7080)
- Updated AWS Glue catalog to skip table version archive by default (#6916)
- Updated AWS Glue catalog to not require a warehouse location (#6586)
Dependencies
- Upgraded ORC to 1.8.1 (#6349)
- Upgraded Jackson to 2.14.1 (#6168)
- Upgraded AWS SDK V2 to 2.20.18 (#7003)
- Upgraded Nessie to 0.50.0 (#6875)
PyIceberg updates
- Added Python support for metrics filtering (Fokko Driesprong)
- Added Python support for `startsWith` (Luigi)
- Removed the legacy Python implementation (Python community)
More information can be found on the project site, and PyIceberg can be installed from PyPI (`pip install pyiceberg`).
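Metrics filtering lets a reader skip data files by using the per-column lower and upper bounds stored in Iceberg manifests. For a `startsWith` predicate, a file can be skipped when its string bounds, truncated to the prefix length, show that no value in the file can begin with the prefix. The standalone function below sketches that check under these assumptions. `may_contain_prefix` is a hypothetical name, not PyIceberg's evaluator API.

```python
from typing import Optional


def may_contain_prefix(prefix: str, lower: Optional[str], upper: Optional[str]) -> bool:
    """Return False only when the bounds prove no value can start with `prefix`.

    `lower` and `upper` are a data file's min/max values for a string column;
    None means the stats are missing, so the file cannot be skipped.
    """
    if lower is None or upper is None:
        return True  # no stats: the file must be read
    n = len(prefix)
    # Every value v >= lower has v[:n] >= lower[:n]. If that is already
    # greater than the prefix, no value in the file can start with it.
    if lower[:n] > prefix:
        return False
    # Likewise every value v <= upper has v[:n] <= upper[:n].
    if upper[:n] < prefix:
        return False
    return True


print(may_contain_prefix("ice", "apple", "zebra"))  # bounds straddle the prefix
print(may_contain_prefix("ice", "jam", "zoo"))      # every value sorts after "ice..."
```

The check is deliberately one-sided. It may keep a file that has no matching rows, but it never skips a file that does.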
Iceberg in the industry
- Cloudera has integrated Iceberg V1 support
- Trino has added Iceberg improvements in release 409
- iData has Iceberg support in their Pipeline product
- CelerData has added Iceberg integration in V3
- Snowflake Iceberg catalog support is now available
Blogs from the community
- Tabular – Iceberg tags and branches
- Dremio – That’s a Wrap! Highlights from Subsurface LIVE 2023
- Cloudera – Implement a Multi-Cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform
- Amazon – Use Apache Iceberg in a data lake to support incremental data processing
- Memphis – Stateful stream processing with Memphis and Apache Iceberg
Iceberg in the news
- Datanami: Iceberg Data Services Emerge from Tabular, Dremio
- Infoworld: Dremio adds new Apache Iceberg features to its data lakehouse
- The Register: Tabular launches with the promise of a ‘headless’ data warehouse
- The New Stack: Multiple Vendors Make Data and Analytics Ubiquitous
Keep up to date on all things Iceberg
- Watch for new blog posts added to the Blogs page
- See the community Contribute guide to learn how to start contributing to Iceberg
- Join the Apache Iceberg workspace on Slack using the invite link
- Subscribe to the Apache Iceberg mailing list