icefield

Iceberg Updates

Apache Iceberg 1.2.0 was released on March 20th, 2023. The 1.2.0 release adds a variety of new features and bug fixes. Here is an overview:

Core

  • Added AES GCM encryption stream spec (#5432)
  • Added support for Delta Lake to Iceberg table conversion (#6449, #6880)
  • Added support for position_deletes metadata table (#6365, #6716)
  • Added support for scan and commit metrics reporter that is pluggable through catalog (#6404, #6246, #6410)
  • Added support for branch commit for all operations (#4926, #5010)
  • Added FileIO support for ORC readers and writers (#6293)
  • Updated all actions to leverage bulk delete whenever possible (#6682)
  • Updated snapshot ID definition in Puffin spec to support statistics file reuse (#6272)
  • Added human-readable metrics information in files metadata table (#5376)

Spark

  • Added time range query support for changelog table (#6350)
  • Added changelog view procedure for v1 table (#6012)
  • Added support for storage partition joins to improve read and write performance (#6371)
  • Updated default Arrow environment settings to improve read performance (#6550)
  • Added aggregate pushdown support for min, max and count to improve read performance (#6622)
  • Updated default distribution mode settings to improve write performance (#6828, #6838)
  • Updated DELETE to perform metadata-only update whenever possible to improve write performance (#6899)
  • Improved predicate pushdown support for write operations (#6633)
  • Added support for reading a branch or tag through table identifier and VERSION AS OF (a.k.a. FOR SYSTEM_VERSION AS OF) SQL syntax (#6717, #6575)
  • Added support for writing to a branch through identifier or through write-audit-publish (WAP) workflow settings (#6965, #7050)
  • Added DDL SQL extensions to create, replace and drop a branch or tag (#6638, #6637, #6752, #6807)
  • Added UDFs for years, months, days and hours transforms (#6207, #6261, #6300, #6339)
  • Added partition-related stats for add_files procedure result (#6797)

Flink

  • Added support for metadata tables (#6222)
  • Added support for read options in Flink source (#5967)
  • Added support for reading and writing Avro GenericRecord (#6557, #6584)
  • Added support for reading a branch or tag and writing to a branch (#6660, #5029)
  • Added throttling support for streaming read (#6299)
  • Added support for multiple sinks for the same table in the same job (#6528)
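
The Spark branch and tag items above (DDL extensions and VERSION AS OF reads) can be illustrated with a small sketch. The helpers below only build SQL strings; they do not talk to Spark. The statement shapes follow the Iceberg Spark SQL extensions as we understand them, so check them against the documentation for your version. The table name demo.db.events, the reference names, and the helper functions themselves are made up for illustration.

```python
from typing import Optional


def quote(name: str) -> str:
    """Backtick-quote a branch or tag name, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_ref_sql(
    table: str,
    kind: str,
    name: str,
    snapshot_id: Optional[int] = None,
    retain_days: Optional[int] = None,
) -> str:
    """Build an ALTER TABLE ... CREATE BRANCH/TAG statement (hypothetical helper)."""
    if kind not in ("BRANCH", "TAG"):
        raise ValueError("kind must be 'BRANCH' or 'TAG'")
    sql = f"ALTER TABLE {table} CREATE {kind} {quote(name)}"
    if snapshot_id is not None:
        # Pin the reference to a specific snapshot instead of the current one.
        sql += f" AS OF VERSION {snapshot_id}"
    if retain_days is not None:
        # Keep the reference for a limited number of days.
        sql += f" RETAIN {retain_days} DAYS"
    return sql


def drop_ref_sql(table: str, kind: str, name: str) -> str:
    """Build an ALTER TABLE ... DROP BRANCH/TAG statement (hypothetical helper)."""
    if kind not in ("BRANCH", "TAG"):
        raise ValueError("kind must be 'BRANCH' or 'TAG'")
    return f"ALTER TABLE {table} DROP {kind} {quote(name)}"


def read_ref_sql(table: str, ref_name: str) -> str:
    """Build a SELECT that reads a branch or tag by name via VERSION AS OF."""
    escaped = ref_name.replace("'", "''")
    return f"SELECT * FROM {table} VERSION AS OF '{escaped}'"


if __name__ == "__main__":
    print(create_ref_sql("demo.db.events", "BRANCH", "audit-branch", 1234, 30))
    print(read_ref_sql("demo.db.events", "audit-branch"))
    print(drop_ref_sql("demo.db.events", "BRANCH", "audit-branch"))
```

In a real session each string would be passed to spark.sql(...) on a Spark session configured with the Iceberg SQL extensions.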

Vendor Integrations

  • Added Snowflake catalog integration (#6428)
  • Added AWS sigV4 authentication support for REST catalog (#6951)
  • Added support for AWS S3 remote signing (#6169, #6835, #7080)
  • Updated AWS Glue catalog to skip table version archive by default (#6916)
  • Updated AWS Glue catalog to not require a warehouse location (#6586)

Dependencies

  • Upgraded ORC to 1.8.1 (#6349)
  • Upgraded Jackson to 2.14.1 (#6168)
  • Upgraded AWS SDK V2 to 2.20.18 (#7003)
  • Upgraded Nessie to 0.50.0 (#6875)

PyIceberg updates

  • Added Python support for metrics filtering (Fokko Driesprong)
  • Added Python support for startsWith (Luigi)
  • Removed Python legacy! (Python community)
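
To give a feel for what metrics filtering combined with startsWith means, here is a minimal, library-free sketch of the idea: a data file can be skipped when its column bounds prove that no value can begin with the requested prefix. This is not PyIceberg's API; ColumnBounds, may_contain_prefix and the file names are hypothetical, and the real implementation also has to handle truncated bounds, nulls and non-string types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnBounds:
    """Lower and upper bound of a string column within one data file."""
    lower: Optional[str]
    upper: Optional[str]


def may_contain_prefix(bounds: ColumnBounds, prefix: str) -> bool:
    """Return False only when the bounds prove no value starts with prefix."""
    if bounds.lower is None or bounds.upper is None:
        # Without statistics nothing can be ruled out, so the file must be read.
        return True
    n = len(prefix)
    if bounds.lower[:n] > prefix:
        # Every value sorts after all strings that start with the prefix.
        return False
    if bounds.upper[:n] < prefix:
        # Every value sorts before all strings that start with the prefix.
        return False
    return True


if __name__ == "__main__":
    files = {
        "file-a.parquet": ColumnBounds("apple", "banana"),
        "file-b.parquet": ColumnBounds("cherry", "grape"),
        "file-c.parquet": ColumnBounds(None, None),
    }
    to_scan = [name for name, b in files.items() if may_contain_prefix(b, "ch")]
    print(to_scan)
```

The check is deliberately conservative: it may keep a file that turns out to contain no match, but it never drops a file that could contain one.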

More information can be found on the project site, and the installer can be found here

Iceberg in the industry

  • Cloudera has integrated Iceberg V1 support
  • Trino has added Iceberg improvements in release 409
  • iData has Iceberg support in their Pipeline product
  • CelerData adds Iceberg integration in V3
  • Snowflake Iceberg catalog support is now available

Blogs from the community

Iceberg in the news

