October 2023 - Iceberg Community News

❤️ Apache Iceberg? Spread the word by giving it a ⭐ on the apache/iceberg repo!


Project updates

Iceberg Java

  • Release 1.4.2 is live! Here are some highlights from 1.4.0 to 1.4.2:
    • 1.4.2 addresses an issue that was identified in 1.4.0. Please upgrade to this latest patch release.
    • New tables default to V2 format, as discussed in the dev list.
    • Zstandard (zstd) is the default data compression algorithm on new tables.
    • Added a FileIO implementation that supports Azure Data Lake Storage Gen2.
    • Added REST API for committing changes to multiple tables.
    • Spark-specific highlights for release 1.4.1
      • Increased the default advisory partition size for writes and allowed users to set it explicitly.
      • Added distributed planning, which dramatically improves performance in some use cases.
      • Skipped the local sort for unordered writes in Spark when using the fanout writer.
  • Added support for Flink’s Alter Table syntax.
  • Created foundational view support for the metastore, which paves the way for implementing view support in multiple Hive metastore-based implementations.
  • The project is now undergoing a documentation refactor. The previous refactor, back in version 0.12.1, introduced multi-versioned documentation to Iceberg and split the versioned docs source and the static site source into two separate repositories. This created a confusing and cumbersome contribution and release process for documentation. Following a new design, the Iceberg documentation will retain multi-versioning while avoiding the concerns around having all versions of the source in a single repository.
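
Since new tables now default to the V2 format and zstd compression, some teams may prefer to state those settings explicitly rather than rely on defaults. The following is a minimal sketch, not an official recipe: it only builds a Spark SQL statement as a string. The helper `create_table_sql`, the table name `demo.db.events`, and the schema are hypothetical; `format-version` and `write.parquet.compression-codec` are existing Iceberg table properties.

```python
def create_table_sql(table, columns, properties):
    """Build a Spark SQL CREATE TABLE statement for an Iceberg table.

    Hypothetical helper for illustration; it only formats a string.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    # Sort the properties so the generated statement is deterministic.
    props = ", ".join(f"'{k}'='{v}'" for k, v in sorted(properties.items()))
    return f"CREATE TABLE {table} ({cols}) USING iceberg TBLPROPERTIES ({props})"


# Pin the settings that became defaults for new tables in 1.4.0.
explicit_props = {
    "format-version": "2",
    "write.parquet.compression-codec": "zstd",
}

# Table name and schema are assumptions made up for this example.
sql = create_table_sql(
    "demo.db.events",
    [("id", "bigint"), ("ts", "timestamp")],
    explicit_props,
)
print(sql)
```

The resulting statement could be passed to a Spark session configured with an Iceberg catalog. Swapping the property values is one way to keep an older format version or codec on a specific table.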

PyIceberg, Iceberg-Go, and Iceberg-Rust

Bergy Blogs

Ecosystem Updates

Vendor Updates

  • Sippy helps you avoid egress fees while incrementally migrating data from S3 to R2
  • If you’re attending AWS re:Invent, here are some Iceberg-related sessions you won’t want to miss!
    • ANT101 | How to build a platform for AI and analytics based on Apache Iceberg - Ryan Blue
    • NFX306 | Netflix’s journey to an Apache Iceberg–only data lake - Vaidy Krishnan, Rakesh Veeramacheneni, and Ashwin Kayyoor
    • STG313 | Building and optimizing a data lake on Amazon S3 - Ryan Blue, Huey Han, and Oleg Lvovitch
    • ANT308 | Build large-scale transactional data lakes with Apache Iceberg on AWS - Aditya Challa, Aneesh Chandra, Dylan Qu, Francis McGregor-Macdonald, and Nishchai JM
    • ANT328 | Accessing open table formats for superior data lake analytics - Asser Moustafa, Sreekanth Martha, and Stuti Deshpande
    • OPN309 | Building a secure and scalable transactional data lake - Sercan Karaoglu and Ankita Gavali
  • And while you’re at re:Invent, be sure to visit the following vendor booths to see what they’re doing with Iceberg:
    • AWS - Athena, EMR, Redshift, Glue, DynamoDB
    • 2604 - Snowflake
    • 1022 - Databricks
    • 1632 - Tabular
    • 1151 - Starburst
    • 1000 - Confluent
    • 1505 - ClickHouse
    • 1530 - Dremio

Iceberg Resources

🏁 Get Started with Apache Iceberg
👩‍🏫 Learn more about Apache Iceberg on the official Apache site
📺 Watch and subscribe to the Iceberg YouTube Channel
📰 Read up on some community blog posts
🫴🏾 Contribute to Iceberg
👥 SELECT * FROM you JOIN iceberg_community
📬 Subscribe to the Apache Iceberg mailing list

Careers

Senior Software Engineer, OSS

Improve Apache Iceberg by building new capabilities for Tabular and the community

Senior Software Engineer, Product

Design services and use cloud infrastructure to build a resilient and scalable data platform

Senior UI Engineer

Design and implement Tabular’s user experience, where people will create, monitor, and manage their data platform

Developer Advocate

Build examples to solve real-world challenges, write tutorials that help developers succeed, and be a community liaison

Developer Experience Engineer

Build technical documentation and tutorials, assist in maintaining the release processes, and lower the time to dopamine (TTD) of developers using Apache Iceberg