-
Why Apache Iceberg — for data warehouse users
Major data warehouse platforms such as Google BigQuery, Snowflake, AWS, and Databricks have all announced support for Apache Iceberg tables. Commercial warehouse engines seldom…
-
Why Apache Iceberg — for data lake users
INTRODUCTION If you have been working in a data lake, you’re probably very familiar with its drawbacks. You’re in luck:…
-
Setting table write order
DATA ENGINEERING This recipe shows you how to set a table’s write order to instruct all writers — including background…
-
Using MERGE
DATA ENGINEERING One of the most useful tools that Iceberg enables is the SQL MERGE command. This recipe…
-
Creating Branches and Tags
DATA ENGINEERING This recipe shows how to create and manage tags and branches in an Apache Iceberg table. What are…
-
Write – Audit – Publish (WAP) Pattern
DATA ENGINEERING This recipe builds on branching and tagging basics to implement a powerful pattern commonly referred to as Write – Audit…
-
Incremental processing
DATA ENGINEERING Incremental processing is a tried-and-true approach to improving data transformation performance and reducing cost. The basic…
-
CDC pipeline from a changelog to create a mirror table
DATA ENGINEERING This recipe shows how to set up a pipeline taking data from an AWS DMS source to an…
-
How To Install
PYICEBERG Apache Iceberg is language- and engine-agnostic, meaning it was designed to be portable so that any language or engine…
-
Getting started with PyIceberg CLI
PYICEBERG The PyIceberg CLI allows you to easily inspect table metadata through Apache Iceberg catalogs. This recipe shows commonly used commands.…
-
Getting started with the Python API
PYICEBERG This recipe introduces the PyIceberg API. Before running code examples, refer to the PyIceberg catalog configuration recipe for how…
-
Work with data in Pandas
PYICEBERG This recipe shows how to fetch data from an Iceberg table with pure Python into a Pandas dataframe. Reading…
-
Using Apache Iceberg with Polars
PYICEBERG Polars is an extremely fast DataFrame library and in-memory query engine. It includes parallel execution, cache-efficient algorithms, and an expressive…
-
Run local queries in DuckDB
PYICEBERG For DuckDB, there are currently two paths for Iceberg integration in PyIceberg. This recipe demonstrates how to use DuckDB…
-
File compaction
DATA OPERATIONS The primary motivation for creating Apache Iceberg was to make transactions safe and reliable. Without safe concurrent writes,…