-
Connecting Amazon EMR Spark to an Apache Iceberg catalog
GETTING STARTED Amazon EMR is an easy way to deploy distributed data processing frameworks like Apache Spark, Apache Flink, Apache…
-
Configuring Python
GETTING STARTED PyIceberg is a native Python implementation of Apache Iceberg that enables access to a wide range of scientific and…
-
Configuring Trino
GETTING STARTED Trino is a popular open-source distributed SQL query engine that federates queries against data stored in the Hive Metastore,…
-
Connecting to a REST Catalog
GETTING STARTED The Apache Iceberg REST catalog protocol is a standard API for interacting with any Iceberg catalog. The REST…
-
Data engineering with Apache Iceberg
DATA ENGINEERING Data engineers starting at Netflix attend (or used to, at least) a few hours of orientation to become…
-
Using Hidden Partitioning
DATA ENGINEERING This recipe shows how to use Apache Iceberg’s hidden partitioning to improve query performance while avoiding data quality…
-
Setting table write order
DATA ENGINEERING This recipe shows you how to set a table’s write order to instruct all writers — including background…
-
Using MERGE
DATA ENGINEERING One of the most useful tools that Iceberg enables is the SQL MERGE command. This recipe…
-
Creating Branches and Tags
DATA ENGINEERING This recipe shows how to create and manage tags and branches in an Apache Iceberg table. What are…
-
Write-Audit-Publish (WAP) Pattern
DATA ENGINEERING This recipe builds on branching and tagging basics to implement a powerful pattern commonly referred to as Write-Audit…
-
Incremental processing
DATA ENGINEERING Incremental processing is a tried-and-true approach to improving data transformation performance and reducing cost. The basic…
-
How To Install
PYICEBERG Apache Iceberg is language- and engine-agnostic, meaning it was designed to be portable so that any language or engine…
-
Getting started with the Python API
PYICEBERG This recipe introduces the PyIceberg API. Before running code examples, refer to the PyIceberg catalog configuration recipe for how…
-
Work with data in Pandas
PYICEBERG This recipe shows how to fetch data from an Iceberg table with pure Python into a Pandas dataframe. Reading…
-
Using Apache Iceberg with Polars
PYICEBERG Polars is an extremely fast DataFrame library and in-memory query engine. It includes parallel execution, cache-efficient algorithms, and an expressive…