How to – Page 2

Connecting Amazon EMR Spark to an Apache Iceberg catalog

December 28, 2023

Categories: Apache Iceberg, How to

GETTING STARTED Amazon EMR is an easy way to deploy distributed data processing frameworks like Apache Spark, Apache Flink, Apache…

Configuring Python

December 28, 2023

Categories: How to, PyIceberg

GETTING STARTED PyIceberg is a native Python implementation of Apache Iceberg that enables access to a wide range of scientific and…

Configuring Trino

December 28, 2023

Categories: Apache Iceberg, How to

GETTING STARTED Trino is a popular open-source distributed SQL query engine that federates queries against data stored in the Hive Metastore,…

Connecting to a REST Catalog

December 28, 2023

Categories: Apache Iceberg, How to

GETTING STARTED The Apache Iceberg REST catalog protocol is a standard API for interacting with any Iceberg catalog. The REST…

Data engineering with Apache Iceberg

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING Data engineers starting at Netflix attend (or used to, at least) a few hours of orientation to become…

Using Hidden Partitioning

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING This recipe shows how to use Apache Iceberg’s hidden partitioning to improve query performance while avoiding data quality…

Setting table write order

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING This recipe shows you how to set a table’s write order to instruct all writers, including background…

Using MERGE

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING One of the most useful tools that Iceberg enables is the SQL MERGE command. This recipe…

Creating Branches and Tags

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING This recipe shows how to create and manage tags and branches in an Apache Iceberg table. What are…

Write – Audit – Publish (WAP) Pattern

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING This recipe builds on branching and tagging basics to implement a powerful pattern commonly referred to as Write – Audit…

Incremental processing

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING Incremental processing is a tried and true approach to improving data transformation performance and reducing cost. The basic…

How To Install

December 27, 2023

Categories: How to, PyIceberg

PYICEBERG Apache Iceberg is language- and engine-agnostic, meaning it was designed to be portable so that any language or engine…

Getting started with the Python API

December 27, 2023

Categories: How to, PyIceberg

PYICEBERG This recipe introduces the PyIceberg API. Before running code examples, refer to the PyIceberg catalog configuration recipe for how…

Work with data in Pandas

December 27, 2023

Categories: How to, PyIceberg

PYICEBERG This recipe shows how to fetch data from an Iceberg table with pure Python into a Pandas dataframe. Reading…

Using Apache Iceberg with Polars

December 27, 2023

Categories: How to, PyIceberg

PYICEBERG Polars is an extremely fast DataFrame library and in-memory query engine. It includes parallel execution, cache-efficient algorithms, and an expressive…