Apache Iceberg – Page 3

Creating a Table from CSV

December 28, 2023

Categories: Apache Iceberg, How to

BASICS This recipe demonstrates how to create Apache Iceberg tables from CSV files. This focuses on ensuring the schema for…

Creating a Table from Parquet

December 28, 2023

Categories: Apache Iceberg, How to

BASICS This recipe demonstrates how to create Apache Iceberg tables from Parquet. This focuses on ensuring the schema for the…

Time Travel Queries

December 28, 2023

Categories: Apache Iceberg, How to

BASICS This recipe demonstrates ways to query historical snapshots of Apache Iceberg tables. Time travel to query historical snapshots in…

Querying Table Metadata

December 28, 2023

Categories: Apache Iceberg, How to

BASICS This recipe shows how to inspect Apache Iceberg table metadata with SQL queries. Iceberg metadata Iceberg table metadata is…

Querying an Iceberg Table

December 28, 2023

Categories: Apache Iceberg, How to

BASICS This recipe demonstrates simple queries with Iceberg tables. Running queries in Apache Spark Spark supports two interfaces to query…

Connecting to Athena PySpark

December 28, 2023

Categories: Apache Iceberg, How to

GETTING STARTED Amazon Athena is a managed compute service that allows you to use SQL or PySpark to query data…

Connecting Amazon EMR Spark to an Apache Iceberg catalog

December 28, 2023

Categories: Apache Iceberg, How to

GETTING STARTED Amazon EMR is an easy way to deploy distributed data processing frameworks like Apache Spark, Apache Flink, Apache…

Configuring Trino

December 28, 2023

Categories: Apache Iceberg, How to

GETTING STARTED Trino is a popular open-source distributed SQL query engine that federates queries against data stored in the Hive Metastore,…

Configuring Apache Spark

December 28, 2023

Categories: Apache Iceberg, Education

GETTING STARTED Apache Spark provides comprehensive support for Apache Iceberg via both extended SQL syntax and stored procedures to manage…

Connecting to a REST Catalog

December 28, 2023

Categories: Apache Iceberg, How to

GETTING STARTED The Apache Iceberg REST catalog protocol is a standard API for interacting with any Iceberg catalog. The REST…

Catalogs and the REST catalog

December 28, 2023

Categories: Apache Iceberg, Education

GETTING STARTED Catalogs in Apache Iceberg The core responsibility of Iceberg is to manage a collection of files as a…

Why Apache Iceberg — for data warehouse users

December 28, 2023

Categories: Apache Iceberg, Opinion

Major data warehouse platforms such as Google BigQuery, Snowflake, AWS, and Databricks have all announced support for Apache Iceberg tables. Commercial warehouse engines seldom…

Why Apache Iceberg — for data lake users

December 28, 2023

Categories: Apache Iceberg, Opinion

INTRODUCTION If you have been working in a data lake, you’re probably very familiar with its drawbacks. You’re in luck:…

Data engineering with Apache Iceberg

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING Data engineers starting at Netflix attend (or used to, at least) a few hours of orientation to become…

Using Hidden Partitioning

December 27, 2023

Categories: Apache Iceberg, How to

DATA ENGINEERING This recipe shows how to use Apache Iceberg’s hidden partitioning to improve query performance while avoiding data quality…