Event streaming into Iceberg tables

Apache Iceberg Kafka Connect sink

Tabular wrote the Apache Iceberg Kafka Connect sink and released it as open source under the Apache License 2.0. You can use the sink to stream events into Tabular-managed Iceberg tables. This blog provides some background on the design goals and architectural decisions behind the Iceberg Kafka Connect sink.

Tabular can be used with a self-managed Kafka deployment, Confluent Cloud, or Amazon Managed Streaming for Apache Kafka (MSK).

The sink supports the following features and functionality: 

  • Fan out from a single topic to multiple tables
  • Schema evolution
  • Automatic table creation
  • Exactly-once semantics
  • Ingest Debezium CDC events and output to an Iceberg change log table (learn more about the Tabular CDC service)
  • Support for Avro, JSON Schema, and Protobuf serialization
  • Support for additional schema formats via the Confluent Schema Registry for Kafka
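To make the setup concrete, here is a minimal sketch of a connector configuration exercising a few of the features above (automatic table creation, schema evolution). The topic, table, warehouse, and credential values are placeholders, and the property names are based on the sink's documented configuration; check the connector docs for your version before using them.

```json
{
  "name": "events-iceberg-sink",
  "config": {
    "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
    "topics": "events",
    "iceberg.tables": "analytics.events",
    "iceberg.tables.auto-create-enabled": "true",
    "iceberg.tables.evolve-schema-enabled": "true",
    "iceberg.catalog.type": "rest",
    "iceberg.catalog.uri": "https://api.tabular.io/ws",
    "iceberg.catalog.credential": "<credential>",
    "iceberg.catalog.warehouse": "<warehouse-name>"
  }
}
```

Posting this JSON to the Kafka Connect REST API creates the connector; the sink then commits files to the target Iceberg table on a periodic interval.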

For step-by-step instructions on implementing Kafka Connect, see this blog post.

Apache Flink sink

The Apache Iceberg Flink sink writes Flink streams to Iceberg tables, converting between Flink and Iceberg data types. It can be used with self-managed Flink or services such as Amazon Managed Service for Apache Flink. The sink provides exactly-once semantics and is compatible with Tabular through the REST catalog.

You can incorporate Flink into your ingestion flow to execute in-stream transformations. A common pattern is to use Kafka to ingest data into Flink, perform transformations, and then write the transformed data to Iceberg tables. Flink can also read data from an Iceberg table, both for incremental streaming workloads, which read new data as it is committed, and for batch loading, such as backfilling data. 
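The Kafka-to-Iceberg pattern above can be sketched in Flink SQL. This is an illustrative fragment, not a complete deployment: the topic, table, column names, and catalog URI are placeholders, and the catalog properties assume an Iceberg version whose Flink connector supports the REST catalog type.

```sql
-- Hypothetical Kafka source table (names and schema are placeholders).
CREATE TABLE kafka_events (
  id BIGINT,
  payload STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

-- Register an Iceberg catalog backed by the REST catalog (as used by Tabular).
CREATE CATALOG iceberg WITH (
  'type' = 'iceberg',
  'catalog-type' = 'rest',
  'uri' = 'https://api.tabular.io/ws'
);

-- Apply an in-stream transformation and continuously write to Iceberg.
INSERT INTO iceberg.analytics.events
SELECT id, UPPER(payload) AS payload, ts
FROM kafka_events;
```

The `INSERT INTO ... SELECT` runs as a continuous streaming job, so transformed rows are committed to the Iceberg table as Flink checkpoints complete.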

This tutorial blog shows you how to load Iceberg tables from a Flink app. These docs cover setting up Iceberg as a Flink sink in Tabular.