With Iceberg’s integration into a growing number of compute engines, there are many interfaces with which you can use its various powerful features. This blog post is the first part of a series that covers the underlying Java API available for working with Iceberg tables without an engine.
Whether you’re a developer working on a compute engine, an infrastructure engineer maintaining a production Iceberg warehouse, or a data engineer working with Iceberg tables,
the Iceberg Java client provides valuable functionality for working with Iceberg tables. The easiest way to try out the Java client is to use the interactive notebook Iceberg - An Introduction to the Iceberg Java API.ipynb, which is included in the docker-compose environment from one of our earlier blog posts: Docker, Spark, and Iceberg: The Fastest Way to Try Iceberg!. If you already have the tabulario/spark-iceberg image cached locally, make sure you pick up the latest changes by running `docker pull tabulario/spark-iceberg`.
In this post, we'll cover:
- The Catalog Interface
- Loading a Catalog
- Defining a Schema and a Partition Spec
- Creating a Table
- Dropping a Table
- What’s Next
The Catalog Interface
A catalog in Iceberg is an inventory of Iceberg namespaces and tables. Iceberg comes with many catalog implementations, such as Hive, Glue, and DynamoDB, and an upcoming release will even include a generic REST-based catalog. You can also plug in your own catalog implementation to inject custom logic specific to your use cases.
For this walkthrough, we will use the JdbcCatalog that comes with Iceberg. Let's get started!
Loading a Catalog
To load a catalog, you first have to construct a properties map to configure it. The properties required vary depending on the type of catalog you’re using. We’re using a JdbcCatalog backed by a Postgres database, so our properties map needs to include the Postgres connection information.
Two properties required by virtually all catalogs are the warehouse location and the file-io implementation. We'll use a local directory as our warehouse location and HadoopFileIO as our catalog's file-io implementation.
Note: To learn more about the file-io abstraction in Iceberg, check out one of our earlier blog posts that provides an excellent overview: Iceberg FileIO: Cloud Native Tables.
Let’s go ahead and generate a map of catalog properties to configure our JdbcCatalog.
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.jdbc.JdbcCatalog;
import org.apache.iceberg.hadoop.HadoopFileIO;

Map<String, String> properties = new HashMap<>();
properties.put(CatalogProperties.CATALOG_IMPL, JdbcCatalog.class.getName());
properties.put(CatalogProperties.URI, "jdbc:postgresql://postgres:5432/demo_catalog");
properties.put(JdbcCatalog.PROPERTY_PREFIX + "user", "admin");
properties.put(JdbcCatalog.PROPERTY_PREFIX + "password", "password");
properties.put(CatalogProperties.WAREHOUSE_LOCATION, "/home/iceberg/warehouse");
properties.put(CatalogProperties.FILE_IO_IMPL, HadoopFileIO.class.getName());
```
Next, let’s initialize the catalog, setting a name for it and passing it the properties map containing our configuration.
```java
JdbcCatalog catalog = new JdbcCatalog();
catalog.initialize("demo", properties);
```
That’s it! We now have a catalog instance that includes operations such as listing, creating, renaming, and dropping tables.
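For instance, beyond creating and dropping tables, the Catalog interface offers existence checks and atomic renames. Here's a quick sketch; the webapp.logs and webapp.logs_archived identifiers are purely illustrative (we'll define the real table identifier later in this post):

```java
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

// Hypothetical identifiers, just to illustrate the catalog API
TableIdentifier logs = TableIdentifier.of(Namespace.of("webapp"), "logs");
TableIdentifier archived = TableIdentifier.of(Namespace.of("webapp"), "logs_archived");

// Check whether a table exists before acting on it
if (catalog.tableExists(logs)) {
  // Rename is performed as a single catalog operation
  catalog.renameTable(logs, archived);
}
```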
Defining a Schema and a Partition Spec
In the next section, we'll create a table, but first we must define the table's schema. Let's create a simple schema with four columns: level, event_time, message, and call_stack.
```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
    Types.NestedField.required(3, "message", Types.StringType.get()),
    Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
);
```
Additionally, let's build a partition spec that defines an hourly partition on the event_time column.
```java
import org.apache.iceberg.PartitionSpec;

PartitionSpec spec = PartitionSpec.builderFor(schema)
    .hour("event_time")
    .build();
```
Creating a Table
Using our schema and partition spec, we can now create our table. We’re going to create a “webapp” namespace and create our table identifier in that namespace.
```java
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

Namespace namespace = Namespace.of("webapp");
TableIdentifier name = TableIdentifier.of(namespace, "logs");
```
Now, let’s create our table!
```java
catalog.createTable(name, schema, spec);
```
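The table can also be loaded back from the catalog at any time by its identifier. A quick sketch of loading it and inspecting a few basics:

```java
import org.apache.iceberg.Table;

// Load the table we just created, using the same identifier
Table table = catalog.loadTable(name);

// Inspect where the table lives and how it's structured
System.out.println(table.location());  // path under our warehouse directory
System.out.println(table.schema());
System.out.println(table.spec());
```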
If we call the listTables method on our catalog, we can see our newly created table in the list.
```java
import java.util.List;

List<TableIdentifier> tables = catalog.listTables(namespace);
System.out.println(tables);
```
Dropping a Table
As you would expect, the Catalog interface also includes a method for dropping tables. Let's use the same table identifier object to drop the table we created in the previous section.
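A minimal sketch of the drop call, reusing the name identifier from the previous section:

```java
// Drop the table; returns true if the table existed and was dropped.
// By default this also removes the table's data and metadata files.
boolean dropped = catalog.dropTable(name);
System.out.println(dropped);
```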
What's Next
This post is part one of a multi-part series and covered the main entry point to Iceberg tables: the Catalog. Check back for upcoming posts that will cover a wide range of topics, from working with metadata and performing table maintenance to reading and writing data files! Also, if you'd like to be a part of the growing Iceberg community or just want to stop in and say hello, check out our community page to learn where to find us!