This recipe shows how to use Apache Iceberg’s hidden partitioning to improve query performance while avoiding data quality and usability problems.
What is partitioning?
Partitioning is a data engineering technique for improving query performance by dividing rows into groups that enable efficient filtering. For example, you might store a table of log data partitioned into files by log level (DEBUG, INFO, WARN, and ERROR). That is, each file contains logs with just one level and a file’s metadata keeps track of what level it contains. That makes it easy to find ERROR logs without needing to sift through a much larger volume of INFO or DEBUG messages.
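As a toy illustration in plain Python (not Iceberg itself), the log-level example looks like this: rows are grouped into per-level "files", and a reader uses each file's level to skip files that cannot contain ERROR logs:

```python
# Toy model of partitioned storage: one "file" per log level, with the
# level itself serving as the file's metadata.
logs = [
    ("INFO", "Updating ..."),
    ("ERROR", "Failed to ..."),
    ("DEBUG", "Entering ..."),
    ("ERROR", "Timeout ..."),
]

# Write path: route each row to the file for its level.
files = {}
for level, message in logs:
    files.setdefault(level, []).append(message)

# Read path: to find ERROR logs, consult the metadata (the dict key) and
# open only matching files, skipping INFO and DEBUG files entirely.
error_rows = [row for level, rows in files.items()
              if level == "ERROR"
              for row in rows]
```

The filtering step never touches the contents of non-ERROR files, which is the whole point: work is proportional to the matching partitions, not the table size.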
What is hidden partitioning?
Iceberg’s hidden partitioning is an improvement over Hive’s approach that makes partitioning declarative. Readers and writers use the table like any other SQL table and Iceberg handles the rest.
Hive partitions data using special columns where every unique value creates a separate partition. This works well for categorical columns like log level, but can’t directly handle high-cardinality columns like timestamps. In practice, such columns are transformed to produce values that are suitable for partitioning. For instance, a log table might be partitioned by a date column, event_date, derived from the event timestamp.
This leads to several problems:
- Partition values are not validated, so it’s easy for writers to make mistakes – for example, using the wrong time zone to produce event_date, or assuming that all rows in a data file are from the same day.
- Readers need to know about partitioning and take extra steps to take advantage of it. The reader needs to know to filter on both event_ts and event_date – and is likely to make the same mistakes that trip up writers. Often, the extra steps are accidentally overlooked.
- Since queries reference the extra partition columns, it is impractical to change partitioning schemes, because it requires rewriting every query to account for the new table layout.
Iceberg solves these problems by hiding partitioning from readers and writers. Iceberg is responsible for producing partition values and for using them at query time.
When you create a table, you can configure how to partition it using expressions, like day(event_ts). These expressions tell Iceberg how to derive partition values at write time. At read time, Iceberg uses these relationships to automatically convert data filters into partition filters.
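As a rough sketch in plain Python (a simplification; Iceberg implements this internally), the day transform maps a timestamp to days since the Unix epoch, and a filter on the source column implies a filter on the derived partition value:

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def day(ts):
    # Day transform: days since the Unix epoch.
    return (ts - EPOCH).days

# Write time: Iceberg derives the partition value from the data value.
event_ts = datetime(2023, 12, 25, 10, 31, 40, tzinfo=timezone.utc)
partition_value = day(event_ts)

# Read time: a data filter like event_ts >= DATE '2023-12-25' implies the
# partition filter day(event_ts) >= day('2023-12-25'), without the reader
# ever mentioning the partition column.
lower_bound = day(datetime(2023, 12, 25, tzinfo=timezone.utc))
assert partition_value >= lower_bound
```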
Creating partitioned tables
To create a partitioned table in Apache Spark, add the PARTITIONED BY clause with one or more partition expressions. Here’s the command for the example log table that partitions by the log level and timestamp:

CREATE TABLE logs (
    level STRING,
    event_ts TIMESTAMP,
    message STRING)
PARTITIONED BY (
    level,
    days(event_ts))
Iceberg has a well-defined set of transformations that can be used in partition expressions. It was important to define the set of transforms in the Iceberg spec because it ensures that tables are portable across languages and frameworks.
The available partition transforms include:
- year, month, day, hour – Partitions date/time values by year, month, day, and so on
- bucket(N, any_col) – Pseudo-randomly divides rows across N buckets
- truncate(L, string_col) – Truncates a string value to length L
- truncate(W, numeric_col) – Truncates numbers to width W; for example, truncate(100, 314) produces 300

Note: Spark uses slightly different names than Iceberg for the year, month, day, and hour transforms. This differentiates them from built-in functions with the same name. In Spark, the transforms use the plural forms: years, months, days, and hours.
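To make the truncate semantics concrete, here is a minimal sketch in plain Python (the bucket transform is omitted because Iceberg defines it with a Murmur3 hash, which is not reproduced here):

```python
def truncate_number(width, value):
    # Truncate a number down to the nearest multiple of `width`,
    # e.g. truncate_number(100, 314) -> 300. Python's modulo also rounds
    # toward negative infinity, so truncate_number(10, -1) -> -10.
    return value - (value % width)

def truncate_string(length, value):
    # Truncate a string to at most `length` characters.
    return value[:length]

print(truncate_number(100, 314))   # 300, matching the example above
print(truncate_string(3, "ERROR")) # ERR
```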
Writing with hidden partitioning
To write into the logs table, you only need to provide values for the data columns. This avoids human error, but also simplifies MERGE and INSERT statements that can become large and complex.
INSERT INTO logs VALUES
('ERROR', TIMESTAMP '2023-12-25T10:31:40', 'Failed to ...'),
('INFO', TIMESTAMP '2023-12-25T11:14:51', 'Updating ...'),
('INFO', TIMESTAMP '2023-12-25T11:14:52', 'Loading ...')
Engines such as Spark will automatically redistribute rows across tasks to keep the number of data files small. To learn more about distribution, see the recipe on write order and clustering.
Reading with hidden partitioning
Queries that filter on partition source columns automatically use partition filters to avoid reading unnecessary data and metadata files:
SELECT
    count(*) AS error_count,
    CAST(event_ts AS date) AS log_date
FROM logs
WHERE level = 'ERROR' AND
    event_ts >= DATE '2023-12-25' AND
    event_ts < DATE '2023-12-26'
GROUP BY CAST(event_ts AS date)
Not requiring additional filters for a duplicative event_date column makes it easier for readers to work with Iceberg tables, but there is an additional benefit. Queries are not tied to a specific set of partition columns. As a result, table partitioning can change and a table can manage data in multiple partition layouts.
Updating table partitioning
To update a table’s partition structure, run an ALTER TABLE command with ADD PARTITION FIELD or DROP PARTITION FIELD. For example, these two commands remove partitioning by level and increase the granularity of the time partition to hourly:
-- Remove level from table partitioning
ALTER TABLE logs DROP PARTITION FIELD level
-- Add hourly partitioning on event time
ALTER TABLE logs ADD PARTITION FIELD hours(event_ts)
Updating the partition configuration for a table is solely a metadata operation; it doesn’t change existing data. Historical data files stay in the old partition structure unless you choose to rewrite them. New data is written into the new partition structure automatically.
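A toy model in plain Python (an assumed simplification, not Iceberg’s actual metadata format) shows how one table can hold files in two layouts: each data file records the partition values it was written under, and a scan prunes each file using whatever values it has:

```python
# Each "file" carries the partition values of the layout that wrote it.
data_files = [
    # Written under the old layout: partitioned by level and day.
    {"partition": {"level": "ERROR", "day": 19716}},
    {"partition": {"level": "INFO", "day": 19716}},
    # Written under the new layout: partitioned by hour only.
    {"partition": {"hour": 19716 * 24 + 10}},
]

def may_contain(data_file, level):
    # Old-layout files can be pruned by level; new-layout files have no
    # level partition value, so they must be scanned.
    part = data_file["partition"]
    return part.get("level", level) == level

# A scan for ERROR logs prunes the old-layout INFO file and reads the rest.
candidates = [f for f in data_files if may_contain(f, "ERROR")]
```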
Also, note that the change above added a partition field, hours(event_ts), without removing days(event_ts). This is recommended when moving to finer-granularity time partitions because it preserves the older partition data in metadata tables. Because time-based partition transforms align at partition boundaries, there is very little added overhead to keeping daily partition values.
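The boundary alignment can be sketched in plain Python (a simplification of the spec’s definitions): the hour transform counts hours since the Unix epoch and the day transform counts days, so a daily value is always recoverable from an hourly one:

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def hour_transform(ts):
    # Hours since the Unix epoch.
    return int((ts - EPOCH).total_seconds()) // 3600

def day_transform(ts):
    # Days since the Unix epoch.
    return int((ts - EPOCH).total_seconds()) // 86400

# Every day spans exactly 24 hourly partitions, so hourly partition
# boundaries never straddle a daily boundary.
ts = datetime(2023, 12, 25, 10, 31, 40, tzinfo=timezone.utc)
assert day_transform(ts) == hour_transform(ts) // 24
```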