Apache Iceberg is a modern table format that provides ACID transactions and SQL behavior on a data lake. Tabular was built by the original creators of Iceberg to provide the fastest path to an enterprise-grade storage platform built on Iceberg tables. It offers the following enhancements to open-source Iceberg.
Get substantial performance improvements. While Iceberg provides solid performance through its default configuration, the Tabular Optimizer speeds your queries and lowers your costs by dynamically tuning each table. The Optimizer codifies a decade of Apache Iceberg know-how into a rule set that continually optimizes compaction and compression settings based on each table’s data profile and query patterns.
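To make the idea of compaction concrete, here is a minimal Python sketch of size-based bin packing, a common way to group small data files into rewrite batches close to a target size. It is not the Tabular Optimizer's rule set. The function name `plan_compaction`, the 512 MB target, and the file sizes are all illustrative assumptions.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 512) -> List[List[int]]:
    """Group small files into batches whose total size stays at or under target_mb.

    Uses first-fit decreasing bin packing. Files already at or above the
    target are left alone, since rewriting them gains nothing.
    """
    candidates = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    batches: List[List[int]] = []
    for size in candidates:
        for batch in batches:
            if sum(batch) + size <= target_mb:
                batch.append(size)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([size])
    # A batch holding a single file would only be rewritten as itself; skip it.
    return [b for b in batches if len(b) > 1]


# Hypothetical file sizes in MB for one table partition.
plan = plan_compaction([600, 300, 200, 100, 64, 32, 8])
print(plan)
```

A per-table optimizer would vary inputs such as the target size according to the table's data profile and query patterns, rather than using one fixed value as this sketch does.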
Harmonize data security across all compute engines and frameworks. Tabular includes centralized role-based access control (RBAC) enforced at the database, table, or column level. It enforces access control for any query method, from a custom Python script to Spark jobs to query engines such as Trino and Amazon Athena.
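The value of centralizing RBAC is that every engine consults the same decision. The following Python sketch illustrates that idea with a hypothetical grant model in which a grant on a database or table also covers everything beneath it. The `Resource` type, the `can_read` function, and the role names are assumptions made for illustration, not Tabular's API.

```python
from typing import Set, Tuple

# A resource is (database,), (database, table) or (database, table, column).
Resource = Tuple[str, ...]


def can_read(role: str, column: Resource, grants: Set[Tuple[str, Resource]]) -> bool:
    """Return True if the role holds a grant on the column or on any parent of it."""
    # Check the column itself, then its table, then its database.
    return any((role, column[:depth]) in grants for depth in range(len(column), 0, -1))


# Hypothetical grants at each of the three levels.
grants = {
    ("analyst", ("sales",)),                       # whole database
    ("support", ("crm", "customers")),             # one table
    ("intern", ("crm", "customers", "country")),   # one column only
}

print(can_read("analyst", ("sales", "orders", "amount"), grants))
print(can_read("intern", ("crm", "customers", "email"), grants))
```

Because the check takes only a role and a resource, the same answer applies whether the request comes from a Python script, a Spark job, or a SQL engine.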
Integrated data pipelines
Eliminate the tedious engineering of creating pipelines to load data. Tabular provides UI or API-configurable ingestion from files via Tabular File Loader or change data capture (CDC) for mirroring relational databases. Uniquely, Tabular optimizes your data as it is ingested.
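Mirroring a database through CDC comes down to replaying an ordered log of change events against a target table. The Python sketch below shows that merge step using plain dictionaries. The event shape `(operation, primary key, row values)` and the function `merge_changes` are hypothetical and stand in for whatever format a real CDC source emits.

```python
from typing import Any, Dict, Iterable, Tuple

# (operation, primary key, row values); an assumed event shape for illustration.
ChangeEvent = Tuple[str, int, Dict[str, Any]]
Table = Dict[int, Dict[str, Any]]


def merge_changes(target: Table, log: Iterable[ChangeEvent]) -> Table:
    """Apply change events in order and return the new state of the target table."""
    merged = {key: dict(row) for key, row in target.items()}  # leave the input untouched
    for op, key, row in log:
        if op in ("insert", "update"):
            # Updates may carry only the changed columns, so merge over the old row.
            merged[key] = {**merged.get(key, {}), **row}
        elif op == "delete":
            merged.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


target = {1: {"name": "Ada", "plan": "free"}}
log = [
    ("update", 1, {"plan": "pro"}),
    ("insert", 2, {"name": "Grace", "plan": "free"}),
    ("delete", 1, {}),
    ("insert", 3, {"name": "Linus", "plan": "pro"}),
]
mirrored = merge_changes(target, log)
print(mirrored)
```

Keeping the raw events in a log and deriving the target from them means the mirror can be rebuilt or audited from the log at any time.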
The Tabular catalog and the data, ingestion, and RBAC services run as a dedicated instance in our cloud, connected via private peering to data in your cloud account. Attach query engines using configurable connectors for Amazon Athena, Google BigQuery, Spark / EMR, Trino and more. You can be connected and running in minutes.
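Because the Tabular catalog follows the Iceberg REST catalog specification, attaching an engine is largely a matter of configuration. As a sketch, the Python snippet below assembles the Spark properties that Apache Iceberg uses for a REST catalog. The catalog name, URI, credential, and warehouse shown are placeholders, not real Tabular values, and the helper `rest_catalog_conf` is hypothetical.

```python
from typing import Dict


def rest_catalog_conf(name: str, uri: str, credential: str, warehouse: str) -> Dict[str, str]:
    """Build Spark properties for an Iceberg REST catalog registered as `name`."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.credential": credential,
        f"{prefix}.warehouse": warehouse,
    }


conf = rest_catalog_conf(
    name="lake",                                # placeholder catalog name
    uri="https://catalog.example.com/ws",       # placeholder endpoint
    credential="<client-id>:<client-secret>",   # placeholder credential
    warehouse="my_warehouse",                   # placeholder warehouse
)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed flags can be passed to `spark-submit`, or the same key-value pairs can be set on a Spark session builder. The Iceberg Spark runtime must also be on the classpath.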
Make day-to-day management painless. Tabular provides self-service deployment, auto-clustering and auto-scaling. It maintains your data through automated garbage collection. It provides monitoring data compliant with the OpenMetrics standard for use with third-party systems such as Amazon CloudWatch and Datadog.
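Metrics published in the OpenMetrics text format are plain lines of the form `name{label="value"} number`, which is why third-party tools can consume them easily. The Python sketch below parses a small sample of that format. The metric names `table_data_files` and `table_snapshots` are invented for illustration, and the parser handles only simple samples (no escaped characters in label values, no timestamps).

```python
import re
from typing import Dict, List, Tuple

# name, optional {labels}, then the sample value.
SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (metric name, labels, value) for each sample line in the text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and metadata such as "# TYPE" and "# EOF"
        match = SAMPLE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical exposition; real metric names will differ.
exposition = """\
# TYPE table_data_files gauge
table_data_files{database="sales",table="orders"} 128
# TYPE table_snapshots gauge
table_snapshots{database="sales",table="orders"} 12
# EOF
"""
samples = parse_samples(exposition)
for name, labels, value in samples:
    print(name, labels, value)
```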
What Tabular Adds to your Iceberg Tables
|What Tabular provides
|Dynamic, table-specific optimization of file compaction jobs for substantial performance gains.
|Automatic, dynamic compaction of metadata files for best performance.
|Automatic deletion of metadata files that are no longer required.
|Automatic deletion of expired snapshots.
|Orphan file cleanup
|Automatic, safe deletion of orphan files.
|From storage buckets (files)
|File Loader service automatically detects and ingests new files in a source storage bucket. The table is optimized as new data is ingested.
|From databases (CDC)
|CDC service mirrors database tables in Iceberg by ingesting change events to a log table and merging them to a target table. The table is optimized as new data is ingested.
|RBAC service provides central control over permissions per database, table or column. Permissions are applied across all writers and readers.
|Tabular accepts AWS IAM identities for authentication.
|Okta, Google OIDC, Custom OIDC
|Cross-domain identity management
|Okta SCIM2, Custom SCIM2
|Fine-grained access control
|Column-level labels are used to restrict access or mask columns.
|Tabular runs as a cloud-hosted managed service. Data is read and processed in Tabular’s account but only written back to the customer’s account.
|Managed private cloud on AWS
|Tabular deploys to an AWS account controlled by the customer and works with the customer’s data without that data leaving the customer’s control.
|Exposure of Iceberg metrics via the OpenMetrics standard for use in third-party tools.
|Programmatically manage Tabular resources and configuration. OpenAPI compliant.
|Tabular provides a catalog that follows the Iceberg REST OpenAPI specification.
|API calls and scripts available for provisioning via Terraform.
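Safe deletion of orphan files, as listed in the table above, rests on two checks: a file must not be referenced by any table metadata, and it must be old enough that it cannot belong to a write still in progress. The Python sketch below shows those two checks. The function `find_orphans`, the three-day safety window, and the file paths are illustrative assumptions, not Tabular's implementation.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set


def find_orphans(
    listed: Dict[str, datetime],
    referenced: Set[str],
    now: datetime,
    min_age: timedelta = timedelta(days=3),
) -> List[str]:
    """Return files that no metadata references and that are older than min_age."""
    cutoff = now - min_age
    return sorted(
        path
        for path, modified in listed.items()
        if path not in referenced and modified < cutoff
    )


now = datetime(2024, 1, 10)
# Files found in storage, with their last-modified times (hypothetical).
listed = {
    "data/a.parquet": datetime(2024, 1, 1),
    "data/b.parquet": datetime(2024, 1, 1),
    "data/c.parquet": datetime(2024, 1, 9, 23),
}
# Files that current table metadata still points to (hypothetical).
referenced = {"data/a.parquet"}

orphans = find_orphans(listed, referenced, now)
print(orphans)
```

Here `data/c.parquet` is unreferenced but recent, so it is kept: it may have been written by a commit that has not finished yet.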