How Tabular Completes Apache Iceberg

Apache Iceberg is a modern table format that provides ACID transactions and SQL table behavior on a data lake. Tabular was built by the original creators of Iceberg to provide the fastest path to an enterprise-grade storage platform built on Iceberg tables. Tabular offers the following enhancements to open source Iceberg.

Performance optimization

Get substantial performance improvements. While Iceberg provides solid performance through its default configuration, the Tabular Optimizer speeds your queries and lowers your costs by dynamically tuning each table. The Optimizer codifies a decade of Apache Iceberg know-how into a rule set that continually optimizes compaction and compression settings based on each table’s data profile and query patterns. 
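To make the idea of file compaction concrete, here is a minimal sketch of planning a compaction job by grouping small data files into batches close to a target size. It is not the Tabular Optimizer's rule set: the function name `plan_compaction`, the 512 MB target, and the first-fit-decreasing strategy are all assumptions chosen purely for illustration.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 512) -> List[List[int]]:
    """Group small files into batches whose total size stays within target_mb.

    Hypothetical helper: uses first-fit-decreasing bin packing. Files already
    at or above the target are skipped, since rewriting them gains nothing.
    """
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    batches: List[List[int]] = []
    for size in small:
        for batch in batches:
            if sum(batch) + size <= target_mb:
                batch.append(size)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([size])
    # A batch holding a single file would only be rewritten as itself.
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    # File sizes in MB; the 600 MB file is already large enough and is ignored.
    print(plan_compaction([600, 300, 200, 100, 50, 10]))
```

A real optimizer would also weigh partitioning, sort order, and query patterns; this sketch only shows the size-based grouping step.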

Centralized RBAC

Harmonize data security across all compute engines and frameworks. Tabular includes centralized role-based access control (RBAC) enforced at the database, table, or column level. It enforces access control for any query method, from a custom Python script to Spark jobs to query engines such as Trino and Amazon Athena.
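As an illustration of what database-, table-, and column-level grants can look like, the following sketch models a role's grants and checks whether a read is allowed. The `Grant` class, `can_read` function, and the sample roles are hypothetical and do not reflect Tabular's actual permission model or API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Grant:
    """Hypothetical read grant scoped to a database, table, or column."""
    database: str
    table: Optional[str] = None   # None means every table in the database
    column: Optional[str] = None  # None means every column in the table


def grant_covers(grant: Grant, database: str, table: str, column: str) -> bool:
    """Return True if the grant's scope includes the requested column."""
    if grant.database != database:
        return False
    if grant.table is not None and grant.table != table:
        return False
    if grant.column is not None and grant.column != column:
        return False
    return True


def can_read(role_grants: Dict[str, Set[Grant]], roles: Set[str],
             database: str, table: str, column: str) -> bool:
    """A user may read a column if any of their roles holds a covering grant."""
    return any(
        grant_covers(grant, database, table, column)
        for role in roles
        for grant in role_grants.get(role, set())
    )


if __name__ == "__main__":
    grants = {
        "analyst": {Grant("sales", "orders", "amount")},  # column-level grant
        "admin": {Grant("sales")},                        # database-level grant
    }
    print(can_read(grants, {"analyst"}, "sales", "orders", "amount"))
    print(can_read(grants, {"analyst"}, "sales", "orders", "customer_email"))
```

The point of centralizing such a check is that every engine consults the same grants, rather than each engine keeping its own copy of the rules.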

Integrated data pipelines

Eliminate the tedious engineering of building pipelines to load data. Tabular provides UI- or API-configurable ingestion from files via Tabular File Loader, or change data capture (CDC) for mirroring relational databases. Uniquely, Tabular optimizes your data as it is ingested.
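CDC mirroring works by recording change events in a log and merging them into a target table. The sketch below shows that merge step on in-memory data, assuming events arrive in commit order. The event shape (`op`, `key`, `row`) and the function name `apply_changes` are assumptions for illustration, not Tabular's CDC format.

```python
from typing import Any, Dict, Iterable


def apply_changes(target: Dict[Any, dict], change_log: Iterable[dict]) -> Dict[Any, dict]:
    """Merge ordered change events into a copy of the target table.

    Hypothetical event shape: {"op": "insert" | "update" | "delete",
    "key": <primary key>, "row": <full row, omitted for deletes>}.
    """
    merged = dict(target)  # leave the caller's table untouched
    for event in change_log:
        op, key = event["op"], event["key"]
        if op == "delete":
            merged.pop(key, None)
        elif op in ("insert", "update"):
            # Later events for the same key overwrite earlier ones.
            merged[key] = event["row"]
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return merged


if __name__ == "__main__":
    table = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
    log = [
        {"op": "update", "key": 1, "row": {"name": "Ada L."}},
        {"op": "delete", "key": 2},
        {"op": "insert", "key": 3, "row": {"name": "Cy"}},
    ]
    print(apply_changes(table, log))
```

On Iceberg tables the same logic is typically expressed as a SQL `MERGE INTO` from the log table into the target table.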

SaaS or managed private cloud deployment

The Tabular catalog, data ingestion, and RBAC services run as a dedicated instance in our cloud, connected via private peering to data in your cloud account. Attach query engines using configurable connectors for Amazon Athena, Google BigQuery, Spark / EMR, Trino and more. You can be connected and running in minutes.

Serverless operation

Make day-to-day management painless. Tabular provides self-service deployment, auto-clustering and auto-scaling. It maintains your data through automated garbage collection. It provides monitoring data compliant with the OpenMetrics standard for use with third-party systems such as Amazon CloudWatch and Datadog.
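For readers unfamiliar with OpenMetrics, the sketch below renders a single gauge in the OpenMetrics text exposition format, which is what a third-party monitoring tool would scrape. The metric name `iceberg_data_files` and the `table` label are made up for this example; the metrics Tabular actually exposes are not listed in this document.

```python
from typing import Dict, List


def escape_label(value: str) -> str:
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: Dict[str, int]) -> str:
    """Render one gauge family, with one sample per table name."""
    lines: List[str] = [f"# TYPE {name} gauge", f"# HELP {name} {help_text}"]
    for table, value in sorted(samples.items()):
        lines.append(f'{name}{{table="{escape_label(table)}"}} {value}')
    return "\n".join(lines)


def render_exposition(families: List[str]) -> str:
    """Join metric families and add the end-of-file marker the format requires."""
    return "\n".join(families) + "\n# EOF\n"


if __name__ == "__main__":
    gauge = render_gauge(
        "iceberg_data_files",           # hypothetical metric name
        "Data files per table.",
        {"sales.orders": 42},
    )
    print(render_exposition([gauge]), end="")
```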

What Tabular Adds to Your Iceberg Tables

| Category | Function | What Tabular provides |
| --- | --- | --- |
| Optimization | Compaction | Dynamic, table-specific optimization of file compaction jobs for substantial performance gains. |
| | Metadata management | Automatic, dynamic compaction of metadata files for best performance. Automatic deletion of metadata files that are no longer required. |
| Table maintenance | Snapshot expiration | Automatic deletion of expired snapshots. |
| | Orphan file cleanup | Automatic, safe deletion of orphan files. |
| Ingestion | From storage buckets (files) | File Loader service automatically detects and ingests new files in a source storage bucket. The table is optimized as new data is ingested. |
| | From databases (CDC) | CDC service mirrors database tables in Iceberg by ingesting change events to a log table and merging them to a target table. The table is optimized as new data is ingested. |
| Security | RBAC | RBAC service provides central control over permissions per database, table or column. Permissions are applied across all writers and readers. |
| | AWS IAM | Tabular accepts AWS IAM identities for authentication. |
| | SSO | Okta, Google OIDC, Custom OIDC |
| | Cross-domain identity management | Okta SCIM2, Custom SCIM2 |
| | Fine-grained access control | Column-level labels are used to restrict access or mask columns. |
| Operations | SaaS | Tabular runs as a cloud-hosted managed service. Data is read and processed in Tabular’s account but only written back to the customer’s account. |
| | Managed private cloud on AWS | Tabular deploys to an AWS account controlled by the customer and works with the customer’s data without that data leaving the customer’s control. |
| | Scaling | Automatic (serverless) |
| | Iceberg Updates | Automatic |
| | Monitoring | Exposure of Iceberg metrics via the OpenMetrics standard for use in third-party tools. |
| APIs | Configuration | Programmatically manage Tabular resources and configuration. OpenAPI compliant. |
| | REST Catalog | Tabular provides a catalog that follows the Iceberg REST OpenAPI specification. |
| | Terraform | API calls and scripts available for provisioning via Terraform. |
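Because the catalog follows the Iceberg REST specification, any REST-capable Iceberg client should be able to connect to it. The sketch below builds the connection properties that PyIceberg's `load_catalog` accepts for a REST catalog. The URI, credential, and warehouse values are placeholders, and the helper names `rest_catalog_properties` and `connect` are assumptions for this example; consult your own account settings for real values.

```python
from typing import Dict


def rest_catalog_properties(uri: str, credential: str, warehouse: str) -> Dict[str, str]:
    """Build PyIceberg properties for an Iceberg REST catalog (hypothetical helper)."""
    return {
        "type": "rest",            # use PyIceberg's REST catalog implementation
        "uri": uri,                # base URL of the REST catalog endpoint
        "credential": credential,  # client credential exchanged for a token
        "warehouse": warehouse,    # warehouse name or identifier
    }


def connect(properties: Dict[str, str]):
    """Open the catalog. Requires `pip install pyiceberg` and network access."""
    from pyiceberg.catalog import load_catalog

    return load_catalog("rest_catalog", **properties)


if __name__ == "__main__":
    props = rest_catalog_properties(
        uri="https://catalog.example.com",      # placeholder, not a real endpoint
        credential="<client-id>:<client-secret>",
        warehouse="my_warehouse",
    )
    print(props["type"], props["uri"])
    # catalog = connect(props)                  # uncomment with real values
    # print(catalog.list_namespaces())
```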