File Loader ingests data files in near-real-time into Apache Iceberg tables. Tabular monitors a location in S3 for new files and manages writing new data into tables while optimizing the table and handling schema evolution.
Features include:
- Schema inference and evolution. New columns are automatically added to the target table, field types are automatically inferred, and dropped columns are retained.
- Table-specific optimizations are applied during ingestion, based on analysis of the table (described on a separate page). These optimizations can reduce the size of the data, and with it query time and cost, by up to 80%.
- Serverless operation – pipelines are based on simple, declarative configuration. They require zero infrastructure, orchestration, engineering, or maintenance.
- Parquet, CSV, TSV, JSON, and XML file format support, including complex data structures such as nested fields and arrays.
- Exactly-once semantics eliminate the need to hand-craft checkpoints and deduplication jobs.
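The schema inference and evolution behavior listed above can be sketched in a few lines of Python. This is only an illustration of the rule as described (new columns are added, types are inferred, dropped columns are retained), not Tabular's implementation; the function names `infer_type` and `evolve_schema` and the type labels are assumptions made for this example.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Map a Python value to an illustrative column type label (assumed names)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "list"
    return "string"


def evolve_schema(table_schema: dict[str, str], record: dict[str, Any]) -> dict[str, str]:
    """Return a new schema that adds columns seen in the record.

    Columns missing from the record are kept, mirroring the
    "dropped columns are retained" behavior described above.
    """
    evolved = dict(table_schema)
    for name, value in record.items():
        # Skip nulls: a type cannot be inferred from a missing value.
        if name not in evolved and value is not None:
            evolved[name] = infer_type(value)
    return evolved


# Hypothetical table schema and incoming record.
schema = {"id": "long", "email": "string"}
record = {"id": 7, "plan": "pro", "tags": ["a", "b"]}

evolved = evolve_schema(schema, record)
print(evolved)
```

In this sketch the incoming record no longer carries `email`, yet the column stays in the schema, while `plan` and `tags` are added with inferred types.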
Other things you should know about File Loader:
- UI- or API-based configuration (e.g., using Terraform).
- Tabular RBAC restricts ingestion to users with proper permissions.
- Observable pipelines – Tabular supports the OpenMetrics API to expose ingestion activity to third-party observability tools. You can monitor and alert on pipeline latency and errors.
- Transparent, predictable pricing. Pricing is pay-as-you-go and based solely on the volume of source data being ingested. Usage can be monitored within Tabular.
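To illustrate the observability point above, here is a minimal sketch of alerting on pipeline latency and errors from metrics in the OpenMetrics text format. The metric names (`file_loader_errors_total`, `file_loader_latency_seconds`), the sample payload, and the threshold are all hypothetical; the names Tabular actually exposes are not given here, and in practice a third-party observability tool would do this work.

```python
# Hypothetical payload in the OpenMetrics text format (metric names are assumed).
SAMPLE = """\
# TYPE file_loader_errors counter
file_loader_errors_total{table="events"} 3
# TYPE file_loader_latency_seconds gauge
file_loader_latency_seconds{table="events"} 42.5
# EOF
"""


def parse_samples(text: str) -> dict[str, float]:
    """Parse 'name{labels} value' lines, skipping comments and metadata.

    Simplification: assumes samples carry no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def find_alerts(samples: dict[str, float], max_latency_seconds: float = 60.0) -> list[str]:
    """Flag any error count above zero or latency above the threshold."""
    alerts = []
    for name, value in samples.items():
        if name.startswith("file_loader_errors_total") and value > 0:
            alerts.append(f"errors: {name} = {value}")
        elif name.startswith("file_loader_latency_seconds") and value > max_latency_seconds:
            alerts.append(f"latency: {name} = {value}")
    return alerts


samples = parse_samples(SAMPLE)
alerts = find_alerts(samples)
for alert in alerts:
    print(alert)
```

With the sample payload, only the error counter triggers an alert, because the reported latency is below the assumed 60-second threshold.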