How To Install


Apache Iceberg is language- and engine-agnostic, meaning it was designed to be portable so that any language or engine can interact with Iceberg tables. PyIceberg is the official Python client of the Iceberg project and an easy way to get started with Iceberg. It provides a lightweight option to query data from an Iceberg table for further analysis using your favorite Python tools for data science and AI.

PyIceberg has full read support and integrates with other projects, including Polars, Pandas, and DuckDB. Write support is underway, and the best way to track the progress is by following the repository or the documentation website.

PyIceberg is published on PyPI and can be installed with pip. Support for the REST catalog comes out of the box, and the PyArrow extra is the easiest way to get started:

pip3 install -U "pyiceberg[pyarrow]"
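To confirm that the install succeeded, you can ask Python's packaging metadata for the installed version. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of PyIceberg.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pyiceberg") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is an illustrative helper, not a PyIceberg API.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pyiceberg {found}" if found else "pyiceberg is not installed")
```

If the script reports that the package is not installed, check that pip3 and the Python interpreter you are running belong to the same environment.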

Installing optional extensions (extras)

Optional packages can be installed depending on your needs to keep the installation lightweight.

Extras for FileIO (to fetch the data):

pyarrow: PyArrow filesystem backend (supports S3, HDFS, and others)
s3fs: fsspec implementation for AWS S3
adlfs: fsspec implementation for Azure ADLS
gcsfs: fsspec implementation for Google Cloud Storage

Extras to add catalog implementations (REST catalog support is built in):

sql-postgres: Support for a Postgres-backed metastore
hive: Support for Apache Hive Metastore
glue: Support for AWS Glue
dynamodb: Support for AWS DynamoDB

Note that the REST catalog is not listed since it is supported out of the box.

Extras to add integration with your favorite data analysis toolkit:

pandas: Support to read directly into a pandas DataFrame
arrow: Support to read directly into an Arrow table
duckdb: Support to query the data using DuckDB
ray: Support to convert the data into a Ray dataset
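Since every extra is optional, it can help to check which of the underlying libraries are importable in the current environment. The sketch below does this with the standard library only. The `EXTRA_MODULES` mapping from extra name to top-level module name is an assumption made for illustration (it covers only extras whose module shares the extra's name), and `available_extras` is a hypothetical helper, not a PyIceberg function.

```python
from importlib.util import find_spec
from typing import Dict, Optional

# Assumption: each of these extras installs a top-level module of the same name.
EXTRA_MODULES: Dict[str, str] = {
    "pyarrow": "pyarrow",
    "s3fs": "s3fs",
    "adlfs": "adlfs",
    "gcsfs": "gcsfs",
    "pandas": "pandas",
    "duckdb": "duckdb",
    "ray": "ray",
}


def available_extras(mapping: Optional[Dict[str, str]] = None) -> Dict[str, bool]:
    """Report, per extra, whether its module can be found without importing it."""
    mapping = EXTRA_MODULES if mapping is None else mapping
    return {extra: find_spec(module) is not None for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```

`find_spec` only locates a module and does not import it, so the check stays cheap even for heavy libraries such as Ray.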

You can mix and match the options according to your needs. For example, if you want to add support for ADLS and DuckDB, you’d install both the duckdb and adlfs extras.

pip3 install -U "pyiceberg[duckdb,adlfs]"
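When the set of extras is chosen programmatically, for example in a bootstrap script, the requirement string can be assembled from a list. This is a small sketch under that assumption; `pip_requirement` and `pip_install_args` are hypothetical helper names, and the code only builds the command rather than running it.

```python
import sys
from typing import Iterable, List


def pip_requirement(extras: Iterable[str], package: str = "pyiceberg") -> str:
    """Build a requirement such as 'pyiceberg[adlfs,duckdb]' from extra names."""
    # Drop blanks and duplicates, then sort so the result is deterministic.
    names = sorted({e.strip() for e in extras if e.strip()})
    return f"{package}[{','.join(names)}]" if names else package


def pip_install_args(extras: Iterable[str]) -> List[str]:
    """Return an argument list for the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", "-U", pip_requirement(extras)]


if __name__ == "__main__":
    print(pip_requirement(["duckdb", "adlfs"]))  # pyiceberg[adlfs,duckdb]
    print(pip_install_args(["duckdb", "adlfs"]))
```

Passing the argument list to `subprocess.run(..., check=True)` would perform the install. Using `sys.executable -m pip` rather than a bare `pip3` targets the interpreter that is running the script.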