Implementing RBAC best practices with Tabular

Tags: access control, best practices, RBAC, security

November 1, 2023

Your cloud data lake storage can be difficult to secure, as we’ve previously discussed in our Securing the data lake blog series. Ideally, you want to tie permissions for specific tables to particular user roles (RBAC) at the storage layer, independent of the access method (query engine, Spark, Python).

However, while access privileges are all about databases and tables, in a data lake the primitives we have to work with are files and file systems. Because of this mismatch, the most common approach today is to set table-specific access controls through the compute layer.

But if you use multiple compute technologies, how do you keep their access control policies in sync? Also, different compute engines have different access control models that may not be reconcilable. Lastly, direct access from Spark or Python has no native access control mechanism.

The result is access policy sprawl that becomes difficult or impossible to manage. It becomes increasingly hard to demonstrate that your data ecosystem is in compliance across all data assets and all users.

Starting to feel nervous? Don’t. You’re not alone and you’re in the right place. 🌞 Tabular solves the problem of implementing cross-engine universal RBAC. In this post we’ll walk you through some RBAC best practices from Snowflake and configure them in Tabular to create a simple and scalable approach for securing access to Iceberg tables.

Securing your data at the storage layer

Tabular is a cloud-native managed storage engine that provides a range of services on top of Apache Iceberg tables. One of its unique features is a centralized role-based access control (RBAC) security layer that ensures your data is secure and accessible only to authorized users, down to the table level (and soon the column level), across all compute engines and frameworks.

In short, with Tabular, you get the ease and security of a managed data warehouse system but the storage and compute flexibility of a data lake.

Tabular’s universal storage layer offers several advantages:

Centralized access control: Manage and govern all data access from one place, simplifying administration and reducing the risk of unauthorized access.
Consistent security policies: Avoid one-off access patterns for each way you store or access data. Each new query engine shouldn’t require you to rebuild your access controls from scratch.
Streamlined data management: Easily manage and monitor data access, reducing the complexity of managing multiple storage systems – especially as your data and access mechanisms change, which they will constantly from now until the end of time.

Let’s break down how we build scalable and secure RBAC hierarchies within Tabular.

Separation of concerns: object access roles vs. business function roles

One of my favorite parts of the Snowflake access control documentation is its humble suggestion to separate your object access roles (roles that define specific levels of access, like ownership on a database) from business function roles (marketing_data_engineer). Business function roles inherit permissions from object access roles.

This approach adds value by separating the concerns of designing appropriate roles and makes it easier to update role definitions over time.

Object access roles

Object access roles in Tabular are used to manage access to securable objects such as warehouses, databases, and tables. These roles provide granular control over data access, ensuring that users can only perform actions on the objects they have permission to access. The area of concern for these roles is to decide which data assets naturally “go together” in terms of permissions.

These roles do NOT get granted directly to users; they are only assigned to Business Function roles.

Business function roles

Business function roles in Tabular are used to manage data access for specific business functions or tasks. These are the roles that you assign to your user or application and should reflect your organizational structure and job responsibilities. The area of concern for these roles is how business functions group together when it comes to permissions.

These roles have NO direct object access. They inherit their object access from object access roles.
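
The two rules above (object access roles are never granted to users, and business function roles carry no direct grants) can be sketched as a small data model. This is an illustrative Python sketch only, not Tabular's API: the class names, the `grant_to_user` helper, the user name, and the `marketing_db__read` role are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectAccessRole:
    """Holds direct grants on securable objects; never assigned to users."""
    name: str
    grants: set = field(default_factory=set)  # {(privilege, object_name)}


@dataclass
class BusinessFunctionRole:
    """Assigned to users; holds no direct grants, only inherited ones."""
    name: str
    inherits: list = field(default_factory=list)  # list of ObjectAccessRole

    def effective_grants(self):
        # Union of everything inherited from the object access roles.
        result = set()
        for role in self.inherits:
            result |= role.grants
        return result


def grant_to_user(user_roles, user, role):
    """Assign a role to a user, rejecting object access roles (hypothetical helper)."""
    if not isinstance(role, BusinessFunctionRole):
        raise TypeError("only business function roles may be granted to users")
    user_roles.setdefault(user, []).append(role)


# Hypothetical object access role and a business function role that inherits it.
db_read = ObjectAccessRole("marketing_db__read", {("SELECT", "marketing_db")})
engineer = BusinessFunctionRole("marketing_data_engineer", [db_read])

user_roles = {}
grant_to_user(user_roles, "ana", engineer)
```

Keeping the two role types as separate classes means a misuse, such as handing an object access role straight to a user, fails loudly instead of silently widening access.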

Getting hands on

Ok, that’s enough theory – let’s actually pull this all together 💪 using the roles needed in a marketing department as a generalizable example.

Imagine accessing a warehouse called ENTERPRISE_DATA_WAREHOUSE. This warehouse may house databases for:

marketing_raw (bronze)
marketing_modeled (silver)
marketing_mart (gold)

This Tabular screen shows your available databases. The Access Controls tab allows you to create roles and assign access rights to these.

Let’s first reason about what kinds of object access roles will be required:

Write access for ingesting raw data such as ad clicks, website behavior, and online transactions: marketing__write_raw
Write access for modeling data and populating the data mart: marketing__write_except_raw
Broad read-only access to all marketing databases: marketing__read_all
Read-only access for consumers of data mart tables: marketing__mart_read

Next, let’s think through what kinds of Business Function roles might need access based on what they’ll be doing:

💽 producing raw data: marketing__raw_data_producer
🛠️ transforming raw data: marketing__data_transformer
👩‍🔬 data science: marketing__data_scientist
📈 bi consumption: marketing__bi_consumer
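
Put together, the two lists might map to each other as in the sketch below. Treat it as one plausible layout rather than a prescribed one: only the pairing of marketing__read_all with marketing__data_scientist comes from this walkthrough, the other pairings are assumptions, `WRITE` is a placeholder privilege label, and the `effective_access` function is hypothetical.

```python
# Object access roles: role name -> {database: set of privileges}.
# "WRITE" is a placeholder label, not a confirmed Tabular privilege name.
OBJECT_ACCESS_ROLES = {
    "marketing__write_raw": {"marketing_raw": {"WRITE"}},
    "marketing__write_except_raw": {
        "marketing_modeled": {"WRITE"},
        "marketing_mart": {"WRITE"},
    },
    "marketing__read_all": {
        "marketing_raw": {"SELECT"},
        "marketing_modeled": {"SELECT"},
        "marketing_mart": {"SELECT"},
    },
    "marketing__mart_read": {"marketing_mart": {"SELECT"}},
}

# Business function roles: role name -> inherited object access roles.
# Every pairing except the data scientist one is an assumption.
BUSINESS_FUNCTION_ROLES = {
    "marketing__raw_data_producer": ["marketing__write_raw"],
    "marketing__data_transformer": [
        "marketing__read_all",
        "marketing__write_except_raw",
    ],
    "marketing__data_scientist": ["marketing__read_all"],
    "marketing__bi_consumer": ["marketing__mart_read"],
}


def effective_access(business_role):
    """Merge the privileges inherited from each object access role."""
    merged = {}
    for object_role in BUSINESS_FUNCTION_ROLES[business_role]:
        for database, privileges in OBJECT_ACCESS_ROLES[object_role].items():
            merged.setdefault(database, set()).update(privileges)
    return merged
```

Calling `effective_access("marketing__bi_consumer")` returns only SELECT on marketing_mart, which is the narrow footprint you would expect for a BI consumer.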

Granting Role Access

Let’s take a look at the access described in one of our object access roles:

A few things to note:

The SELECT level of access grants broad read access to everything in this specific marketing database.
Any future table that is created in this marketing database will automatically be readable by this role.
If you’re doing role maintenance and going back to address access issues, it’s simple to update access in bulk as needed.
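
The second note is the one that keeps maintenance low: the grant lives on the database, not on individual tables. A minimal sketch of how such a check could resolve, assuming a hypothetical in-memory catalog (the table names and the `can_read` function are illustrative, not Tabular's API):

```python
# Grants are recorded per (role, database), never per table.
DATABASE_GRANTS = {("marketing__read_all", "marketing_mart"): {"SELECT"}}

# Hypothetical catalog: database -> table names.
catalog = {"marketing_mart": {"campaign_summary"}}


def can_read(role, database, table):
    """A table is readable if it exists and the role holds SELECT on its database."""
    if table not in catalog.get(database, set()):
        return False
    return "SELECT" in DATABASE_GRANTS.get((role, database), set())


# The table does not exist yet, so there is nothing to read.
before = can_read("marketing__read_all", "marketing_mart", "ad_spend_daily")

# A table created later is covered without touching the grants.
catalog["marketing_mart"].add("ad_spend_daily")
after = can_read("marketing__read_all", "marketing_mart", "ad_spend_daily")
```

Because `DATABASE_GRANTS` never changes between the two checks, the new table becomes readable purely by virtue of where it was created.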

Connecting Object Access roles to Business Function roles

Now, let’s open the marketing__read_all role and grant it to the data scientist role:

Well that was easy!

I’d like to call out one of the biggest advantages of using object access and business function roles – errors are easier to spot!

If you see that the marketing__read_all role is granting some level of edit access, that’s clearly a mistake that one can easily reason about.

Contrast that with a single-tiered role system, where you might see a BI role in this location 🤔. You can’t easily tell if that is correct. Heck, maybe the BI team has a one-off exception? Maybe it’s from some old stuff you used to do and never updated? Maybe it’s wrong, but you’re not in the BI team, so 🤷.
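
Because object access role names state their intent, this kind of mistake can even be checked mechanically. The sketch below assumes a naming convention in which any role with "read" in its name should hold only SELECT. The `find_suspicious_grants` function, the sample grants, and the `WRITE` privilege label are hypothetical and not part of Tabular.

```python
READ_ONLY_PRIVILEGES = {"SELECT"}


def find_suspicious_grants(object_access_roles):
    """Return (role, database, privilege) for read roles holding non-read privileges."""
    findings = []
    for role, grants in sorted(object_access_roles.items()):
        if "read" not in role:
            continue  # the naming convention only constrains read roles
        for database, privileges in sorted(grants.items()):
            for privilege in sorted(privileges - READ_ONLY_PRIVILEGES):
                findings.append((role, database, privilege))
    return findings


# Sample grants with one deliberate mistake on the read role.
roles = {
    "marketing__read_all": {
        "marketing_raw": {"SELECT"},
        "marketing_mart": {"SELECT", "WRITE"},
    },
    "marketing__write_raw": {"marketing_raw": {"WRITE"}},
}

print(find_suspicious_grants(roles))
# prints [('marketing__read_all', 'marketing_mart', 'WRITE')]
```

No equivalent check is possible for a single-tier BI role, because nothing in its name says what access it is supposed to have.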

Do yourself, your IT Admins, your auditors, your engineers, your analysts, and your C-suite a favor – use a central storage layer with a universal RBAC model that meets your needs today and has the flexibility to adapt to constant change 🫠.

Why choose Tabular as your modern storage platform

Tabular’s centralized RBAC model offers a powerful and flexible access control system that helps enterprises maintain a strong security posture.

By implementing object access roles and business function roles in Tabular, you can:

Protect sensitive data and ensure compliance with regulatory requirements.
Simplify the administration of data access through a single point of policy enforcement at the storage layer.
Improve productivity and collaboration by providing users with the appropriate level of access to resources while minimizing risk.

If you’d like to explore RBAC within Tabular as well as build out your own databases and tables, sign up for our free tier.
