Centralized Role-Based Access Control (RBAC)

Fragmented policies create data security complexity and risk

Many companies use a complex patchwork of compute platforms and cloud-native access controls to enforce security policies. For instance, they manage policy for Trino and other data lake tools in Ranger, use separate access controls in their data warehouse (e.g., Snowflake, Amazon Redshift, or Google BigQuery), and must use Amazon S3 ACLs to secure Spark and Python access to data.

This fragmented access control environment creates a security challenge: multiple compute environments have broad access to your data, and you have to keep access policies in sync across multiple systems. Security holes are virtually guaranteed because it’s labor-intensive to keep policies up to date and difficult to prove that multiple security schemes are identical.

S3 file-level permissions also require additional logic to translate between object operations (GET, PUT, and DELETE) and SQL operations (SELECT, INSERT, UPDATE, and DELETE). Translation between these two regimes requires a complex web of Allow and Deny statements for every possible file prefix and role combination, which spirals into thousands of file access policy changes. Understanding, auditing, and even storing that much policy information is unsustainable at scale.
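To make the scale problem concrete, here is a minimal Python sketch of expanding table-level SQL grants into per-prefix S3 statements. The mapping from SQL privileges to object operations, the statement fields, and the table prefixes are all illustrative assumptions, not an actual policy format.

```python
# Assumed mapping from SQL privileges to the S3 object operations they imply.
# Illustrative only: a real data lake may need a different mapping.
SQL_TO_S3 = {
    "SELECT": {"GET"},
    "INSERT": {"PUT"},
    "UPDATE": {"GET", "PUT", "DELETE"},
    "DELETE": {"GET", "PUT", "DELETE"},
}


def expand_to_s3_statements(grants, table_prefixes):
    """Expand (role, table, privilege) grants into one Allow statement
    per role, file prefix, and S3 operation."""
    statements = []
    for role, table, privilege in grants:
        for prefix in table_prefixes[table]:
            for action in sorted(SQL_TO_S3[privilege]):
                statements.append(
                    {"Effect": "Allow", "Role": role, "Action": action, "Prefix": prefix}
                )
    return statements


# Hypothetical roles, table, and prefixes.
grants = [("analyst", "events", "SELECT"), ("etl", "events", "UPDATE")]
table_prefixes = {"events": ["events/data/", "events/metadata/"]}
statements = expand_to_s3_statements(grants, table_prefixes)
print(len(statements))
```

Two grants on a single table with two prefixes already produce eight statements in this sketch; the count grows with every additional role, table, and prefix.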

Our zero-trust RBAC vision 

Tabular’s vision for security is based on traditional database access control methods. The service enforces role-based access control (RBAC) based on a universal, shared storage layer that can be accessed by numerous methods. 

Instead of initially granting broad access to compute platforms and then locking down each one separately, Tabular grants no access by default (zero-trust) and then authorizes each request against a single, central copy of policy. This design works equally well for query engines like Trino, open compute environments like Spark, or even direct access from Python on a user’s local machine. Under the hood, Tabular applies these role-based authorization decisions at the file level.

In short, Tabular applies policies centrally, per-request, and on a zero-trust basis, so that security administrators don’t need to worry about the configuration of various compute tools or control file-level access.
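The default-deny, per-request idea can be sketched in a few lines of Python. The `Grant` and `PolicyStore` names below are hypothetical and are not Tabular’s API; the point is only that a request is denied unless a matching grant exists in the single central policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """A single privilege on a table given to a role (hypothetical shape)."""
    role: str
    table: str
    privilege: str


class PolicyStore:
    """One central copy of policy, consulted on every request."""

    def __init__(self, grants, user_roles):
        self._grants = set(grants)
        self._user_roles = user_roles  # user name -> set of role names

    def is_allowed(self, user, table, privilege):
        # Zero-trust: unknown users and missing grants both fall through to False.
        roles = self._user_roles.get(user, ())
        return any(Grant(role, table, privilege) in self._grants for role in roles)


store = PolicyStore(
    grants=[Grant("analyst", "sales.orders", "SELECT")],
    user_roles={"ana": {"analyst"}},
)
print(store.is_allowed("ana", "sales.orders", "SELECT"))
print(store.is_allowed("ana", "sales.orders", "INSERT"))
```

Because the check does not depend on which engine issued the request, the same call could serve Trino, Spark, or a local Python client in this sketch.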

Principle of Least Privilege (PoLP)

Along with zero-trust, we believe that people should have access only to the data they need to perform their job, a practice known as the Principle of Least Privilege (PoLP).

Least-privileged access depends on knowing who or what is requesting access. The most straightforward way to prove identity for data access decisions is for the user to directly provide a user credential, similar to an API key. 

Tabular enables trusted infrastructure to pass an authenticated user identity along with a verification mechanism, in exchange for a secure token used for data authorization decisions. Tabular maps AWS IAM identities to Tabular roles. This makes it easy to manage data access policy for scheduled jobs or processes running in AWS.

Key capabilities provided by Tabular RBAC

  • Fine-grained access control (column-level)
  • Table-level access controls
  • “Future permissions” to pre-configure policies for new databases and tables; for example, a role can be given privileges on any future tables in a database ahead of time
  • Cascading privileges for automatic inheritance of permissions
  • Integration with identity management systems such as AWS IAM, SSO via Okta and SCIM
  • Audit logs to track policy changes
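As a rough illustration of how future permissions and cascading privileges might fit together, the Python sketch below stores a grant at the database level and resolves it for every table in that database, including tables created after the grant. The `Catalog` class and its methods are hypothetical and do not describe Tabular’s implementation.

```python
class Catalog:
    """Toy catalog with database-level grants that cascade to tables."""

    def __init__(self):
        self.tables = {}           # database -> set of table names
        self.db_grants = set()     # (role, database, privilege)
        self.table_grants = set()  # (role, database, table, privilege)

    def grant_on_database(self, role, database, privilege):
        # Covers current and future tables in the database.
        self.db_grants.add((role, database, privilege))

    def grant_on_table(self, role, database, table, privilege):
        self.table_grants.add((role, database, table, privilege))

    def create_table(self, database, table):
        self.tables.setdefault(database, set()).add(table)

    def has_privilege(self, role, database, table, privilege):
        if table not in self.tables.get(database, set()):
            return False
        inherited = (role, database, privilege) in self.db_grants
        direct = (role, database, table, privilege) in self.table_grants
        return inherited or direct


catalog = Catalog()
catalog.grant_on_database("analyst", "sales", "SELECT")  # granted ahead of time
catalog.create_table("sales", "orders")                  # table created later
print(catalog.has_privilege("analyst", "sales", "orders", "SELECT"))
```

Resolving the database-level grant at check time, rather than copying it onto each table, is one way to keep new tables covered without any extra administrative step.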

Tabular RBAC at work

To provide both control and flexibility, Tabular uses an RBAC model – access privileges are assigned to roles, which are in turn assigned to individuals. This model differs from a user-based access control model, in which rights and privileges are assigned directly to each user or group of users.

For instance, if a user issues a SELECT request via Trino or Spark, Tabular follows these steps:

  1. Authenticates the user against AWS IAM
  2. Confirms that the user belongs to a role that allows SELECT access to the table in question
  3. Checks whether any columns are hidden from that role
  4. Pre-signs a request to S3 for the required files
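The four steps above can be sketched in Python as follows. Everything here is an assumption made for illustration: the policy dictionary, the placeholder IAM ARN, the rule for combining column blocks across roles, and especially `presign`, which is a simple HMAC stand-in rather than real S3 request signing.

```python
import hashlib
import hmac


class AccessDenied(Exception):
    pass


# Hypothetical central policy; the ARN uses a placeholder account ID.
POLICY = {
    "identity_roles": {"arn:aws:iam::123456789012:role/reporting": {"analyst"}},
    "select_grants": {("analyst", "sales.orders")},
    "blocked_columns": {("analyst", "sales.orders"): {"card_number"}},
}


def presign(path, expires_at, key):
    """Stand-in for S3 pre-signing: an HMAC over path and expiry (not SigV4)."""
    message = f"{path}:{expires_at}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&signature={signature}"


def handle_select(identity, table, columns, files, policy, key, expires_at):
    # Step 1: resolve the identity (verification against IAM is assumed upstream).
    roles = policy["identity_roles"].get(identity)
    if not roles:
        raise AccessDenied("unknown identity")
    # Step 2: at least one role must allow SELECT on the table.
    allowed = {r for r in roles if (r, table) in policy["select_grants"]}
    if not allowed:
        raise AccessDenied("no role grants SELECT on " + table)
    # Step 3 (assumed rule): a column is hidden if every allowed role blocks it.
    for column in columns:
        if all(column in policy["blocked_columns"].get((r, table), set()) for r in allowed):
            raise AccessDenied("column is blocked: " + column)
    # Step 4: hand back pre-signed references to only the required files.
    return [presign(path, expires_at, key) for path in files]


urls = handle_select(
    "arn:aws:iam::123456789012:role/reporting", "sales.orders",
    ["order_id", "total"], ["s3://bucket/sales/orders/part-0.parquet"],
    POLICY, b"demo-key", 1700000000,
)
print(urls[0])
```

In this sketch the caller never receives a standing storage credential, only short-lived references to the specific files the query needs, which is what lets the file-level decision follow the role-based one.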