Tabular and the Iceberg Community


In our announcement blog post, I briefly mentioned Tabular’s relationship with the ASF Iceberg community:

We pledge to support and contribute to the independent community—never to control it or do harm

That sounds pretty great! The problem is it’s too vague. In this post, I want to get more specific about the open source culture we’re building at Tabular that will help us keep the Iceberg community a great place to contribute for years. From the project’s formation, we’ve worked hard to make sure that the Iceberg community is inclusive and healthy, so now we’re starting early to ensure Tabular contributes to community health.

What Iceberg needs

To start, it’s worth identifying what the Iceberg community needs. Because Iceberg is a storage standard, widespread support is critical to its success. In addition to the network effect of open source contributors, there’s a network effect from the projects that work with Iceberg tables. Each project or framework that supports Iceberg becomes interconnected with Spark, Trino, Presto, Flink, Hive, and more, and the overall data platform becomes more flexible and capable. This is why Netflix released Iceberg in the first place: to create a ubiquitous standard that could unify a fragmented ecosystem.

For Iceberg to be that ubiquitous open standard, people need to know that they can rely on it. That comes down to 3 traits that I think are critical for the community:

  • Iceberg needs to be collaborative – this new architecture brings together very different processing patterns. Listening and close collaboration are essential for Iceberg to continue innovating on features that work across engines and environments.
  • Iceberg needs to be neutral – no external project or company should be preferred when making choices. Without neutrality, companies or other projects can’t confidently build on top of Iceberg.
  • Iceberg needs to be independent – people need to know that the project’s governance and license won’t change on a whim. Otherwise, neutrality and open collaboration could be compromised.

While at Netflix, we helped build the Iceberg community with these traits. The community chose to join the ASF to ensure that it remains independent—as an ASF project, the license is fixed and decision making is inclusive and predictable. This is a backstop to keep the community healthy.

Now that I’m at Tabular, I’ve been asked what we’ll do. And for good reason! There are too many bad examples in other communities. Tabular won’t be one of them.

Building an open source culture

My aim for Tabular is to set up an open source culture that reinforces healthy interactions and prevents slipping into short-sighted habits that could harm the Iceberg community. That culture starts with setting guidelines now, publicly, before there are tough choices or any tension.

Everyone who works at Tabular will follow these 3 guiding principles:

First, always be honest and open with the Iceberg community. We will engage in good faith by openly discussing our needs and where we have conflicts. Open source works on social capital and nothing erodes trust as quickly as trying to hide motivations. People see through duplicity quickly, and open discussions with our partners in the community are more important than individual decisions.

Second, remember we are participants and contributors; there are no owners. Recognizing that we are a part of the community and not the center of it is an important mindset. Misguided ownership is often the basis to rationalize accumulating or exercising control. I’ve heard people say things like “Why are they pushing back after everything we’ve contributed?” or “We’re the experts!” The community doesn’t owe anyone for past contributions, and experts don’t always know better. As with being honest, it is better to maintain trust than to exercise control.

Third, make decisions about products and services that minimize conflicts of interest. More concretely, improvements to the Iceberg format and library will always be done in open source. Tabular isn’t building a proprietary version of the format with private features—it would be self-defeating because it fractures the standard.

When I’ve seen problems in other communities, they are almost always rooted in a conflict of interest that grows to be intractable because trust is gone or one side exercises control and overrules the other. But these problems aren’t the result of malice. Much more often, bad habits or careless behavior have eroded trust and created excuses not to listen.

Our guidelines are crafted to avoid those mistakes. And Tabular is committed to prioritizing collaboration with the community: the Iceberg format and components that are built into processing engines should be open and we will work with the open source community to improve them.

Tabular will build products that complement Iceberg by providing painless infrastructure and automatic table management and optimization. This strategy keeps our interests aligned with the interests of the community, and we look forward to working in the community as it continues to transform how we work with data.