Using OPA for complex CRD validation and defaulting

About 18 months ago, Tim Hockin was on-stage at Kubecon Seattle (2018) providing a crystal ball look into the future of CRDs. His bottom line was that all or at least most of the new (and even some of the existing) APIs should be CRDs.

I specifically latched onto the fact that if CRDs are becoming the API, then better user experience is needed. These days we are ramping up our knowledge on Gatekeeper & OPA in conjunction with developing validation and defaulting use cases for our CRDs. I gave a lightning talk on this at Kubecon San Diego (2019) about this. This post will provide more details than I could provide in a ~5-minute talk.

On validation and defaulting

The validation use-case provides three questions to solve. The idea is to make developers’ life easier and to avoid human error introducing bugs. These questions are:

Are all the required fields populated?
Are they populated correctly?
Are there additional fields that need to be populated?

You may be asking yourself if these questions are relevant. I mean, OpenAPI v3 schemas solved this, didn’t it? Well, the answer is yes, but not completely.

Some use cases that go beyond that, especially if you are working with many CRDs.

With OpenAPI v3 schemas you can validate for the population of a field (i.e. ‘not empty’). You can validate for type and for correct content (e.g. min, max, enum, regex pattern). These schemas can also provide default values, but only a for spec fields and only a single value per field.

In short, schemas are necessary for you to use apiextensions.k8s.io/v1 and the extended CRD functionalities of the latest and future versions of Kubernetes. And they are great for basic validation and defaulting. Granted, they are declarative at CRD creation time. The drawback is that they are static.

Advanced validation and defaulting

So, if schemas are basic, then what are those advanced use cases that we’re looking at?

In terms of content validation, we have found that extending it to allow validation against a dynamic list of values is necessary. This allows for the values to change over time, or take into account a changing context. These values can either be kept outside of Kubernetes or inside as Kubernetes resources, for example, namespaces or CRDs.

These concepts hold true for defaulting as well. Our approach is to default beyond static, single values. So we have use cases where this is based on dynamic data and be able to tie it into changing contexts. But, this requires an admission controller.

There’s one more use case for extending defaulting, which is around metadata. If you would like to default labels, e.g. because you have label selectors in your controller, you will need an admission controller too.

Real-life example of a CRD use case

Now that you are inspired, let’s have a look at how this works.

Imagine a CRD that is used to install apps. The way things are today, users would fill out the whole YAML themselves. But since we are seeking to create a better user experience than that, we can take it to a different level. All the users have to do, is to define that they want an app, from a certain catalog. In this example, it would be Prometheus from the Giant Swarm catalog.

Then, based on the context and the specific business logic in the specific company, the rest of the information gets populated by default.

In this example, if we know the team is using a certain namespace in a certain cluster, from which we might be able to infer the environment based on business logic and organizational processes. Then we can default the configuration for that team. The specific development environment can drive certain configuration parameters too.

On the validation side, we can check if the specified catalog even exists. If it doesn’t, we can provide information on existing catalogs.

Then we can check if the requested app is in the specified catalog. Checks can also be made regarding the specified version. In case none was specified, we could default to the latest stable version.

Why Open Policy Agent

To this point, we were talking about admission controllers and I am well aware that if you are reading this, you are most probably capable of building your own. So, what does all of this have to do with OPA? I mean, it was in the title of the post.

Well, OPA can validate, mutate, and authorize. Which is why it makes sense to use it instead of writing your own admission controller. In addition, instead of writing custom code, it allows you to write your rules as reusable Rego files. In the case of Gatekeeper, you can even create templates for rules and use them flexibly with different use cases.

With Rego, you can define rules that are easy to read and write. Rego testing ensures that queries are correct and unambiguous. Rego is declarative so authors can focus on what queries should return rather than how queries should be executed. Ultimately, it provides a single agent for implementing rules.

Conclusion

To keep this post short, the vision is that OPA/Gatekeeper becomes a default Kubernetes add-on. This is already currently in discussions as Gatekeeper might be replacing the PodSecurityPolicies implementation, which would make it a defacto default at least for secure clusters. Based on that, extenders of Kubernetes and builders of tools could package their software with all necessary validation and defaulting rules as OPA rego rules and would not need to write and maintain additional admission control code in their repositories.

BTW, I’ll be discussing this further at Rejekts Amsterdam, let me know if you have any questions you’d like to discuss.

Authored by: Puja Abbassi, Developer Advocate at Giant Swarm. As a CNCF ambassador, he’s passionate about bringing cloud-native technologies to more developers and their companies around the globe. In Kubernetes, he focuses on security and authentication as well as extending Kubernetes with custom resources and controllers (aka operators). With a few years of Kubernetes experience and having been in the beta batch of CKAs, he enjoys solving problems and helping people in Kubernetes Office Hours, Slack, and Discuss. He is also a contributor to the CIS Kubernetes Benchmarks.