Repeatability in Infrastructure as Code (IaC) means the system comes back to the same expected behavior every time. For automation, it means achieving the same result even when the underlying systems, infrastructure, and components differ. This can be complex to navigate, and because automation is not always repeatable, customers can see high failure rates on some processes.
In the fourth episode of our six-episode series on scaling Infrastructure as Code (IaC), Swapnil Bhartiya sits down with Rob Hirschfeld, CEO and Co-Founder of RackN, to discuss what repeatability means for infrastructure as code. Hirschfeld goes into some of the key challenges of repeatability and the role automation plays. He also explains what it means to have consistency across a system and addresses some common misconceptions about it.
Key highlights from this video interview are:
- Repeatability for infrastructure can be complex because it is not a matter of running the same path repeatedly: building infrastructure involves bringing in new components, recycling them, or using different pieces. Hirschfeld discusses how repeatability fits into the context of infrastructure and what that means in terms of infrastructure as code.
- Hirschfeld feels that writing IaC is primarily a statement about the developer process and how you can decompose automation into reusable blocks. He believes people underestimate how much of a developer-like experience IaC involves. He explains the importance of being able to test and verify many different scenarios in different environments when writing IaC.
- Automation is not always repeatable, and some of RackN’s customers have infrastructure with a 20% failure rate on some of the processes they run. This is because automation is not always rigorously tested. Hirschfeld discusses why retries can be problematic and why leaving them out matters for fixing the root cause of a problem.
- Hirschfeld goes into the key factors that make it difficult for automation to be repeatable, such as changing operating systems and variation in the infrastructure. He explains that the cause is often outside of your control, like when CentOS switched over to CentOS Stream and suddenly you were not able to patch anymore.
- Achieving consistency across the system is not necessarily about making everything the same, but rather about collecting data across the board, normalizing the information that is collected, and providing a path through. Hirschfeld discusses some of the misconceptions about consistency and how RackN’s approach is different.
- APIs are an abstraction point, but even when you use an API consistently, the way it gets delivered to you or the way it responds to your requests can change over time. Hirschfeld explains why it is important to understand that what you depend on is the behavior you are getting, not just the data that is returned. He discusses how the RackN product uses feature flags to help by defining the behaviors of the system.
- RackN’s core mission is to provide its customers with usable and reusable automation blocks so they can have consistent experiences. While its customers all have different needs, they want to use the same automation. Hirschfeld describes why reuse is so important for them and how it led to a pipeline concept with known injection points.
- According to Hirschfeld, Day 2 is the most critical piece of repeatability. He explains why it is critical to build your systems so that new requests are addressed reliably and so that you still have repeatability when the behaviors of the systems change.
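
To make the consistency highlight above more concrete, here is a minimal Python sketch of normalizing collected data instead of forcing every machine to be identical. It is an illustration only, not RackN's implementation: the `MachineFacts` schema, the `normalize` function, and the raw field names (`hostname`, `host_name`, `os`, `distro`, `memory_mb`, `memory_gb`) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical name fragments used to group distributions into families.
RHEL_LIKE = ("centos", "rhel", "red hat", "rocky", "alma")
DEBIAN_LIKE = ("debian", "ubuntu")


@dataclass(frozen=True)
class MachineFacts:
    """Common schema that downstream automation can rely on (assumed)."""
    hostname: str
    os_family: str
    memory_mb: int


def os_family(os_name: str) -> str:
    """Collapse a free-form OS name into a coarse family label."""
    name = os_name.lower()
    if any(token in name for token in RHEL_LIKE):
        return "rhel"
    if any(token in name for token in DEBIAN_LIKE):
        return "debian"
    return "unknown"


def normalize(raw: dict) -> MachineFacts:
    """Map one source-specific inventory record onto the common schema."""
    # Different sources may report the same fact under different keys.
    hostname = raw.get("hostname") or raw.get("host_name")
    if not hostname:
        raise ValueError("record has no hostname")
    # Different sources may also use different units for memory.
    if "memory_mb" in raw:
        memory_mb = int(raw["memory_mb"])
    else:
        memory_mb = int(raw["memory_gb"]) * 1024
    family = os_family(raw.get("os") or raw.get("distro") or "")
    return MachineFacts(hostname, family, memory_mb)


# Two records describing different machines in different shapes.
records = [
    {"hostname": "node-a", "os": "Ubuntu 22.04", "memory_mb": 8192},
    {"host_name": "node-b", "distro": "CentOS Stream 9", "memory_gb": 4},
]
normalized = [normalize(record) for record in records]
```

The machines stay different, but automation written against `MachineFacts` sees one consistent shape, which is the sense of consistency the highlight describes.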
This summary of the show was written by Emily Nicholls.