The Illusion of Control: How Policy-Driven Networks Failed to Keep Up with the Cloud

Author: Joel Mulkey

The IT team at New Seasons Market here in Portland, Oregon, is one of my favorite customers to work with. At two different companies now I’ve had the opportunity to serve them as a vendor. Most recently, I got involved when their new “cutting-edge” network technology wasn’t keeping up with the demands of their business.

They deployed an SD-WAN system from another company that, like pretty much every other networking vendor, built its product around a set of policies that dictate behavior. New Seasons had configured those policies properly, but the network wasn’t behaving as they intended. Specifically, they had implemented this SD-WAN system to ensure seamless failover for all their applications over multiple internet connections, but when one connection failed, only certain applications kept working while others broke.

The team at New Seasons thought they had control of their network behavior because they had defined the right policies, but it was clear they really didn’t. There was a gap between their intended behavior and what could be accomplished through static network policies. As companies like New Seasons migrate to the cloud, that gap becomes increasingly apparent.

What we saw with New Seasons is something that everyone seems to be dealing with to some extent. Almost every company is on a cloud journey. Depending on their size, they may have started cloud-first or they might just be dipping their toes into cloud usage. Across our customers, we see a recurring pattern of cloud adoption. It’s not an on/off choice; it’s a progressive journey. It usually starts with personal use, driven by the consumerization of IT. Every company is seeing this – people taking their own initiative to use social media, web conferencing, and other tools. Then, department leaders decide to leverage specific technologies for their teams, such as a project management or CRM system.

At this point, the cloud has become an important part of the business’s operations. The next step for most companies is an executive decision to move a company-wide technology to the cloud. We see this most often in the form of a move from an on-prem PBX to voice-over-IP, or from on-prem servers to Office 365. Once that happens, the entire company’s operations are tied to cloud performance. Finally, companies adopt a cloud-first mandate. They see all the benefits of the cloud and mandate that all new technology the company buys be cloud-first (and often SaaS-first) where possible.

There’s an important distinction between the first two and the last two stages of the cloud journey. In the last two stages, IT or some central department is provisioning the software like they always have. But in the first two, IT may have no idea that these technologies are being used, even though they’ve become critical to day-to-day business operations. Even after a business has moved into the last two stages, the first two don’t stop. That consumerization of IT has been powerful for business leaders, and supporting those personal and departmental applications is a key need for businesses today.

When companies have this big, messy mix of cloud application usage, the network behavior can be very complex. Maintaining ideal network performance at any given moment, and throughout the journey as it progresses, is a big challenge. Network policies, like “prioritize Salesforce,” don’t even scratch the surface when you consider the tens or hundreds of varying applications that many businesses make use of each day. What are the user expectations for all those applications? My guess is their expectations differ from the expectations that the IT team might want to set.

User Expectations in the Cloud Era

Meeting user expectations in the cloud era can be an overwhelming challenge. With the advent of cloud adoption, users expect a different procurement and usage experience. They want choice in the applications they use and the environments they use them in, often with the same frictionless app-store installation they’ve come to expect on their phones. They also expect automated behavior and instant access, so they can get to work immediately with no configuration hassles, change management requests, or other bottlenecks.

While this situation presents a lot of challenges, it’s a good change. Let’s not think of it as evil “shadow IT” meant to steal and destroy. This new technology environment allows businesses to grow, innovate and expand better, faster and more efficiently. We just have to somehow meet the expectations it creates.

IT leaders at these cloud-enabled businesses are expected to provide the predictable performance that they did with on-prem systems. Downtime is, of course, unacceptable, and beyond that, these cloud applications need to work the way they were meant to. They need to be responsive, fast, and not frustrate users. And they have to do all of this in a way that scales – working consistently as business needs change, revenue grows, and locations expand. While meeting those performance requirements with on-prem applications might have been relatively simple, the cloud is creating new challenges for the network, challenges not readily solved through static policies.

Network Challenges

The first challenge is that, by nature, cloud technologies rely on the internet, and no matter what kind of internet circuit you use, performance is unpredictable. We monitor thousands of internet connections ten times per second, and can tell you very simply that it doesn’t matter what kind of connection you have: fiber, cable, wireless, and so on can all experience business-affecting degradation. This can mean downtime or brownouts, and both can be expensive, frustrating to users, and potentially job-threatening to the people responsible for networking. Our data shows that on average, an ISP connection will experience 3.5 hours of downtime per month and 23 hours of major performance-affecting brownouts per month. This can be at 2 AM, so if you’re not a 24/7 business maybe that hasn’t hit you hard yet, but it can also be during the middle of the day.
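
To put those averages in more familiar terms, here is a rough back-of-the-envelope sketch that converts them into availability percentages. It assumes a 30-day month and that downtime and brownout hours don’t overlap; neither assumption comes from our data, and the function name is purely illustrative.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month (720 hours)


def availability_percent(impaired_hours: float,
                         total_hours: float = HOURS_PER_MONTH) -> float:
    """Percentage of the period during which the connection was not impaired."""
    if not 0 <= impaired_hours <= total_hours:
        raise ValueError("impaired_hours must be between 0 and total_hours")
    return 100.0 * (1 - impaired_hours / total_hours)


downtime_hours = 3.5   # average monthly downtime quoted above
brownout_hours = 23.0  # average monthly brownout time quoted above

# Time the connection is up at all.
up = availability_percent(downtime_hours)
# Time the connection is up AND not degraded (assumes no overlap between the two).
fully_healthy = availability_percent(downtime_hours + brownout_hours)

print(f"Up: {up:.2f}%")                            # Up: 99.51%
print(f"Up and not degraded: {fully_healthy:.2f}%")  # Up and not degraded: 96.32%
```

Under those assumptions, a single average connection is fully healthy only about 96 percent of the time, which is well short of what users expect from business-critical applications.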

The second challenge comes from the ever-shifting application landscape. If your users are leveraging the cloud well, procuring and consuming new applications all the time and shifting their use of current applications, how does the network keep up? There’s no way for network administrators to know exactly the makeup of their traffic 6 months from now (if they even know what it looks like today). This means that it’s impossible to create static network policies that can provide a good user experience over time.

Finally, the third challenge is that IT is no longer the sole gatekeeper of software provisioning. This means that, even with plentiful IT resources, there are critical technologies being used in businesses that IT doesn’t know about and has no policies to manage.

These challenges lead to an all-too-familiar game of policy whack-a-mole. Users get frustrated by application outages or poor performance. They contact IT in a huff, and now IT has to decide if they’re going to ignore the problem, say it’s not a supported application, try to fix it with some new network policy, or maybe contact a vendor so that vendor can implement a new policy. At the end of the day, no one is happy, and the cloud is a point of frustration rather than empowerment.

Here’s what I think is at the root of the situation: It used to be that IT controlled every application. IT was the bottleneck and was able to control the experience. Sometimes that worked out pretty well, but that’s not the way the world is moving.

In this new cloud era, IT is now a facilitator, and end users are interacting directly with their applications. They’re procuring, configuring, using, and troubleshooting them. Yes, they’ll still come to blame IT when they don’t work right, but the interaction model has fundamentally changed. IT leaders need network technologies that support this new model and enable their users to be nimble and efficient with applications.

If their network is built upon a set of static, human-controlled policies, how do companies ensure performance and reliability when new applications can be added into the environment at any time? They can probably configure policies for 80 or 90 percent of their business needs. That’s pretty good, and they’ll feel like they’ve done their job: the box is checked, and they can say they have failover or application prioritization. But what about when one of those new applications uses a ton of bandwidth? How does that impact their existing business-critical applications? Or when their CEO gets sent an invite to use some random videoconferencing tool they’ve never heard of for a key meeting, are their network policies going to ensure he or she is able to communicate effectively, or will they get a frustrated email after the call?

Beyond simple frustrations and inconvenience, if a business relies on static network policies, it’s exposed to the risk of falling behind competitors who are using intelligent software in their networks. The solution to this lack of control isn’t more policies; it’s a smarter network.

Intelligent Network Software

Intelligent software is being built into technologies in many areas right now. These include security, WiFi, data center networking, and internet/WAN (SD-WAN). I’ve spent almost 6 years building a platform to use intelligent software to improve internet and WAN connectivity. When I started Bigleaf, I wondered – what if we built something explicitly for cloud apps running over the internet, and didn’t try to compromise by including legacy needs like private networking? What if we built for the cloud first? I realized if we were 100% dedicated to that, we could build software to control the experience much better than people can.

To support the new environment that the cloud has brought about, businesses need a network that possesses a few key qualities. It should be impactful and fast to deploy. It has to be reliable, even when user demands and network conditions vary. It should be flexible, working with any cloud application with ease. And it should be future-proofed by being as autonomous as possible.

Let’s drill down into each of these areas to see how intelligent software provides more effective control than manual policies.

Impact

Focusing the scope of intelligent software is key to a good outcome. What I’ve seen in the software and networking worlds is that intelligent software is difficult to build and implement successfully. The result of that is, if companies try to deploy it broadly to solve many problems at once, it fails. The impact is watered down by too many variables and implementation issues, or the software just isn’t capable enough to solve the problems.

A good example of this problem, for anyone into audio-video (AV) or home theater, is the “home theater in a box.” If you try to pack 7 speakers, a subwoofer, a receiver, and a Blu-ray player all into a little box that someone can fit into their back seat, you’re going to have a sound system that doesn’t wow anyone. The sound quality at the cinema is amazing because the speakers behind the screen are each the size of a refrigerator, and they don’t play any part in the visuals you see. They’re dedicated technology built for a specific purpose. If you want a network that runs on intelligent software, you should narrow the problem scope that you try to address with any specific piece of technology.

For Bigleaf what this looks like is that we don’t touch several portions of the network today. Our solution sits entirely outside of the customer’s network and firewall. We sit in-between them and the internet, connecting them to the cloud through our core network. We don’t touch the LAN, we don’t touch security, and we don’t touch their core data center infrastructure. This allows us to create very specialized software that solves a huge need of reliability and performance to and from the cloud, with full autonomy. Now, I say “today”, because we are working on ways to extend our technology, but stay tuned for more details on that in the future.

Reliability

Our world today is real-time. If a key phone call drops, or even glitches for a few seconds, that really bothers people. If a video pixelates, everyone wonders what’s wrong with the network. There’s no mercy when it comes to expectations for cloud applications. Bigleaf addresses this through automatic inspection of real-time network performance. Our software evaluates the health of each ISP path ten times per second and then, based on that data, immediately changes internal routing behavior to keep applications and users happy. You can think of it like the perfect network engineer who has instant access to the health statistics of the entire internet path and who makes routing policy changes immediately, with no errors, bathroom breaks, or sick days.
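
To make the idea of measurement-driven routing a little more concrete, here is a minimal sketch of the general technique: keep a short rolling window of probe results per path, score each path, and prefer the healthiest one. This is an illustration only, not a description of how Bigleaf’s software is implemented. The class and function names, the 3-second window, and the scoring weights are all hypothetical; only the ten-samples-per-second rate comes from the text above.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean

SAMPLES_PER_SECOND = 10           # measurement rate mentioned in the text
WINDOW = 3 * SAMPLES_PER_SECOND   # assumption: judge health over the last 3 seconds


@dataclass
class PathHealth:
    """Rolling health record for one ISP path (hypothetical structure)."""
    name: str
    latencies_ms: deque = field(default_factory=lambda: deque(maxlen=WINDOW))
    losses: deque = field(default_factory=lambda: deque(maxlen=WINDOW))

    def record(self, latency_ms: float = 0.0, lost: bool = False) -> None:
        """Store one probe result; a lost probe contributes no latency sample."""
        self.losses.append(1 if lost else 0)
        if not lost:
            self.latencies_ms.append(latency_ms)

    def score(self) -> float:
        """Lower is better. Hypothetical weighting: each 1% of loss costs 10 ms."""
        if not self.losses or not self.latencies_ms:
            return float("inf")  # no data, or every probe lost: treat as unusable
        loss_pct = 100.0 * sum(self.losses) / len(self.losses)
        return mean(self.latencies_ms) + 10.0 * loss_pct


def best_path(paths):
    """Pick the path with the lowest (healthiest) score."""
    return min(paths, key=lambda p: p.score())


fiber = PathHealth("fiber")
cable = PathHealth("cable")

# Both paths healthy: the lower-latency path wins.
for _ in range(WINDOW):
    fiber.record(12.0)
    cable.record(25.0)
print(best_path([fiber, cable]).name)  # fiber

# Simulated brownout: the fiber path starts dropping every fifth probe.
for i in range(WINDOW):
    fiber.record(12.0, lost=(i % 5 == 0))
    cable.record(25.0)
print(best_path([fiber, cable]).name)  # cable
```

The point of the sketch is the shape of the loop rather than the specific numbers: because the decision is recomputed from fresh measurements, a brownout shifts traffic without anyone having written a policy for it.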

Flexibility

One big issue with manual network policies is that they are rigid and aren’t aware of new needs and applications. For example, if a company’s QoS policies were put in place a couple of years ago, they have no configuration for Microsoft Teams and the way that Teams traffic should be identified and prioritized. What happens if the company’s CEO gets invited to a phone conference on Teams tomorrow?

At Bigleaf, we don’t want our users to have to think about how the network will handle new applications. Frankly, everyone wants their voice calls to sound clear, their video to play smoothly, their web applications to be snappy, and their databases to be reliable. We’ve built smart software that auto-detects every type of application, classifies them into six priority categories, and automatically ensures they behave how they’re supposed to, even when the network is congested. Our users don’t need policies to get the outcome they want. It’s wonderful.
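
As a rough illustration of what prioritizing by class means when a link is congested, here is a small sketch of a strict-priority queue with six classes. The scheduling discipline, the class numbering, and the example traffic labels are all assumptions made for the example; the text above only states that there are six priority categories, not what they are or how traffic is scheduled between them.

```python
import heapq
from itertools import count

NUM_CLASSES = 6  # number of priority categories mentioned in the text


class PriorityScheduler:
    """Strict-priority queue (an assumed discipline): class 0 is sent first, class 5 last."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = count()  # tie-breaker that preserves FIFO order within a class

    def enqueue(self, packet: str, priority_class: int) -> None:
        if not 0 <= priority_class < NUM_CLASSES:
            raise ValueError(f"priority_class must be in 0..{NUM_CLASSES - 1}")
        heapq.heappush(self._heap, (priority_class, next(self._seq), packet))

    def dequeue(self) -> str:
        """Return the oldest packet from the most urgent non-empty class."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Hypothetical traffic mix arriving while the link is congested.
sched = PriorityScheduler()
sched.enqueue("backup chunk", 5)
sched.enqueue("voice frame 1", 0)
sched.enqueue("web request", 2)
sched.enqueue("voice frame 2", 0)

order = [sched.dequeue() for _ in range(len(sched))]
print(order)
# ['voice frame 1', 'voice frame 2', 'web request', 'backup chunk']
```

In a policy-driven system, someone has to decide which class each application belongs to; the sketch takes that mapping as given, which is exactly the part that automatic detection is meant to handle.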

Autonomy

I’ve been talking about intelligent software, but what do I mean by that? At its core, it means that the software can make the same decisions that a human would, just better and faster. This releases people from manually controlling the low-level details of a technology, such as a network routing policy, while effectively giving them more control than was ever possible before.

Notably, the sweet spot of autonomy sits somewhere on a continuum that runs from completely manual policies all the way to full autonomy. That sweet spot is where the most effective control is realized. It’s the best possible outcome. Depending on the technology we’re talking about, and the maturity of intelligent software in that industry, the sweet spot will sit in a different place along the continuum. For example, if I want to get in a car and drive around town in the middle of a blizzard, I’m not aware of any autonomous software that will do that for me yet. That’s going to be a fully manual process right now. There are a lot of smart people working on that challenge, but there are a ton of variables and complex decisions to make, and people are better at it right now.

However, for network traffic management, it’s a different story. Compared to humans, computers with dedicated intelligent software can do a much better job at making constant real-time network measurements and decisions. If implemented properly, extensive autonomy is ideal in the network. Bigleaf provides this, today.

There are still limitations even in this area though, so full autonomy isn’t yet reasonable. Bigleaf has some configuration options for how traffic is managed, but we generally look at any needed manual configuration as a shortfall, and so we incorporate that logic into the software over time. Our software will continue to mature, and the sweet spot will keep moving closer to full autonomy.

Control Realized

When New Seasons deployed our SD-WAN platform built using intelligent software, they didn’t configure any policies in it, but they got the exact control of their cloud connectivity that they had been wanting. All of their applications stayed up, and all traffic was prioritized for the best possible end-user experience.

You can think of the cloud migration kind of like “bring your own device”. Just like embracing BYOD allowed users to work wherever and whenever they wanted, fully embracing this cloud migration through an intelligent network will give businesses a competitive edge.

