Q&A With Harry Aujla - Why Your Business Needs High Availability

High availability is a core component in any business continuity plan, and it contributes towards maintaining a guaranteed level of service between you as a business and your customers. We recently talked to Harry Aujla of SIOS Technology, a pioneer in High Availability and Disaster Recovery. Here is the transcript of our interview.

Swapnil Bhartiya: Why do companies need high availability, and what role does SIOS Protection Suite play in helping companies achieve that?

Harry Aujla: Yeah, that’s a great question, thank you Swapnil. Ultimately, high availability is a core component in any business continuity plan, and it contributes towards maintaining a guaranteed level of service between you as a business and your customers, no matter how small or large your company is. Every part of the business these days needs some level of continuous operation to function in today’s demanding, high-transaction business environments. So we could be talking about day-to-day operations, like communication via email, instant messaging, or video conferencing, all the way over to back-office operations such as ERP and CRM, not to mention the importance of digital marketing channels and eCommerce.

So what this means is that whether you’re running your applications within your own data center or, like you say, leveraging the benefits of running them within a cloud platform, your applications have to be what we call “always on.” This dependence on being “always on” does expose a threat, in that if your applications were to become unavailable, this could impact the business in many different ways, including loss of revenue, productivity, and even reputation.

So by applying a high availability software solution, you can serve your customers through thick and thin. You are essentially sending a message to your customers that you value their business. When you’re deploying a highly available infrastructure, what you’re doing is mitigating the negative effects of outages, such as the impact on revenue and productivity. So a high availability software solution ends up being your insurance policy: it not only eliminates any single points of failure within your infrastructure, but is also crucial towards maintaining those important service level agreements, or SLAs, that you have in place with your customers.

Additionally, and sort of going back to the point you made earlier, Swapnil, recent events around the COVID crisis have emphasized the need for high availability and disaster recovery solutions even further. With more people now working from home, there’s been a deeper reliance on moving applications and operations onto cloud-based platforms. So what this means is that there may be more stress on IT systems and less control of those systems by IT teams that usually support them. Although the cloud providers today can offer some level of high availability for your applications, companies have to question whether these high availability capabilities go far enough in terms of meeting SLAs, particularly in an environment where many of us are working in remote or isolated fashion at the moment.

Swapnil Bhartiya: Can you explain what exactly you mean by a product being application-aware, and what does it mean for the stack?

Harry Aujla: There are different ways you can address availability when you’re looking at high availability within an infrastructure; there are different levels at which it can be approached. One approach is to consider availability at a server or a cloud instance level. The notion here is that if a physical server fails, for example, or an entire cloud instance crashes, one can recover their applications and associated databases by starting up a new server or spinning up a new cloud instance.

Now, although this level of recovery is suitable for some companies, the limitation here is that this type of high availability technique doesn’t actually monitor the activity going on within the server or the cloud instance itself, with regard to pieces such as the operating system, the application, or any associated databases. So if we take a scenario where you, dread the thought, suffer an application- or database-level error but the server or the instance actually remains active, the server-level high availability solution would not actually invoke a recovery, as from its perspective the server or instance is still available.

So this approach could impact the time it takes to detect and recover from an application-level failure and therefore can impact your recovery time objective, or RTO. If companies are looking for more stringent SLAs, they need to consider availability solutions that are application and database aware, such as SIOS, which are designed to offer the ability to achieve much shorter RTOs. So the idea with application-aware availability is that if the application or database were to fail, a solution like SIOS would be monitoring this activity, and it will orchestrate the recovery of the application and database to the next available node in the high availability cluster. This instant detection of the fault and subsequent automated recovery process allows companies to minimize the time taken to recover any failed services. So it helps towards meeting those strict SLAs that they set for themselves.

To help achieve the shorter RTOs, what SIOS does is we deliver a range of what we call application recovery kits or ARKs which actually hold the necessary application and database level intelligence to be able to monitor and recover the appropriate services and resources, and also allow you to protect your infrastructure at a much more granular level compared to just monitoring a server or cloud instance.
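The difference between server-level and application-aware detection described above can be illustrated with a minimal Python sketch. This is not SIOS code: the `NodeState` fields and both check functions are hypothetical names chosen only to show why a server-level check misses a failed database on a running instance.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of one cluster node (illustrative fields only)."""
    server_up: bool        # the server or cloud instance responds
    database_up: bool      # the database service answers a probe
    application_up: bool   # the application answers a probe


def server_level_needs_recovery(state: NodeState) -> bool:
    """Server-level check: only asks whether the instance itself is alive."""
    return not state.server_up


def application_aware_needs_recovery(state: NodeState) -> bool:
    """Application-aware check: also looks at the services inside the instance."""
    return not (state.server_up and state.database_up and state.application_up)


# The scenario from the interview: the database fails but the instance stays up.
failed_database = NodeState(server_up=True, database_up=False, application_up=True)
print(server_level_needs_recovery(failed_database))       # False
print(application_aware_needs_recovery(failed_database))  # True
```

In this sketch the server-level check reports nothing to recover, so the failure would go undetected until someone noticed it, which is the delay that lengthens the RTO.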

Swapnil Bhartiya: I want to talk a bit more about, or learn more about, these application recovery kits. Can you talk about what kind of applications or infrastructure are covered by these kits? And if you can kind of give some examples.

Harry Aujla: We have an extensive list of application recovery kits that we supply as part of the SIOS solution, including kits for SQL Server, Oracle Database, SAP, SAP HANA, and MaxDB, and a wide range of other applications, databases, and infrastructure functions. So to help answer this particular question, I’ll use the example of MaxDB. And the reason for that is we do come across a lot of scenarios where customers want to protect MaxDB, particularly when it comes to SAP-related projects. MaxDB is a common database that’s used in those environments. And for the purposes of clarity, some folks also refer to MaxDB as SAP DB, so we’re talking about the same thing here. The application recovery kits are really the enabler for delivering that granular level of application protection and, for want of a better term, they’re really the magic ingredient of the SIOS high availability solution.

Like the other application recovery kits that we supply, the MaxDB kit includes a variety of functionality based on our deep understanding of MaxDB itself. We’ve used that information to automate the steps needed to configure a cluster to protect a MaxDB environment. So this reduces the manual scripting that makes setting up a cluster prone to human error and that, frankly, can be a time-consuming and sometimes painful process. It also monitors and detects application failures at a deeper level than other clustering software because it’s monitoring down through the infrastructure, the operating system, and other software services that could cause the application to hang or be unresponsive, even if the server or the instance is still operational. Additionally, the recovery kit will make sure that the MaxDB components are started up in the right location and in the right order and, very importantly, in compliance with best practices. This eliminates a situation where a failover might occur, but performance on the failed-over instance is either nearly zero or not stable, which is a risk you can potentially run into when dealing with manual scripting.

Swapnil Bhartiya: Can you walk me through the steps? Let’s assume a situation where there’s a failover. What are the scenarios, and how will SIOS Protection Suite handle those failover scenarios in the context of MaxDB and SAP DB?

Harry Aujla: So the MaxDB, or SAP DB, recovery kit is usually deployed through a predefined, GUI-based, out-of-the-box setup wizard. So we make it nice and easy. What happens is, after you select some basic options via the wizard, the recovery kit will validate your options. And this validation check is really important, as it catches any invalid entries and, in some cases, flags any potentially problematic configurations.

Once the configuration has passed all the necessary validation criteria, the recovery kit will then automatically deploy and configure the protection job around the database. Essentially, what this does is extend the database resource from node one of the cluster to node two of the cluster, for example. The database services remain active on node one, as this is where the clients are accessing the database. The database services on node two are in a passive state until a failure condition is met.

So in parallel to the MaxDB database, we also protect the IP resources associated with the database resources, which help us redirect clients from one node to the other where necessary. And very importantly, what we do is configure the IP resources as a dependency of the MaxDB resource, so that if MaxDB needs to fail over from node one to node two, for example, we also ensure that the IP resources fail over with the MaxDB resource.

Now, in the event of a failure of MaxDB itself, let’s say, for example, that a MaxDB service failed, what SIOS will try to do is recover that service on the same node: it will try to restart that service. If the recovery fails, it will then attempt to fail over that service from node one to node two, along with any dependent resources as described earlier. If, on the other hand, we suffer a more catastrophic failure, let’s say a complete server or node loss in the cluster, SIOS would automatically fail over any of the protected resources that were running on node one to node two as quickly as possible.
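The escalation described above (try a local restart first, fail over only if that does not work, and fail over immediately when the whole node is lost) can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the SIOS implementation: the `recover` function, its parameters, and the node and resource names are all hypothetical.

```python
from typing import Callable, List, Tuple


def recover(
    resources: List[str],
    active_node: str,
    standby_node: str,
    node_alive: bool,
    try_local_restart: Callable[[], bool],
) -> Tuple[str, str]:
    """Return (node now hosting the resources, description of the action taken)."""
    # Service-level failure on a live node: attempt a restart in place first.
    # The restart is never attempted when the node itself is gone.
    if node_alive and try_local_restart():
        return active_node, "restarted locally"
    # Local restart failed, or the node is lost: move the protected resource
    # and its dependent resources (such as the IP address) to the standby node.
    return standby_node, "failed over: " + ", ".join(resources)


protected = ["maxdb", "ip"]
print(recover(protected, "node1", "node2", True, lambda: True))
print(recover(protected, "node1", "node2", True, lambda: False))
print(recover(protected, "node1", "node2", False, lambda: True))
```

The first call models a successful local restart, the second a failed restart that escalates to failover, and the third a complete loss of node one.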

Swapnil Bhartiya: So we’re talking about failover. When a failover happens, the IP address also changes. So how do you ensure that the IP addresses are managed properly after the failover, so that clients connect to the correct database?

Harry Aujla: So one of the recovery kits that we supply alongside this wide range of application and database kits is something called the IP recovery kit. What this does is provide a mechanism to recover an IP address from a failed primary cluster node to the next available node within the SIOS-protected cluster. So the IP recovery kit can define an IP address that can be used to connect to a SIOS-protected application or database. And as with other SIOS-protected resources, IP resource switchovers can be initiated automatically as a result of a failure, or manually as part of an administrative action.

So what we would usually do is configure the IP resource to be a dependency under the protected application or database resource. So if any of those resources failed over, we would ensure that the associated IP resources failed over, too, so that clients can automatically connect to the secondary node post-failover.
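One way to picture the dependency between the database resource and its IP resource is as a small graph that determines the order in which resources are brought up on the new node. The sketch below uses Python’s standard-library `graphlib` (Python 3.9 or later). It assumes that a dependency is started before the resource that depends on it and stopped after it, which the interview does not state; the `depends_on` mapping and the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resource hierarchy: each key depends on the resources in its set.
# Here the MaxDB resource depends on the IP resource.
depends_on = {"maxdb": {"ip"}, "ip": set()}


def start_order(graph):
    """Dependencies first, so the IP address is up before the database."""
    return list(TopologicalSorter(graph).static_order())


def stop_order(graph):
    """Reverse of the start order, so the database stops before its IP."""
    return list(reversed(start_order(graph)))


print(start_order(depends_on))  # ['ip', 'maxdb']
print(stop_order(depends_on))   # ['maxdb', 'ip']
```

Because the two resources are tied together in one hierarchy, moving the database also moves the address that clients use to reach it.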

Swapnil Bhartiya: I want to go back to application awareness and its importance when the customer is choosing failover clustering software. Talk about the importance of application awareness and the role it plays in disaster recovery, data recovery, and high availability.

Harry Aujla: It’s very important. There are a couple of factors that you need to consider when you’re choosing a solution for your high availability needs, and this is an area where application-aware availability does deserve closer consideration. The first factor I would think about centers around the capability to deploy accurate availability configurations, which I think is really important.

The intelligence built into the recovery kits ensures that we not only protect the application and database down to that granular level in terms of resources and services, but also that the high availability protection is configured accurately. As we said when we spoke about the recovery kits, in addition to building the configuration, the kit also validates the configuration before the protection job is actually deployed. This is really important because it picks up any glaring mistakes in your configuration and gives you the green light to continue and then deploy it in an automated fashion. So the idea is that if the configuration passes all the validation checks, the recovery kit then automatically deploys the selected configuration.
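The validate-then-deploy gate described above can be sketched as follows. This is a simplified illustration, not the checks a SIOS recovery kit actually performs: the configuration keys (`nodes`, `virtual_ip`), the three rules, and the function names are assumptions made for the example.

```python
import ipaddress


def validate(config: dict) -> list:
    """Return a list of problems; an empty list means the config may be deployed."""
    problems = []
    nodes = config.get("nodes", [])
    if len(nodes) < 2:
        problems.append("a cluster needs at least two nodes")
    if len(set(nodes)) != len(nodes):
        problems.append("node names must be unique")
    try:
        ipaddress.ip_address(config.get("virtual_ip", ""))
    except ValueError:
        problems.append("virtual_ip is not a valid IP address")
    return problems


def deploy_if_valid(config: dict) -> str:
    """Deploy only when validation finds no problems; otherwise report them."""
    problems = validate(config)
    if problems:
        return "rejected: " + "; ".join(problems)
    return "deployed"


print(deploy_if_valid({"nodes": ["node1", "node2"], "virtual_ip": "10.0.0.50"}))
print(deploy_if_valid({"nodes": ["node1", "node1"], "virtual_ip": "10.0.0.999"}))
```

The point of the gate is that a configuration with mistakes never reaches the deployment step, so errors surface before the cluster is relied upon.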

In addition to that, this process of validating and automatically deploying the configuration is beneficial as it eliminates the potential human errors that we touched upon earlier. And this is really crucial when you’re protecting highly critical environments like SAP, for example. Being application-aware also allows SIOS to monitor and protect various parts of the application, database, operating system, and the underlying networking. And the net result of this is not only that we can detect a wider range of failures, but also that we can react to failures much more quickly.

As was touched upon earlier, the more quickly you can recover from a failure, the more likely you are to meet those recovery time objectives and, subsequently, your service level agreements. And then lastly, I would say whatever application or database is being protected, it’s also critical that we comply with any best practices that are laid out by the application or the database vendor themselves. And only an application-aware solution is going to give you that level of capability. Again, this is super critical when it comes to things like SAP. Our ability to maintain SAP best practices during any failover or failback process allows SIOS to be certified for integration with both SAP S/4HANA and SAP NetWeaver.

Swapnil Bhartiya: Harry, thank you so much for talking about the SIOS Protection Suite, and also for explaining application awareness in depth. I look forward to talking to you again, because there are so many things to talk about in the context of high availability, data protection, and disaster recovery. Thank you.

Harry Aujla: Thank you, Swapnil.
