
Importance Of Regular Testing In HA Environments


Brett Barwick, Principal Software Engineer at SIOS Technology, discusses the importance of regular testing in HA environments. “Always perform really rigorous pre-production testing. You want to make sure that when you hit that go-live day, everything’s going to go smoothly,” said Barwick. He has other great tips to share too. Check out the clip above.

Guest: Brett Barwick (LinkedIn)
Company: SIOS Technology (Twitter)
Show: To The Point


Brett Barwick: Yes, absolutely. It’s extremely important, and I know that for customers nowadays there’s a lot of pressure to go live; a lot of times they want to go live yesterday. So there can be some temptation to cut steps here and there, but I do think it’s very important, and I’ve got a few tips around this. Number one, always do really rigorous pre-production testing. You want to make sure that when you hit that go-live day, everything’s going to go smoothly. You don’t want that to be the first time you’ve run through a failure scenario with your HA software, only to discover on that day a misconfiguration or some behavior that you didn’t expect. So, number one, it’s really important to put both the software and your configuration through their paces.

Second thing I would recommend is making sure that, as an IT team, you maintain a runbook, meaning a step-by-step guide for common failures or maintenance scenarios that you might expect to come up, with the exact steps laid out, from your hardware level and your OS level all the way up the stack to your HA product, for exactly what needs to be done to recover. Not only that, but also who’s responsible for doing those things. You don’t want something to happen in the middle of the night in one time zone while the person who’s responsible lives in another time zone. You want to make sure you know exactly who’s doing each step.
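To make that idea concrete, here is a minimal sketch of what one runbook entry could look like if it were kept as structured data. The failure scenario, step descriptions, and owner roles are all hypothetical placeholders; in practice many teams keep this in a wiki or ticketing system rather than code, but the shape is the same: one ordered list of steps per scenario, each tied to a layer of the stack and a named owner.

```python
from dataclasses import dataclass

@dataclass
class RunbookStep:
    layer: str   # e.g. "hardware", "OS", "HA software"
    action: str  # what needs to be done at this layer
    owner: str   # who is responsible (include an on-call contact per time zone)

# Hypothetical runbook entry for one failure scenario: the primary node goes down.
primary_node_failure = [
    RunbookStep("hardware", "Check power and out-of-band management status of the primary node", "Data center on-call"),
    RunbookStep("OS", "Confirm the node is unreachable and capture any available logs", "Linux admin on-call"),
    RunbookStep("HA software", "Verify the standby has taken over; if not, trigger a manual failover", "HA/DB admin on-call"),
    RunbookStep("HA software", "Once the old primary is repaired, re-register it as the new standby", "HA/DB admin on-call"),
]

for i, step in enumerate(primary_node_failure, start=1):
    print(f"{i}. [{step.layer}] {step.action} -> owner: {step.owner}")
```

The point of the structure is exactly what Barwick describes: every scenario spells out the steps from the hardware level up to the HA product, and every step has an unambiguous owner.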

Third thing I would recommend is maintaining a QA or a test cluster that’s as close as possible to your production cluster. The idea there being that if you need to perform some maintenance, you want to make sure that you can go on to your test cluster and do essentially a dry run, make sure that all of your processes work the right way, make sure that your patches apply successfully, that way when you get on the production server and you have that scheduled maintenance window, you’re just more confident that things are going to run smoothly in the upgrade process.

And the last couple of tips apply to HANA specifically. So number one, just keep in mind that HANA is an in-memory database. Part of the reason why it’s so appealing and so fast is that it stores records in memory. So as a database admin, if you’re aware that certain tables or certain columns are frequently accessed, go ahead and configure the secondary system to preload those in memory. Just keep them loaded. That way, as soon as the switchover completes and a query comes in for one of those commonly accessed tables or columns, you can return the data immediately. You don’t have to wait for HANA to load a terabyte of data into memory to respond to a query. So that’s the first one.
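As an illustration of what that preloading might look like, here is a minimal sketch using SAP’s hdbcli Python driver. The connection details, schema, and table name are hypothetical, and whether the preload flag is mirrored to the secondary depends on your system replication settings (the preload_column_tables parameter), so confirm the specifics against the documentation for your HANA version.

```python
from hdbcli import dbapi  # SAP's Python driver for HANA

# Hypothetical connection details for the primary system.
conn = dbapi.connect(
    address="hana-primary.example.com",
    port=30015,
    user="DBADMIN",
    password="********",
)
cur = conn.cursor()

# Mark a frequently accessed column table so its columns are kept loaded in
# memory. With the system replication preload setting enabled on the
# secondary, the same tables stay warm there and can serve queries
# immediately after a switchover.
cur.execute('ALTER TABLE "MYSCHEMA"."FREQUENTLY_USED_TABLE" PRELOAD ALL')

cur.close()
conn.close()
```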

The second one is, if your version of HANA supports it, so again, this would be HANA 2.0 SPS 04 or later, and your HA software supports it, so if you’re using, say, SIOS Protection Suite for Linux, this would be version 9.5.2 or later, consider using the takeover with handshake option. Remember, this can help you reduce downtime because you’re not completely stopping the primary database, you’re just suspending it, and that’s going to allow clients to connect much more quickly after the switchover.
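For reference, a takeover with handshake is normally driven for you by the HA product; the sketch below only wraps the underlying SAP tool call to show roughly what happens on the secondary. The option name shown is my understanding of the hdbnsutil invocation for this feature, so verify the exact syntax against the SAP HANA and SIOS documentation for your versions before relying on it.

```python
import subprocess

def takeover_with_handshake() -> None:
    # Illustrative only: on the secondary, a takeover with handshake suspends
    # transaction processing on the primary rather than stopping it outright,
    # which is what lets clients reconnect more quickly after the switchover.
    subprocess.run(
        ["hdbnsutil", "-sr_takeover", "--suspendPrimary"],
        check=True,
    )

if __name__ == "__main__":
    takeover_with_handshake()
```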