CNME Editor Mark Forker secured an exclusive interview with Kevin Kline, Senior Staff Technical Marketing Manager at SolarWinds, to find out how the company's suite of solutions is empowering users to overcome database problems, why effective cost management matters when migrating to the cloud, and what fundamentals are needed for data recovery.
Kevin Kline is a software industry veteran who has enjoyed a remarkable career spanning 35 years so far.
In his earlier days he worked as an Oracle developer on projects with NASA, and has also worked with Amazon Web Services, Deloitte & Touche and Quest Software in what has been a storied journey to date.
He has spent 10 of the last 11 years at SolarWinds, and took up his new position as Senior Staff Technical Marketing Manager at the company four months ago.
We began the conversation by focusing on the tools and solutions being provided by SolarWinds to help enterprises determine whether the issue they are encountering is a database problem, or something else.
As Kline highlighted, this is an age-old problem that is still prevalent in the industry, but he says new technologies have helped IT professionals move away from the 'blame game' culture of the past.
"I started my database administration career back in 1994, and back then we didn't have robust monitoring systems. I would typically get a call from an end-user or a program manager to say that their database system was down and that I needed to get in there and fix it. So, I would log in and connect, and I could quickly see that there was no issue with connecting to the database. I would relay that back to the program manager and say everything is good here, it is answering my queries, but they would say, no, that can't be right, it's definitely a database problem. What would then ensue was a blame game until we discovered the root of the issue," said Kline.
Kline said it was commonplace for his database team to be the first ones contacted when there was an issue, and that their nemesis in the organisation was the networking team.
“The problem would be a router that dropped somewhere, and that person couldn’t connect to the server, but the server was still up and running. That was the scenario in the good old days, but it actually served as an inspiration for me to work with a tools company because I knew we could move away from the blame game culture that existed – and use empirical evidence that shows exactly where the problem is,” said Kline.
Kline said that, when it comes to resolving issues, SolarWinds takes a 'good, better, best' approach to overall monitoring.
"There are many companies that don't have a large team, and in that instance they could use a couple of our products. The first is our new SolarWinds Observability, which is an all-cloud product, or they might want to use our Server & Application Monitor (SAM) product. Those two products give you full-stack observability. For the majority of customers that use those products, the database is not the main focus of their IT efforts. We all have databases, but they aren't their primary focus," said Kline.
Kline then outlined the products aimed at businesses with mission-critical databases, or with databases that are performing poorly or crashing consistently.
"We offer our Database Observability (DBO) platform for SaaS vendors, and we have two products that are specific to personas. People have long wondered whether the database administrator is going to be a permanent position, or whether it is going to go away at some point in the future. Say you're a highly skilled database administrator at a company that knows its systems are mission-critical, where you are going to lose hundreds of thousands of dollars per hour if the database is slow or not responding. That is when you want to use SQL Sentry. It's extremely deep and powerful and can alert you on things that nobody else in the whole field can; no competitor can get near it. We also have a product called DPA (Database Performance Analyzer), which works across all the major relational databases, and this is probably our most popular database product by license count," said Kline.
Kline also explained that DPA focuses on an element common to all relational databases: wait statistics.
"Wait statistics in Oracle or SQL Server are all conceptually the same thing, which is: what is your database waiting on? So, if we need to solve a problem on our database, you can go into DPA and see that we are waiting on the SAN, that something is going on with our SAN, and see which queries are hitting the SAN. Then you can go a level deeper and troubleshoot it at the SAN level. It really removes this culture of the blame game that existed so much in the past," said Kline.
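To make the idea concrete, the core of wait-based analysis is simple aggregation: add up how long sessions spent waiting on each kind of resource, then drill into which queries account for the biggest wait. The sketch below is a hypothetical illustration, not how DPA or any specific product works; the sample tuples and wait-type names such as `storage_io` are invented for the example, whereas real systems expose their own wait categories (for instance through SQL Server's or Oracle's wait-event views).

```python
from collections import Counter

# Hypothetical wait samples: (query_id, wait_type, wait_ms).
# Wait type names are illustrative only, not any real product's categories.
def top_waits(samples, n=3):
    """Return total wait time per wait type, largest first."""
    totals = Counter()
    for _query, wait_type, wait_ms in samples:
        totals[wait_type] += wait_ms
    return totals.most_common(n)

def queries_waiting_on(samples, wait_type):
    """Return the queries that accumulated time on one wait type, largest first."""
    per_query = Counter()
    for query, wt, wait_ms in samples:
        if wt == wait_type:
            per_query[query] += wait_ms
    return per_query.most_common()

samples = [
    ("q1", "storage_io", 500),
    ("q2", "cpu", 100),
    ("q1", "storage_io", 300),
    ("q3", "storage_io", 200),
    ("q2", "lock", 50),
]
print(top_waits(samples))                         # storage I/O dominates
print(queries_waiting_on(samples, "storage_io"))  # which queries hit storage
```

Here storage I/O accounts for most of the waiting, and `q1` is the query to investigate first. That is the same path from "the database is slow" to a specific resource and query that Kline describes.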
The demands of moving workloads to the cloud have been a big issue for many enterprises across the Middle East, but Kline said the problem is not isolated to the region and is a big challenge in the US too.
According to Kline the biggest issue for enterprises moving to the cloud is readiness.
"Last year, I co-wrote a book about migrating databases to Azure SQL, Pro Database Migration to Azure, so I consider myself pretty well-versed in this particular topic. I also spent a year at AWS, where my main job was to help customers migrate to the cloud. One of the biggest things we see is that customers are simply unprepared for the transition to the cloud. Essentially, what that means is they have spent years managing their databases on a yearly budget, which means purchasing new servers every couple of years. That has a very high cost, because a large enterprise might spend $3m on new servers, but it's a sort of 'one and done' transaction and it is quickly out of their minds. Once you move to the cloud, the bill doesn't show up every three years; it shows up every month and you see what you're paying. Many managers have not readjusted, or spent time studying how this is going to be different in terms of billing and consumption," said Kline.
Kline said that servers were traditionally bought for peak load, but a conversation with database administrators at one of the biggest floral companies in the United States showed how cost-effective switching to the cloud can be.
"I met with some friends of mine who were database administrators for one of the largest floral companies in the country. I spoke with them in person before they moved to the cloud, and they had spec'd out servers that were extremely expensive. I said it must be fun working on a really big project like that, but to my surprise they said it was boring, because as a floral company they are only really busy in the month before Mother's Day and the month before Valentine's Day. When they moved to the cloud, they spent a lot of time thinking about what their monthly costs would be, and they realised that in the other 10 months of the year they need only about 10% of that peak capacity. They have had a very successful deployment because they identified that elasticity is a knob they can turn, adding more servers and processing power when they need them," said Kline.
Kline stressed that one of the key ways SolarWinds helps companies migrate is by helping them understand their performance workloads.
"The first essential component of a successful migration is to understand what your performance workload is like on-prem, so you can compare it with what it is like in the cloud. For example, say you are spending $8,000 per month on database resources. By calling our support line, we can project, based on your history, how much you are going to spend if you go to Azure or AWS, and that is very powerful," said Kline.
Kline also highlighted another issue facing companies: the broad privileges and access traditionally given to development teams.
"The companies that are guilty of deploying to the cloud without stringent analysis are used to their developers having high-level privileges on their in-house databases and servers, and they perpetuate that into the cloud. What they find is that you pay for what you consume, and if you let developers consume all kinds of things without top-down supervision, that first bill could be far higher than you expect. We help them see their overall consumption, as well as predict what their consumption will be, and help them control costs that way," said Kline.
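The floral company's numbers lend themselves to a quick back-of-the-envelope check. The sketch below assumes, as a simplification not stated in the interview, that cost scales linearly with provisioned capacity and ignores reserved-instance discounts, licensing and migration costs. It compares paying for peak capacity all year with paying for peak in two busy months and roughly 10% of peak for the other ten.

```python
def elastic_cost_ratio(peak_months, offpeak_fraction, months=12):
    """Cost of elastic capacity relative to provisioning for peak all year.

    Simplifying assumption: cost is proportional to provisioned capacity,
    with peak capacity costing 1.0 per month.
    """
    elastic = peak_months * 1.0 + (months - peak_months) * offpeak_fraction
    always_peak = months * 1.0
    return elastic / always_peak

# Two busy months (before Valentine's Day and Mother's Day), ~10% of peak otherwise.
print(elastic_cost_ratio(peak_months=2, offpeak_fraction=0.10))  # 0.25
```

Under those assumptions the elastic approach costs about a quarter of running peak-sized capacity year-round: (2 × 1.0 + 10 × 0.1) / 12 = 3 / 12. Real savings will differ once actual cloud pricing is applied, which is why Kline stresses studying monthly billing before migrating.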
Kline advanced the conversation by highlighting the need to follow a strict set of fundamental principles when recovering lost data.
"What we have found is that some very large companies in the US and globally, I mean household names, are simply not prepared. I have spoken to their dev teams and discovered that they don't have a written and practised disaster recovery plan. We always start with the fundamentals: do you have an SLA as a database administrator, or as a dev team, for your application? How long do you guarantee your users will have to wait before you recover?" said Kline.
Kline then outlined the importance of two other key metrics: RPO and RTO.
"RPO is recovery point objective and RTO is recovery time objective. One example from my past experience is an application we had in the 90s that earned us between $100K and $200K an hour, more than any of us made in a year by a long shot. The recovery point objective was: if the server crashes, how much data are we willing to lose? For that application, the owners said we could let our users redo the last 15 minutes of work, which told us we needed backups frequent enough that we would never lose more than 15 minutes of data. The recovery time objective is how long users have to wait before the database is back up and running. Now that's a challenge, because if it's an hour we could restore a database backup within an hour, but if they say it can't be any longer than 15 minutes, that's when you have to move from disaster recovery to high availability," said Kline.
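The two objectives translate into a simple check against a recovery plan. RPO bounds how much data you can lose, which is roughly governed by how often you back up. RTO bounds how long recovery takes, which is governed by how fast you can restore. The sketch below is a hypothetical illustration, and its names are invented for the example. It simplifies by treating worst-case data loss as the full interval between backups (a failure just before the next backup) and ignoring the time taken to copy backups off the server. The restore time should ideally come from an actual recovery drill rather than an estimate.

```python
from dataclasses import dataclass

@dataclass
class RecoveryPlan:
    backup_interval_min: float  # minutes between backups (e.g. log backups)
    restore_time_min: float     # measured time to restore, ideally from a drill

def check_plan(plan, rpo_min, rto_min):
    """Compare a plan against RPO/RTO targets (all values in minutes)."""
    # Simplification: a crash just before the next backup loses one full interval.
    worst_case_loss_min = plan.backup_interval_min
    meets_rpo = worst_case_loss_min <= rpo_min
    meets_rto = plan.restore_time_min <= rto_min
    return {
        "meets_rpo": meets_rpo,
        "meets_rto": meets_rto,
        # If restoring from backup is too slow, backups alone won't do:
        # this is the point where high availability becomes necessary.
        "needs_high_availability": not meets_rto,
    }

plan = RecoveryPlan(backup_interval_min=15, restore_time_min=60)
print(check_plan(plan, rpo_min=15, rto_min=60))  # a one-hour RTO is achievable
print(check_plan(plan, rpo_min=15, rto_min=15))  # a 15-minute RTO is not
```

With backups every 15 minutes and an hour-long restore, a 15-minute RPO and one-hour RTO are both met. Tightening the RTO to 15 minutes fails the check, matching the point at which the example in the interview moves from disaster recovery to high availability.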
Kline concluded a wonderfully candid discussion by giving his main tip to companies that want to protect their key assets.
“I always recommend and constantly preach about this at conferences all over the world. If you have systems that are that important to the business, then you need to drill, drill and drill. You need to pretend that a server has gone down. How long does it take us to recover all of that data so our end-users are operational once again? I’ve had people call me to say our servers have gone down, and so have our secondary servers, and I just tell them I wish you had drilled in order to be prepared for this. It’s something that needs to change, it is imperative for your business that you drill, so when that inevitable day comes you know exactly what to do to get back in the game,” said Kline.