Last Tuesday, I participated in an online panel on Ops Monitoring and Continuous Delivery as part of Continuous Discussions (#c9d9), a series of community panels about Agile, Continuous Delivery and DevOps.
Continuous Discussions is a community initiative by Electric Cloud, which powers Continuous Delivery at businesses like SpaceX, Cisco, GE and E*TRADE by automating their build, test and deployment processes.
The questions covered in this segment included: What are the benefits – and challenges – of “shifting left” to introduce monitoring earlier in the software release process? What should organizations measure, and how should they use the data they collect?
Below are a few insights from my contribution to the panel:
What should you measure? Infrastructure? App usage?
“There are many things that can be measured, but for application governance, monitoring business transactions across the entire app delivery chain is the only way to track end-to-end app performance and UX. I believe that shifting business transaction monitoring and APM left in the development lifecycle serves as an early warning system for both DevOps and business teams. It helps them understand the dependencies and catch operational or quality issues before the app goes live. You create transaction metrics and then use them to establish KPIs against which the production environment can be measured. It also creates a tighter feedback loop, so that everyone understands who the app is serving and what it’s meant to do, and changes can be made on a more continuous basis.
“The bottom line is the successful completion of the business transaction, because if that doesn’t happen, the business gets hurt and IT productivity suffers. Business transactions are the common language that IT, ops and business can all understand, and that’s what makes transaction completion the most critical metric for business success.”
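To make the idea of transaction-based KPIs more concrete, here is a minimal Python sketch. It is an illustration, not a feature of any particular APM product. It assumes a hypothetical `Transaction` record with a name, a completion flag and a latency. From pre-production runs it derives a completion rate and a nearest-rank 95th-percentile latency, then flags production KPIs that drift past illustrative thresholds (`max_rate_drop`, `max_latency_growth`). You would tune those thresholds for your own apps.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """One observed business transaction (hypothetical schema)."""
    name: str
    completed: bool
    latency_ms: float


def p95(values):
    """Nearest-rank 95th percentile; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def kpis(transactions):
    """Completion rate, plus p95 latency of the transactions that completed."""
    total = len(transactions)
    completed = [t.latency_ms for t in transactions if t.completed]
    return {
        "completion_rate": len(completed) / total if total else 0.0,
        "p95_latency_ms": p95(completed),
    }


def kpi_breaches(prod, baseline, max_rate_drop=0.01, max_latency_growth=1.2):
    """List the KPIs where production is worse than the pre-production baseline.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    breaches = []
    if prod["completion_rate"] < baseline["completion_rate"] - max_rate_drop:
        breaches.append("completion_rate")
    if prod["p95_latency_ms"] > baseline["p95_latency_ms"] * max_latency_growth:
        breaches.append("p95_latency_ms")
    return breaches


# Usage: a clean pre-production baseline vs. a production run with one failed checkout.
baseline = kpis([Transaction("checkout", True, 100.0 + i) for i in range(20)])
production = kpis(
    [Transaction("checkout", True, 100.0 + i) for i in range(19)]
    + [Transaction("checkout", False, 0.0)]
)
print(kpi_breaches(production, baseline))  # ['completion_rate']
```

Here, completion rate serves as the primary KPI and latency as the secondary one. That reflects the point that a transaction which never completes matters more than one that completes slowly.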
Where should you measure, “here” or “there”?
“Here, there and everywhere in between. I lean more towards the business logic, because at the end of the day the transactions are what make operations run. In a ‘software-defined business’, the transactions are executed in the applications, and the apps start in dev and run through to production. The bottom-line metric is whether the transaction completes; that’s how you gauge performance and UX, so you have to monitor the entire app delivery chain from pre- to post-production.”
Is it better to have a holistic system that doesn’t get to the level of detail you want but gives you a consolidated view of your metrics?
“The answer is both. You need a higher-level view to give you operational intelligence across the entire chain. It also simplifies life, cuts down on consoles and false positives, and improves productivity. But you also need to know, bottom-up, what the metrics are. The metrics can vary by app and use case, even by user, and you need to allocate resources accordingly.
“As a rule of thumb, 20% of the apps drive 80% of the transactions. So for that 20% of the apps you want this higher-level strategic view, but you also need to identify the key metrics. You have to be pragmatic, because resources are limited. You need to take a matrix approach, apps vs. use cases, under a holistic umbrella, so that you’re focusing on the 20% of apps that provide 80% of the transactions. And you’d better know what the metrics are for those apps, because they catch C-level attention when something goes wrong.
“You can shrink the number of tools and focus on the apps and use cases that are really driving the business. You’ll also be more popular with the business teams, you’ll get more application governance, and you’ll be more service-oriented towards the business. At the end of the day, you’re measured by uptime. If the transaction completes, you’ve done your job and you’re an unsung hero. But if the transaction doesn’t complete because there’s a snafu somewhere along the chain, you’ll hear about it disproportionately. So you’ve got to know which metrics drive the business, and use the matrix approach to fit them within a strategic platform.”
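Here is one way to sketch the matrix approach in code. It assumes you can export transaction counts keyed by (app, use case) from whatever monitoring you already have. The sketch rolls the matrix up per app, then picks the smallest set of highest-volume apps that covers a chosen share of all transactions. The 80% default mirrors the 80/20 split described above. The functions `app_totals` and `focus_apps` are hypothetical helpers, and the app names and counts are invented for illustration.

```python
from collections import Counter


def app_totals(matrix):
    """Roll an {(app, use_case): transaction_count} matrix up to per-app totals."""
    totals = Counter()
    for (app, _use_case), count in matrix.items():
        totals[app] += count
    return totals


def focus_apps(totals, share=0.8):
    """Smallest set of highest-volume apps whose transactions reach `share` of the total."""
    grand_total = sum(totals.values())
    selected, covered = [], 0
    for app, count in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * grand_total:
            break  # enough of the transaction volume is already covered
        selected.append(app)
        covered += count
    return selected


# Hypothetical daily transaction counts per (app, use case).
matrix = {
    ("checkout", "web"): 300,
    ("checkout", "mobile"): 200,
    ("search", "web"): 300,
    ("login", "web"): 100,
    ("reports", "web"): 60,
    ("admin", "web"): 40,
}
print(focus_apps(app_totals(matrix)))  # ['checkout', 'search']
```

Keeping the per-use-case counts in the matrix means you can later drill into which use cases dominate within each selected app. The detailed metrics for those apps can then be defined there.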
How should you use the information you collect?
“You use the information by feeding it into a correlation and analytics engine. You collect all this data, and that’s where big data comes in, with these integrated platforms. If the platform doesn’t have a correlation and analytics engine, you’re collecting a lot of data but you can’t do much with it. It’s very important for the platform to have packet-level, deep-dive correlation and analytics, so you can start to flesh out which key metrics are driving performance and UX for that key 20% of apps, and how to isolate and correlate them.
“Some platforms are even starting to introduce predictive components, so you can automate the back end of it, the infrastructure, and you also get an interesting look into what is happening on the pre-production side. You know your infrastructure, you know how it’s performing, and the engine tells you how these variables correlate. It’s not another potpourri of tools; this is something that has to be built into the platform.”
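A real correlation and analytics engine does far more than this. Still, the core question of which variables move with transaction performance can be sketched in a few lines of Python. This toy version is an illustration, not a description of how any specific platform works. It ranks candidate metrics by the absolute Pearson correlation between each metric’s time series and transaction latency. The metric names and values are made up, and correlation alone does not establish cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_metrics(metrics, target):
    """Sort {name: series} by how strongly each series correlates with `target`."""
    scores = {name: pearson(series, target) for name, series in metrics.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical samples taken at the same five intervals.
checkout_latency_ms = [100, 120, 140, 160, 180]
candidates = {
    "db_wait_ms": [10, 20, 30, 40, 50],
    "cpu_pct": [50, 52, 49, 51, 50],
    "cache_hit_pct": [90, 85, 80, 75, 70],
}
for name, r in rank_metrics(candidates, checkout_latency_ms):
    print(f"{name}: r = {r:+.2f}")
```

A strongly negative score, such as a falling cache hit rate alongside rising latency, is as informative as a strongly positive one. That is why the ranking uses the absolute value.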