February 1, 2021

Elastic APM – the key to effective DevOps

Alex Hutchinson

Software architecture has changed quite dramatically over recent years. A piece of modern software is now the sum of many parts, and those parts may be many small and sometimes sporadically distributed processes. The driving force behind this change has been more efficient development: faster and more granular development iterations, which mean faster fixes and faster improvements. However, it’s often forgotten that this speed depends on the speed and accuracy of application monitoring.

The sooner we know a problem…

Most DevOps efforts focus on automating elements of the application development lifecycle, with the aim of speeding up the development or testing process. The most important element of the develop – deploy – monitor – report cycle is arguably the monitoring tool. Its ability to quickly and accurately detect and report a performance issue is vital. Even if we have an entirely frictionless testing and deployment process and an awesome team of developers, the fix will still be slow if there are delays in the feedback loop.

Close the DevOps feedback loop

Surprisingly often, DevOps processes end with a release to production. The responsibility for reporting problems is then passed to customer-facing teams, who handle customer issues and manually create bug tickets. APM has an essential role in providing the same feedback from the production environment as from the development and test environments. This allows us to spot performance issues proactively, ideally before customers notice them. Arguably, APM is most valuable in production.

Elastic APM solves the microservice monitoring problem

A criticism of microservice architecture has long been that it is very difficult to monitor the application as a whole, as each service has its own logging mechanism. So, is it enough to simply aggregate logs into one UI for analysis? Well, that’s certainly an important step, but how can we sew together a myriad of events from many different processes into one coherent event chain? Elastic APM instrumentation helps fix this issue. Through ECS (Elastic Common Schema) and APM Distributed Tracing, event log formats are normalised and homogenised upon arrival at the Elastic cluster. APM also pieces together all events relating to a single HTTP request as it threads its way through your code, out to other services, and back again as an HTTP response. Elastic APM does this out of the box for supported stacks, with no further configuration needed. The root cause of slow response times can then be analysed quickly.
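To make the idea of a coherent event chain concrete, here is a minimal Python sketch of how a shared trace id lets events from different services be stitched back together. It is an illustration only, not how the Elastic APM agents are implemented. The header layout follows the W3C Trace Context `traceparent` format; the service names, the event fields (`service`, `timestamp_ms`, `traceparent`) and the helper functions are all assumptions made for this example.

```python
import secrets
from collections import defaultdict


def new_traceparent() -> str:
    """Start a new trace: version, 32-hex trace id, 16-hex span id, flags."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent: str) -> str:
    """Keep the trace id of the incoming request, mint a new span id
    for the outgoing call to the next service."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(traceparent: str) -> str:
    """Extract the trace id, which is shared by every hop of one request."""
    return traceparent.split("-")[1]


def group_by_trace(events):
    """Stitch events from many services into one chain per request,
    ordered by timestamp."""
    chains = defaultdict(list)
    for event in events:
        chains[trace_id_of(event["traceparent"])].append(event)
    for chain in chains.values():
        chain.sort(key=lambda e: e["timestamp_ms"])
    return dict(chains)


# Hypothetical events, as they might arrive out of order from three services.
request_a = new_traceparent()
request_b = new_traceparent()
events = [
    {"service": "payments", "timestamp_ms": 40, "traceparent": child_traceparent(request_a)},
    {"service": "frontend", "timestamp_ms": 0, "traceparent": request_a},
    {"service": "frontend", "timestamp_ms": 5, "traceparent": request_b},
    {"service": "orders", "timestamp_ms": 12, "traceparent": child_traceparent(request_a)},
]
chains = group_by_trace(events)
```

Each entry in `chains` is the ordered path of one request across services, which is the view that a distributed tracing UI builds for you automatically.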

The bits between the bits

What if the problem is not your software? The logs look fine, but everything just feels sluggish. An inherent aspect of APM Distributed Tracing is Latency Tracking. This aims to pin down troublesome, transient slowness by analysing the latencies between microservices. In doing so, you gain invaluable performance data that helps you analyse the root cause. It may be a pod network, a subnet, the host rack location, or any number of network appliances sitting in between, but it isn’t an application problem.

This level of accuracy in pinpointing the problem potentially saves you money in wasted development time trying to fix a non-existent application problem.
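The reasoning behind latency between services can be sketched in a few lines of Python. If the caller waited 120 ms for a response but the called service only spent 20 ms handling the request, the remaining 100 ms was spent somewhere in between. This is a simplified illustration under assumptions, not Elastic’s implementation: the `Hop` structure, its field names, the service names and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    caller: str
    callee: str
    client_span_ms: float         # how long the caller waited for the response
    server_transaction_ms: float  # how long the callee spent handling the request


def in_between_ms(hop: Hop) -> float:
    """Time that neither service accounts for: network, proxies, queues."""
    return hop.client_span_ms - hop.server_transaction_ms


def slowest_links(hops, threshold_ms):
    """Return hops whose unaccounted time exceeds the threshold, worst first."""
    flagged = [h for h in hops if in_between_ms(h) > threshold_ms]
    return sorted(flagged, key=in_between_ms, reverse=True)


# Hypothetical measurements for three service-to-service calls.
hops = [
    Hop("frontend", "orders", client_span_ms=30.0, server_transaction_ms=25.0),
    Hop("orders", "payments", client_span_ms=120.0, server_transaction_ms=20.0),
    Hop("orders", "inventory", client_span_ms=80.0, server_transaction_ms=30.0),
]
suspects = slowest_links(hops, threshold_ms=40.0)
```

Here both flagged services look healthy when viewed on their own; it is only the comparison of the two sides of each call that points at the network path between them.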

Prediction is the new Detection with APM Machine Learning

Machine learning and APM are perhaps a match made in heaven. Perhaps the simplest form of ML, unsupervised learning, relies on two things in order to be effective. Firstly, the algorithms require vast amounts of data in order to build a meaningful model. Secondly, it demands that we are open-minded about what the model finds significant.

In most cases APM generates a lot of data, and the subtle patterns that take place prior to a performance bottleneck may be very hard to define. We may suspect that something is wrong, but we often have no idea where or what the problem is. This makes building a supervised learning model difficult.

By utilising Elastic’s built-in machine learning capabilities, we can use the ML models to potentially spot an issue even before it happens. Perhaps peak loads are not predictable or linked to specific events. ML could potentially spot the pattern of events or performance leading up to an application peak load. With some ninja scripting, more infrastructure resources could be assigned based on the model’s alerts.
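As a rough feel for what learning without labels means here, the Python sketch below flags response times that deviate sharply from their own recent history, without anyone defining in advance what a problem looks like. It uses a simple trailing z-score, which is a deliberately basic stand-in chosen for this example and not the model Elastic’s machine learning uses. The window size, threshold and sample values are assumptions.

```python
from statistics import mean, stdev


def anomalies(values, window=5, threshold=3.0):
    """Return the indices of values that lie more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Completely flat history: the z-score is undefined, so skip.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical response times in milliseconds, one sample per interval.
response_times_ms = [100, 102, 98, 101, 99, 100, 250, 101]
unusual = anomalies(response_times_ms)
```

A flagged index could then be used to trigger an alert or a scaling script, in the spirit of the automation described above.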

So there’s a problem, what then?

If there is an internal application performance issue that is not helped simply by providing more infrastructure punch, then it’s time to inform the development team immediately. With Kibana alerts we can easily send out intelligent alerts to any endpoint offering a REST API, or simply via email.

By analysing APM data we will know whether we are dealing with an infrastructure problem or a software problem, whether it is a backend issue or a database issue, and even which backend microservices are to blame. Getting the alert to the right people means a quicker fix.
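Getting the alert to the right people can be as simple as a small routing rule on the receiving end of the alert. The Python sketch below shows one possible shape for such a rule. Everything in it is hypothetical: the alert fields (`kind`, `layer`, `service`), the team names and the ownership table are invented for illustration and are not part of Kibana’s alert format.

```python
import json

# Hypothetical mapping of microservices to the teams that own them.
SERVICE_OWNERS = {
    "orders": "orders-dev-team",
    "payments": "payments-dev-team",
}


def route_alert(alert: dict) -> str:
    """Pick a recipient team based on what the APM data says about the problem."""
    if alert.get("kind") == "infrastructure":
        return "platform-team"
    if alert.get("layer") == "database":
        return "database-team"
    # Software problem in a backend service: send it to that service's owners,
    # falling back to a shared on-call rota for unknown services.
    return SERVICE_OWNERS.get(alert.get("service"), "backend-on-call")


def notification_body(alert: dict) -> str:
    """Build a JSON body that could be posted to the chosen team's endpoint."""
    return json.dumps({"team": route_alert(alert), "alert": alert}, sort_keys=True)


# A hypothetical alert about slow responses in one backend microservice.
example = {"kind": "software", "layer": "backend", "service": "payments"}
body = notification_body(example)
```

Keeping the routing rule separate from the alert itself means the ownership table can change without touching the alert definitions.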

Learn more: sofecta.com/application-performance-management

Written by: Alex Hutchinson, Certified Elastic Engineer and Software Architect
