What is Elastic Universal Profiling?

Elastic Universal Profiling (EUP) is a continuous, low-overhead, production-grade profiling solution provided by Elastic. It enables developers and system administrators to monitor an application’s CPU, memory, and other system resource usage in real time. EUP collects profiling data without requiring code instrumentation and with negligible performance impact, making it well-suited for production environments.

EUP operates within the Elastic Observability Stack, which also includes Elasticsearch, Kibana, and other monitoring tools. By integrating with the broader Elastic ecosystem, EUP provides advanced visualization, search, and alerting capabilities to assist teams in monitoring and enhancing their systems’ performance.

The Elastic Universal Profiling agent is now open source under the Apache 2.0 license.

Key Features of Elastic Universal Profiling

  • Low Overhead: EUP is designed to minimize the performance impact on running applications by using lightweight sampling techniques.
  • Language and Platform Agnostic: EUP supports multiple programming languages, platforms, and architectures, including Java, Python, C++, and Go.
  • Continuous Profiling: It runs continuously in production environments, providing real-time insights into application performance.
  • Centralized Analysis: Profiling data is stored in Elasticsearch and visualized using Kibana.
  • Wide Scope Monitoring: Captures profiling data from both user-space applications and kernel-space system calls.

Benefits of Elastic Universal Profiling

  1. Improved Performance Visibility: Continuous real-time monitoring of application performance.
  2. Reduced Debugging Time: EUP simplifies root-cause analysis by providing detailed insights without requiring code changes.
  3. Cost Efficiency: Helps optimize resource usage by identifying inefficient code paths or system configurations.
  4. Cross-Team Collaboration: Facilitates collaboration between developers, SREs, and DevOps by integrating with the Elastic Stack.
  5. Production-Grade Profiling: Minimal overhead allows profiling in live production environments without disruption.

How Elastic Universal Profiling Works

Elastic Universal Profiling operates by periodically sampling running applications, collecting stack traces, and gathering system-level information such as CPU and memory usage. The data is stored in Elasticsearch and can be analyzed using Kibana. Users can visualize, search, and query performance metrics to identify bottlenecks or inefficiencies.
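
Because the profiling data ends up in ordinary Elasticsearch indices, you can also query it programmatically. The sketch below is only a rough illustration using the official Elasticsearch Python client: the index pattern profiling-events-all, the field names, and the placeholder credentials are assumptions about the data layout rather than a documented contract, so check the indices in your own deployment before relying on them.

```python
# Sketch: pull recent Universal Profiling samples out of Elasticsearch with the
# official Python client. Index pattern and field names are assumptions about
# the data layout -- verify them against your own deployment.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id="YOUR_CLOUD_ID",            # placeholder: your Elastic Cloud deployment
    basic_auth=("elastic", "PASSWORD"),  # placeholder credentials
)

resp = es.search(
    index="profiling-events-all",        # assumed index pattern for profiling samples
    size=0,
    query={"range": {"@timestamp": {"gte": "now-1h"}}},
    aggs={"by_host": {"terms": {"field": "host.id", "size": 10}}},  # field name assumed
)

# Print the hosts that produced the most CPU samples in the last hour.
for bucket in resp["aggregations"]["by_host"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```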

Elastic Universal Profiling offers three main visualizations: stack traces, TopN functions, and flame graphs.

Stack Traces: The Foundation of Profiling

At the heart of effective profiling is the stack trace — a snapshot that captures the sequence of function calls made by a program at a specific point in time. This snapshot acts as a historical record of the call stack, showing the path the application has followed to reach its current state. It enables developers to retrace the steps leading up to a particular event or issue within the application.

Stack traces are the key data structure used by profilers to understand what an application is executing at any given moment. They offer a detailed view of the program’s execution flow, which becomes invaluable when system-level monitoring tools flag issues such as high CPU usage. For example, while tools such as top -H can show which processes and threads are consuming significant CPU resources, they lack the fine-grained detail necessary to pinpoint the exact lines of code causing the bottleneck. Stack traces fill this gap, allowing developers to dive deeper into the application’s behavior and diagnose performance issues at the code level.
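
To make the idea concrete, here is a tiny, purely illustrative Python sketch of what a sampling profiler does: periodically snapshot the call stack of every running thread and count identical paths. Universal Profiling does the equivalent in the kernel with eBPF, across all languages and without touching your code; the sketch only shows the kind of data a single sample contains.

```python
# Minimal illustration of sampling-based profiling: periodically snapshot the
# call stacks of all Python threads and count identical call paths.
import sys
import threading
import time
import traceback
from collections import Counter

samples = Counter()

def sample_stacks(interval=0.02, duration=2.0):
    """Every `interval` seconds, record the current call stack of each thread."""
    end = time.time() + duration
    while time.time() < end:
        for _, frame in sys._current_frames().items():
            # Render the stack as "outermost;...;innermost" so identical
            # call paths can be counted together.
            stack = ";".join(f.name for f in traceback.extract_stack(frame))
            samples[stack] += 1
        time.sleep(interval)

def busy_work():
    total = 0
    for i in range(20_000_000):  # deliberately CPU-bound
        total += i * i
    return total

worker = threading.Thread(target=busy_work)
worker.start()
sample_stacks()
worker.join()

# The most common call paths are where the CPU time went.
for stack, count in samples.most_common(3):
    print(f"{count:4d}  {stack}")
```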

Stacktrace view

The stack trace view displays graphs grouped by threads, hosts, Kubernetes deployments, and containers. It’s useful for identifying unexpected CPU spikes across threads and narrowing down the time range for deeper investigation with a flame graph.

Flame Graphs: Visualizing Stack Traces for Efficient Profiling

Analyzing stack traces becomes more complex when they are aggregated continuously across multiple machines. As stack depths increase and branching paths multiply, pinpointing resource-heavy code can be challenging. Flame graphs visually represent stack traces, with each function shown as a rectangle: its width indicates the share of samples (time) spent in that function, and its vertical position reflects the call depth. This makes it easy to spot performance bottlenecks.

Elastic Universal Profiling uses icicle graphs, an inverted form of flame graphs. In these, the root function is at the top, with child functions beneath, simplifying the view of function hierarchies and resource usage.
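
Flame graph tooling commonly works from "folded" stacks: one line per unique call path with a sample count, from which the rectangle widths are derived. Here is a small hypothetical sketch of that aggregation step; the function names are invented, and this is not EUP’s internal storage format.

```python
# Hypothetical sketch of folding sampled stacks for a flame graph: identical
# call paths are merged and counted, and each count becomes the width of the
# corresponding rectangle.
from collections import Counter

# Each sample is a call path from the root (left) to the function on-CPU (right).
raw_samples = [
    ("main", "handle_request", "render_template"),
    ("main", "handle_request", "render_template"),
    ("main", "handle_request", "query_db"),
    ("main", "gc_cycle"),
]

folded = Counter(";".join(stack) for stack in raw_samples)

total = sum(folded.values())
for path, count in folded.most_common():
    # The width of the deepest frame is proportional to its sample count.
    print(f"{path} {count}  ({count / total:.0%} of samples)")
```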

Flame Graphs view

The flame graph page is where you’ll likely spend most of your time when debugging and optimizing. It helps to identify performance bottlenecks and optimization opportunities. Focus on three key elements: width, hierarchy, and height.

Functions:

Teams use Elastic Universal Profiling to troubleshoot and optimize performance. It generates stack traces from the kernel up to high-level runtimes, helping identify performance regressions, reduce inefficiencies, and speed up debugging. By analyzing TopN functions and visualizing resource usage with flame graphs, teams can quickly resolve bottlenecks.

TopN functions view

Universal Profiling’s TopN functions view highlights the most frequently sampled functions, ranked by CPU time, annualized CO2, and cost estimates. This helps identify the most resource-intensive functions across your fleet. You can filter by specific components for deeper analysis, and clicking a function name takes you to the flame graph for a detailed look at the call hierarchy.

After addressing a performance issue, assess the impact of each deployed change: the TopN functions and flame graph views let you spot regressions and measure improvements in performance, cloud cost, and carbon emissions.
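
To make that workflow concrete, the hypothetical sketch below reduces folded stack samples to a TopN-functions table (counting the samples in which each function was the innermost, on-CPU frame) and diffs two captures taken before and after a deploy. The function names and counts are invented.

```python
# Hypothetical sketch of a TopN-functions comparison: count the samples where
# each function was the innermost (on-CPU) frame, then diff two captures to
# see whether a deploy made things better or worse.
from collections import Counter

def top_n(folded_samples, n=5):
    """folded_samples maps 'root;...;leaf' call paths to sample counts."""
    leaf_counts = Counter()
    for path, count in folded_samples.items():
        leaf_counts[path.split(";")[-1]] += count
    return dict(leaf_counts.most_common(n))

before = {"main;handle_request;render_template": 70,
          "main;handle_request;query_db": 25,
          "main;gc_cycle": 5}
after = {"main;handle_request;render_template": 30,
         "main;handle_request;query_db": 26,
         "main;gc_cycle": 5}

before_top, after_top = top_n(before), top_n(after)
for func in sorted(set(before_top) | set(after_top)):
    b, a = before_top.get(func, 0), after_top.get(func, 0)
    print(f"{func:18s} before={b:3d}  after={a:3d}  delta={a - b:+d}")
```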

Installation and Configuration of Elastic Universal Profiling (Referring to the Official Docs)

EUP supports a variety of platforms and can be installed on Linux systems such as Ubuntu, Red Hat, and CentOS, as well as in Docker containers. Below are the installation and configuration instructions for the different platforms:

Prerequisites

Before installing EUP, ensure the following prerequisites are met:

  • An Elastic Stack deployment on Elastic Cloud at version 8.7.0 or higher. Universal Profiling is currently only available on Elastic Cloud.
  • The workloads you’re profiling must run on Linux machines with x86_64 or ARM64 CPUs.
  • The minimum supported kernel version is 4.19 for x86_64 or 5.5 for ARM64 machines.
  • The Integrations Server must be enabled on your Elastic Cloud deployment.
  • Credentials (username and password) for the superuser Elasticsearch role (typically, the elastic user).

Install the Universal Profiling Agent

You have two options when installing the Universal Profiling Agent:

  1. Install the Universal Profiling Agent using the Elastic Agent
  2. Install the Universal Profiling Agent in standalone mode

Install the Universal Profiling Agent using the Elastic Agent

To install the Universal Profiling Agent using the Elastic Agent and the Universal Profiling Agent integration, complete the following steps:

  1. Copy the secret token and Universal Profiling collector URL from the Add profiling data page.
  2. Click Manage Universal Profiling Agent in Fleet to complete the integration.
  3. On the Integrations page, click Add Universal Profiling Agent.
  4. In Universal Profiling Agent → Settings, add the information you copied from the Add profiling data page:
  5. Add the Universal Profiling collector URL to the Universal Profiling collector endpoint field.
  6. Add the secret token to the Authorization field.
  7. Click Save and continue.

Install the Universal Profiling Agent in standalone mode

The Universal Profiling Agent profiles your fleet. You need to install and configure it on every machine that you want to profile. The Universal Profiling Agent needs root / CAP_SYS_ADMIN privileges to run.

After clicking Set up Universal Profiling in Kibana, you’ll see the instructions for installing the Universal Profiling Agent. You can also find these instructions by clicking the Add Data button in the top-right corner of the page.

For Kubernetes, for example, the page provides copy-and-paste instructions (a Helm-based install) that already include your collector endpoint and secret token.

Visualize Data in Kibana under the Universal Profiling Section of Observability:

Elastic Universal Profiling has an awesome user interface that instantly displays the impact of any given function, including its CPU execution time, associated costs in dollars, and carbon emissions.

Profiling detail of Elastic Agent Process.
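
Those dollar and CO2 figures are derived from the measured CPU time. The sketch below shows only the general shape of that arithmetic: the 20 Hz sampling rate is an assumption about the agent’s default, and the cost, power, and carbon factors are made-up placeholders rather than Elastic’s actual pricing or emissions model.

```python
# Rough, hypothetical arithmetic for turning CPU samples into annualized cost
# and CO2 estimates. All constants below are illustrative placeholders.
SAMPLE_RATE_HZ = 20            # assumed samples per second per core
COST_PER_CORE_HOUR = 0.04      # USD, placeholder cloud price
WATTS_PER_CORE = 7.0           # placeholder average power draw per core
GRID_KG_CO2_PER_KWH = 0.4      # placeholder grid carbon intensity

def annualize(samples_per_hour: float) -> tuple[float, float]:
    """Estimate yearly cost (USD) and CO2 (kg) for a function's CPU usage."""
    core_hours_per_hour = samples_per_hour / (SAMPLE_RATE_HZ * 3600)
    core_hours_per_year = core_hours_per_hour * 24 * 365
    cost = core_hours_per_year * COST_PER_CORE_HOUR
    co2 = core_hours_per_year * WATTS_PER_CORE / 1000 * GRID_KG_CO2_PER_KWH
    return cost, co2

# Example: a function observed in 36,000 samples over the last hour,
# i.e. roughly half a core kept busy at 20 samples/second/core.
cost, co2 = annualize(36_000)
print(f"~${cost:,.0f}/year, ~{co2:,.0f} kg CO2/year")
```

With these placeholder factors, half a busy core works out to roughly $175 and 12 kg of CO2 per year.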

Let’s take the example of a Python-based web application running on an Ubuntu server to showcase how EUP works in real time.

Scenario:

Your Python web application experiences performance bottlenecks. You install and configure EUP to identify inefficient code paths and system resource consumption.

Step 1: Install Elastic Agent

Follow the installation steps for Ubuntu to install the Elastic Agent with Universal Profiling enabled.

Step 2: Collect Profiling Data

EUP will start collecting data from your Python application, including CPU and memory usage.

Step 3: Visualize the Data

Use Kibana to create a dashboard visualizing CPU, memory, and stack traces. This helps pinpoint the specific sections of your code that are causing performance issues.

Step 4: Resolve Bottlenecks

Based on the insights from EUP, you discover an inefficient library in your Python application. You update the library or refactor the code to improve performance.
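
As a hypothetical example of the kind of fix this leads to (not code from a real application), the flame graph might show a configuration file being re-read and re-parsed inside the request handler; hoisting that work out of the hot path removes the bottleneck:

```python
# Hypothetical hotspot a flame graph can reveal: a config file re-read and
# re-parsed on every request. Assumes an illustrative config.json exists.
import json
from functools import lru_cache

# Before: json.load() dominates the flame graph because it runs per request.
def get_feature_flags_slow(path: str = "config.json") -> dict:
    with open(path) as fh:            # disk read + JSON parse on every call
        return json.load(fh)["feature_flags"]

# After: parse once and reuse; the hot path becomes a dictionary lookup.
@lru_cache(maxsize=1)
def _load_config(path: str = "config.json") -> dict:
    with open(path) as fh:
        return json.load(fh)

def get_feature_flags_fast(path: str = "config.json") -> dict:
    return _load_config(path)["feature_flags"]
```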

Step 5: Monitor the Results

Continue using EUP to monitor the application’s performance and ensure that the bottleneck has been successfully resolved.

Troubleshooting Universal Profiling Collection issues

To troubleshoot Universal Profiling issues, refer to the official troubleshooting documentation, which covers many common errors in detail along with their solutions.

Conclusion

Elastic Universal Profiling is an invaluable tool for monitoring, optimizing, and maintaining the performance of distributed applications. Its ability to continuously profile with minimal overhead makes it ideal for production environments. By providing detailed insights into system resource usage, EUP empowers teams to detect bottlenecks, reduce costs, and improve system reliability across multiple platforms.

With installation options for Ubuntu, Red Hat/CentOS, Docker, and Kubernetes, Elastic Universal Profiling fits into a wide range of environments. Integrating it into your existing Elastic Stack infrastructure provides real-time monitoring, powerful visualization, and actionable insights to help optimize your application performance at scale.

Related Articles:

OpenTelemetry with Elastic Observability

Elastic RUM (Real User Monitoring) with Open Telemetry (OTel).

OpenTelemetry: Automatic vs. Manual Instrumentation — Which One Should You Use?

Configuration of the Elastic Distribution of OpenTelemetry Collector (EDOT)

Instrumenting a Java application with OpenTelemetry for distributed tracing and integrating with Elastic Observability

Test and Analyze OpenTelemetry Collector processing

#otel #docker #kubernetes #devops #elasticsearch #observability #search #apm #grafana #datadog #ebpf #UniversalProfiling
