
AZURE DATABRICKS

Monitoring Databricks Jobs with Application Insights

Joseph Fultz and Ryan Murphy
We work on a team that focuses on data and analytics for large companies that want to implement or migrate their solutions to the cloud. These efforts come with the obvious work of optimizing and reengineering applications to various degrees to ensure they take advantage of what the cloud offers. As great as those efforts can be for the application itself, there are additional challenges for organizations just starting their cloud journey, as they must also do all of the work that goes along with extending their operational capabilities to the cloud. And, as new technologies emerge and evolve, these must be folded into the existing operational infrastructure, as well. This is one of the challenges that exists with Spark- and Apache Hadoop-based solutions. Yes, Apache Ambari is there to provide a nice dashboard and has an API to expose metrics, but many organizations already have an investment in and a good understanding of other monitoring and dashboarding solutions, such as Azure Application Insights.
Imagine a WebJob that pulls messages from Azure Event Hubs, does some initial validation, and then drops them into Azure Storage, at which point that data is processed through several Spark jobs, as shown in Figure 1. The goal is to have a single runtime dashboard that can provide information that shows not only what’s happening, but also process- and business-specific information while it’s in flight. Additionally, it would be great to be able to track the flow of that information as a holistic process and see details on its constituent processes.
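To make that concrete, here’s a minimal sketch of what instrumenting a single step might look like from Python code running in a WebJob or a Databricks notebook, using the applicationinsights SDK. The instrumentation key, event name and property names here are illustrative placeholders, not values from the actual solution:

# Minimal sketch: emitting a custom event from one pipeline step.
# Requires the applicationinsights package (pip install applicationinsights).
from applicationinsights import TelemetryClient

# Placeholder instrumentation key; in practice it comes from the
# Application Insights resource behind the shared dashboard.
tc = TelemetryClient('00000000-0000-0000-0000-000000000000')

# Process- and business-specific context travels as custom properties
# (strings) and measurements (numbers), queryable alongside the
# default metrics Application Insights already collects.
tc.track_event(
  'ValidationStepCompleted',  # hypothetical step name
  {'pipeline': 'event-ingest', 'source': 'eventhub'},
  {'recordsProcessed': 1024, 'recordsRejected': 3})

tc.flush()  # telemetry is buffered; flush before the step exits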
Sure, you can see default metrics for WebJobs in Application Insights and some information from the Spark jobs in Ambari, and roll them all up with Azure Log Analytics for post hoc insights. However, we don’t want to see two separate processes with four steps each. We want to see the process as a whole and we want runtime insights and runtime alerts.
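One way to get that holistic view is to stamp every telemetry item from every step, WebJob and Spark job alike, with a shared identifier for the pipeline run. The sketch below uses the same Python SDK; the pipelineRunId and stepName property names are our own convention, not anything the SDK prescribes:

import uuid
from applicationinsights import TelemetryClient

# One ID per end-to-end run, generated by the first step (the WebJob)
# and handed to each downstream Spark job, e.g., as a job parameter.
run_id = str(uuid.uuid4())

tc = TelemetryClient('00000000-0000-0000-0000-000000000000')  # placeholder

# Properties set on the client context ride along on every item this
# client sends, so each step only has to set them once.
tc.context.properties['pipelineRunId'] = run_id
tc.context.properties['stepName'] = 'spark-transform-1'  # hypothetical

tc.track_event('StepStarted')
# ... the actual work of the step goes here ...
tc.track_event('StepCompleted', measurements={'durationSec': 42.0})
tc.flush()

With that in place, a single query over custom events grouped by pipelineRunId can reconstruct the whole flow across both processes, and alerts can fire on missing or slow steps.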
In this article, we’ll walk through considerations and planning for bringing the full project together using Application Insights. Additionally, we’ll be using the Azure Databricks flavor of Spark as it has a nice set of features that help us more easily develop and operationalize our workflow.
Planning for Application Insights Telemetry
We won’t be covering core concepts, but for a primer on these concepts take a look through the online documentation at bit.ly/2FYOCyp. Also, Victor Mushkatin and Sergey Kanzhelev wrote a good article about optimizing telemetry data collection, “Optimize Telemetry with Application Insights.”
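Part of that planning is deciding how telemetry gets buffered and shipped. As a hedged example of the knobs involved, the Python SDK lets you compose your own channel; the interval, buffer and queue-length values below are assumptions for illustration, not recommendations:

from applicationinsights import TelemetryClient
from applicationinsights.channel import (
  AsynchronousQueue, AsynchronousSender, TelemetryChannel)

# An asynchronous sender pushes batches from a background thread
# instead of blocking the job on every call to Application Insights.
sender = AsynchronousSender()
sender.send_interval = 5.0     # seconds between sends (assumed value)
sender.send_buffer_size = 100  # max items per request (assumed value)

queue = AsynchronousQueue(sender)
queue.max_queue_length = 500   # trigger a send when the queue hits this

channel = TelemetryChannel(queue=queue)
tc = TelemetryClient('00000000-0000-0000-0000-000000000000', channel)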
This article discusses:
• Planning for Application Insights telemetry
• Adding Application Insights to Azure Databricks clusters
• Instrumenting Databricks job code
• Configuring analytics and alerts
Technologies discussed:
Application Insights, Azure Databricks