
Exploring Ambari Alerts in Hortonworks

By Sorabh Jain - September 1, 2020

Discussing how to extract useful reports/statistics from Ambari Alerts in Hortonworks using SQL queries

Ambari Introduction

Monitoring and responding to problems are the two main activities a client expects from a service provider that manages its data landscape and platform. We at Clairvoyant have worked with clients across several industries and can help you identify patterns in Hortonworks service outages and alerts and uncover the root causes of those problems.

Let’s start by understanding what Ambari alerts are and how to use them for efficient maintenance, management, and administration. For this purpose, we have developed a methodology for using and storing Ambari alerts on a Hortonworks cluster. Together, these help you spot regular patterns and relate recurring errors to one another so you can reach the right root causes.

Apache Ambari, part of the Hortonworks Data Platform (HDP), lets enterprises plan, install, provision, and securely configure HDP clusters. This makes ongoing cluster maintenance, management, and administration easier, regardless of cluster size. For more information on Apache Ambari, refer to the Hortonworks documentation for Ambari.

Ambari Alerts

Ambari Alerts is the central framework that stores health checks and alerts for the services on your Hortonworks cluster. As a Hadoop Administrator, you control which alerts are enabled, their thresholds, and their reporting output. Ambari automatically configures a set of alerts based on the services installed. For maximum flexibility, alert groups and multiple notification targets give you granular control over Ambari alerts. This puts both flexibility and power in the hands of the Hadoop Administrator, who can:

  • Create and manage multiple notification targets and control who gets notified for which Ambari Alerts

  • Filter notifications by alert severity and send certain notifications to specific targets based on that severity

  • Control notification target methods, including EMAIL and SNMP, so the person being notified is alerted via their preferred method

For more detail on alerts, see the Ambari Alerts documentation.

Insights from Ambari Alerts History

Let's look at how to extract reports from the Ambari alert history for the following examples:

  1. How many times a Hadoop service went down in the last n days.

  2. How long a Hadoop service was down in the last n days.

You can get this information from the alert_history table in the backing Ambari database.¹ With the default setup, you can connect to it from the Ambari server host as follows:

  • psql ambari ambari (connects to the ambari database as the ambari user; the default password is bigdata)

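The alert_history table stores alert_timestamp as a BIGINT holding milliseconds since the Unix epoch. To make it readable, convert it with the PostgreSQL TO_TIMESTAMP and TO_CHAR functions.¹ The following query lists when Ambari raised CRITICAL NameNode alerts, and on which NameNode host:

 -- alert_timestamp holds epoch milliseconds; TO_TIMESTAMP expects seconds
 SELECT TO_CHAR(TO_TIMESTAMP(alert_timestamp / 1000), 'MM/DD/YYYY HH24:MI:SS') AS alert_time, host_name
 FROM alert_history
 WHERE alert_label LIKE '%NameNode%' AND alert_state = 'CRITICAL'
 -- sort on the raw number; sorting the MM/DD/YYYY string would misorder dates across years
 ORDER BY alert_timestamp ASC;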

You can adapt this query to retrieve the alert history for any specific alert type.
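To sanity-check what the conversion in this query produces, here is a minimal Python sketch of TO_TIMESTAMP(alert_timestamp / 1000) followed by TO_CHAR(..., 'MM/DD/YYYY HH24:MI:SS') for a single value. It assumes UTC. PostgreSQL's TO_CHAR formats a timestamptz in the session time zone, so times on your server may differ by its offset. The function name format_alert_timestamp is hypothetical.

```python
from datetime import datetime, timezone


def format_alert_timestamp(alert_timestamp_ms: int) -> str:
    """Format an epoch-millisecond alert_timestamp as MM/DD/YYYY HH24:MI:SS (assumes UTC)."""
    # Integer division mirrors "alert_timestamp / 1000" on a BIGINT column,
    # which drops the milliseconds before TO_TIMESTAMP sees the value.
    seconds = alert_timestamp_ms // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%m/%d/%Y %H:%M:%S")


if __name__ == "__main__":
    print(format_alert_timestamp(1598918400123))  # 09/01/2020 00:00:00
```

Dividing first and formatting second matches the order of operations in the SQL, so the millisecond part never reaches the formatted string.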

Pre-requisites

  1. Ambari Alerts should be configured for the required services

  2. Collect the definition_id of each required alert from the Ambari PostgreSQL database by querying the alert_definition table:

    a) Run psql ambari ambari from the Ambari server host (the default password is bigdata)
    b) ambari=> select definition_id, definition_name, component_name from alert_definition;
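Step 3 of each procedure below edits an alert_definition_id filter in the SQL file using the IDs collected here. The hypothetical helper below is a hedged sketch of turning rows returned by the alert_definition query into such a clause for selected components. The definition IDs in the example are made up, since real definition_id values differ per cluster.

```python
from typing import Iterable, Optional, Set, Tuple

# One row of: select definition_id, definition_name, component_name from alert_definition;
# component_name is Optional defensively, in case a row has no component.
DefinitionRow = Tuple[int, str, Optional[str]]


def build_definition_filter(rows: Iterable[DefinitionRow], components: Set[str]) -> str:
    """Build an 'AND alert_definition_id IN (...)' clause for the given components (hypothetical helper)."""
    ids = sorted(def_id for def_id, _name, component in rows if component in components)
    if not ids:
        raise ValueError("no alert definitions matched the requested components")
    return "AND alert_definition_id IN (" + ", ".join(str(i) for i in ids) + ")"


if __name__ == "__main__":
    rows = [  # made-up IDs for illustration only
        (1, "datanode_process", "DATANODE"),
        (7, "namenode_webui", "NAMENODE"),
        (3, "yarn_resourcemanager_webui", "RESOURCEMANAGER"),
    ]
    print(build_definition_filter(rows, {"NAMENODE", "DATANODE"}))  # AND alert_definition_id IN (1, 7)
```

Raising on an empty match avoids silently producing a filter that selects nothing.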

1. Retrieve component downtime history and duration for the last n days

  1. Login to Ambari server

  2. Copy the ServiceDown_History.sql file to the Ambari server host under the user’s home directory

  3. Open the SQL file and update the AND alert_definition_id IN clause with the definition_id values collected in the Pre-requisites steps

  4. Run the query as follows:

    su - postgres -c "psql -d ambari -f <pathofsql>/ServiceDown_History.sql" > Results.csv
  5. Run cat Results.csv to view the results
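The contents of ServiceDown_History.sql are not shown in this post, so the following is not that file. It is a minimal Python sketch of one way to turn alert history rows into downtime durations. It assumes alert_history gets a new row on each state change. It treats a component as down from a CRITICAL row until the next OK row for the same alert definition and host, so WARNING and UNKNOWN rows do not end an outage here. The row layout and the function names cutoff_ms, downtime_intervals and downtime_minutes are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row layout: (alert_definition_id, host_name, alert_timestamp in epoch ms, alert_state)
HistoryRow = Tuple[int, str, int, str]
# (alert_definition_id, host_name, down_since_ms, recovered_at_ms or None if still down)
Outage = Tuple[int, str, int, Optional[int]]

MS_PER_DAY = 24 * 60 * 60 * 1000


def cutoff_ms(n_days: int, now_ms: int) -> int:
    """Start of the 'last n days' window, in epoch milliseconds."""
    return now_ms - n_days * MS_PER_DAY


def downtime_intervals(rows: Iterable[HistoryRow], since_ms: int) -> List[Outage]:
    """Pair each CRITICAL row with the next OK row for the same definition and host.

    Rows older than since_ms are skipped, so an outage that began before the
    window is not counted in this sketch.
    """
    open_since: Dict[Tuple[int, str], int] = {}
    outages: List[Outage] = []
    for def_id, host, ts, state in sorted(rows, key=lambda r: r[2]):
        if ts < since_ms:
            continue
        key = (def_id, host)
        if state == "CRITICAL" and key not in open_since:
            open_since[key] = ts  # component went down
        elif state == "OK" and key in open_since:
            outages.append((def_id, host, open_since.pop(key), ts))  # component recovered
    # Anything still open has not recovered by the end of the data.
    outages.extend((d, h, start, None) for (d, h), start in open_since.items())
    return outages


def downtime_minutes(outage: Outage) -> Optional[float]:
    """Length of a closed outage in minutes; None while the component is still down."""
    _def_id, _host, start, end = outage
    return None if end is None else (end - start) / 60000


if __name__ == "__main__":
    sample = [  # made-up rows for illustration
        (7, "nn1.example.com", 1_000_000, "CRITICAL"),
        (7, "nn1.example.com", 1_600_000, "OK"),
        (7, "nn1.example.com", 2_000_000, "CRITICAL"),
    ]
    for outage in downtime_intervals(sample, since_ms=0):
        print(outage, downtime_minutes(outage))
```

For the last 7 days, pass since_ms=cutoff_ms(7, now_ms), with now_ms taken from something like int(time.time() * 1000).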

2. Count how many times Hadoop components went down in the last n days

  1. Login to Ambari server

  2. Copy the ServiceDown_count.sql file to the Ambari server host under the user’s home directory

  3. Open the SQL file and update the AND alert_definition_id IN clause with the definition_id values collected in the Pre-requisites steps

  4. Run the query as follows:

    su - postgres -c "psql -d ambari -f <pathofsql>/ServiceDown_count.sql" > Results.csv
  5. Run cat Results.csv to view the results
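ServiceDown_count.sql is not reproduced in this post, so the following is only a sketch of the kind of query such a count could run, not the repository's file. It assumes each CRITICAL row in alert_history marks one transition into CRITICAL, that is, one occurrence of a component going down. To keep the example runnable without a cluster, an in-memory SQLite table with a few alert_history columns and made-up rows stands in for the Ambari PostgreSQL database. The helper names count_outages and demo_connection are hypothetical.

```python
import sqlite3
from typing import Dict, Sequence


def count_outages(conn: sqlite3.Connection, definition_ids: Sequence[int], since_ms: int) -> Dict[str, int]:
    """Count CRITICAL rows per component since since_ms for the given alert definitions."""
    if not definition_ids:
        raise ValueError("definition_ids must not be empty")
    placeholders = ", ".join("?" for _ in definition_ids)
    sql = (
        "SELECT component_name, COUNT(*) FROM alert_history "
        "WHERE alert_state = 'CRITICAL' AND alert_timestamp >= ? "
        f"AND alert_definition_id IN ({placeholders}) "
        "GROUP BY component_name ORDER BY component_name"
    )
    return dict(conn.execute(sql, [since_ms, *definition_ids]).fetchall())


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in with a subset of alert_history columns and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alert_history (alert_definition_id INTEGER, component_name TEXT, "
        "alert_timestamp INTEGER, alert_state TEXT)"
    )
    conn.executemany(
        "INSERT INTO alert_history VALUES (?, ?, ?, ?)",
        [
            (7, "NAMENODE", 1000, "CRITICAL"),
            (7, "NAMENODE", 2000, "OK"),
            (7, "NAMENODE", 3000, "CRITICAL"),
            (1, "DATANODE", 500, "CRITICAL"),  # older than the window used below
            (1, "DATANODE", 4000, "CRITICAL"),
            (3, "RESOURCEMANAGER", 5000, "CRITICAL"),  # not in the selected definitions
        ],
    )
    return conn


if __name__ == "__main__":
    print(count_outages(demo_connection(), [1, 7], since_ms=1000))  # {'DATANODE': 1, 'NAMENODE': 2}
```

The same SQL shape should carry over to the real Ambari database, though a PostgreSQL driver such as psycopg2 uses %s rather than ? as its placeholder.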

The reports produced above (Results.csv) show which Hadoop services went down in the last n days or months and how long each service stayed down. From them, users can deduce failure patterns and tune the affected service to avoid future failures.

To derive other reports, refer to the Ambari alert tables (such as alert_history and alert_definition) for more details.

To get the best data engineering solutions for your business, reach out to us at Clairvoyant.

Refer to the Git repository below for more details:

https://github.com/teamclairvoyant/Hortonworks-AmbariAlerts-Analyser.git

References

https://community.cloudera.com/t5/Support-Questions/How-to-view-Alert-History-via-Ambari/td-p/95042

Author
Sorabh Jain

BigData Administrator at Clairvoyant LLC

Tags: Data Engineering
