
Installing Livy on a Hadoop Cluster

By Robert Sanders - November 15, 2016

Steps to Install Livy on a Hadoop Cluster

Purpose

Livy is an open source REST service for Apache Spark that allows you to submit jobs to your Spark cluster through REST calls. You can view the source code here: https://github.com/cloudera/livy

In this post I will walk through the steps you need to follow to get Livy installed on a Hadoop cluster. The steps were derived from the source code link above; however, this post provides more detail on how to test the installation in a simpler manner.

Install Steps

1. Determine which node in your cluster will act as the Livy server

  • Note: the server will need to have the Hadoop and Spark libraries and configurations deployed on it.

2. Log in to the machine as root

3. Download the Livy source code
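For example, using git to clone the repository linked above:

  git clone https://github.com/cloudera/livy.git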

4. Get the version of Spark that is currently installed on your cluster

1. Run the following command:

  spark-submit --version

  • Example: 1.6.0

2. Use this value in downstream commands as {SPARK_VERSION}

5. Build the Livy source code with Maven
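A sketch of the build command, assuming Maven and a JDK are installed on the machine; -DskipTests shortens the build, and -Dspark.version (if your Livy checkout defines that property) compiles against the Spark version you captured in step 4:

  cd livy
  mvn -DskipTests -Dspark.version={SPARK_VERSION} clean package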

6. You're done!

Steps to Control Livy

Get Status

  ps -eaf | grep livy

It will be listed like the following:
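If the server is running, you should see a java process for the Livy server, along these lines (illustrative only; the exact command line and main class depend on your Livy version):

  root  21518   1  0 10:15 ?  00:02:25 java ... LivyServer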

Start

Note: Run as root
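A sketch of one way to start it, assuming the source was cloned and built under /opt/livy and that you want logs in /var/log/livy.out (both paths are placeholders; adjust for your environment). The livy-server script needs to locate your Spark and Hadoop installations, so export SPARK_HOME and HADOOP_CONF_DIR first if your environment does not already set them:

  export SPARK_HOME=/usr/lib/spark
  export HADOOP_CONF_DIR=/etc/hadoop/conf
  nohup /opt/livy/bin/livy-server > /var/log/livy.out 2>&1 &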

Once started, the Livy server can be called at the following host and port (8998 is Livy's default port):

  http://localhost:8998

If you're calling it from another machine, replace "localhost" with the public IP or hostname of the Livy server.

Stop

Note: Run as root
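A sketch that looks up the Livy server's process ID and kills it (match the grep pattern against whatever the Get Status step shows on your system):

  kill $(ps -eaf | grep livy-server | grep -v grep | awk '{print $2}')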

Testing Livy

This assumes you are running the commands from the machine where Livy was installed, hence the use of localhost. If you would like to test it from another machine, just change "localhost" to the public IP or hostname of the Livy server. The curl commands and sample responses below follow Livy's documented REST API and assume the default port, 8998; the exact response fields may vary with your Livy version.

1. Create a new Livy Session

a. Curl Command
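For example ("kind":"spark" requests a Scala Spark session; other documented kinds include pyspark):

  curl -X POST -H "Content-Type: application/json" -d '{"kind":"spark"}' http://localhost:8998/sessions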

b. Output
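An illustrative response; the new session begins in the starting state and moves to idle once it is ready:

  {"id":0,"state":"starting","kind":"spark","log":[]}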

2. View Current Livy Sessions

a. Curl Command
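For example:

  curl http://localhost:8998/sessions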

b. Output
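An illustrative response listing the session created above:

  {"from":0,"total":1,"sessions":[{"id":0,"state":"idle","kind":"spark","log":[]}]}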

3. Get Livy Session Info

a. Curl Command
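For example, using the session ID (0) returned when the session was created:

  curl http://localhost:8998/sessions/0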

b. Output
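An illustrative response:

  {"id":0,"state":"idle","kind":"spark","log":[]}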

4. Submit job to Livy

a. Curl Command
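Jobs are submitted as code statements against an existing session; this example evaluates a trivial Scala expression in session 0:

  curl -X POST -H "Content-Type: application/json" -d '{"code":"1 + 1"}' http://localhost:8998/sessions/0/statements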

b. Output
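An illustrative response; the statement ID (0 here) is what you poll in the next step:

  {"id":0,"state":"running","output":null}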

5. Get Job Status and Output

a. Curl Command
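For example, polling statement 0 in session 0:

  curl http://localhost:8998/sessions/0/statements/0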

b. Output
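An illustrative response once the statement has finished; the state becomes available and the result appears under output:

  {"id":0,"state":"available","output":{"status":"ok","execution_count":0,"data":{"text/plain":"res0: Int = 2"}}}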

6. Delete Session

a. Curl Command
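For example, deleting session 0:

  curl -X DELETE http://localhost:8998/sessions/0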

b. Output

  {"msg":"deleted"}

Also, check out our blog about how to install Spark2 on a Cloudera cluster through Docker here. To get the best data engineering solutions for your business, reach out to us at Clairvoyant.

Author
Robert Sanders

Director of Big Data and Cloud Engineering for Clairvoyant LLC | Marathon Runner | Triathlete | Endurance Athlete

Tags: Data Engineering
