<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=2877026&amp;fmt=gif">

Installing Apache Zeppelin on a Hadoop Cluster

By Robert Sanders - December 2, 2016

Steps to Install Apache Zeppelin on a Hadoop Cluster

Apache Zeppelin(https://zeppelin.incubator.apache.org/) is a web-based notebook that enables interactive data analytics. You can make data-driven, interactive and collaborative documents with SQL, Scala and more.

This document describes the steps you can take to install Apache Zeppelin on a CentOS 7 Machine.

Steps

Note: Run all the commands as Root

Configure the Environment

Install Maven (If not already done)

Configure Maven (If not already done)

Note: If you were to login as a different user or logout these settings will be whipped out so you won’t be able to run any mvn commands. To prevent this, you can append these export statements to the end of your ~/.bashrc file:

Install NodeJS

Note: Steps referenced from

https://nodejs.org/en/download/package-manager/

Install Dependencies

Note: Used for Zeppelin Web App

yum install -y bzip2 fontconfig

Install Apache Zeppelin

Select the version you would like to install
View the available releases and select the latest:

https://github.com/apache/zeppelin/releases

Override the {APACHE_ZEPPELIN_VERSION} placeholder with the value you would like to use.

Download Apache Zeppelin

Get Build Variable Values

Get Spark Version
Running the following command

spark-submit --version

Override the {SPARK_VERSION} placeholder with this value.

Example: 1.6.0

Get Hadoop Version
Running the following command

hadoop version

Override the {HADOOP_VERSION} placeholder with this value.

Example: 2.6.0-cdh5.9.0

Take the this value and get the major and minor version of Hadoop. Override the {SIMPLE_HADOOP_VERSION} placeholder with this value.

Example: 2.6

Build Apache Zeppelin

Update the bellow placeholders and run

Note: this process will take a while

Configure Apache Zeppelin

Base Zeppelin Configuration

Setup Conf

Setup Hive Conf

Edit zeppelin-env.sh
Uncomment export HADOOP_CONF_DIR
Set it to export HADOOP_CONF_DIR=“/etc/hadoop/conf”

Starting/Stopping Apache Zeppelin

Start Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh start

Restart Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh restart

Stop Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh stop

Viewing Web UI

Once the zeppelin process is running you can view the WebUI by opening a web browser and navigating to:

http://{HOST}:8080/

Note: Network rules will need to allow this communication

Runtime Apache Zeppelin Configuration

Further configurations maybe needed for certain operations to work Configure Hive in Zeppelin

    1. Open the cloudera manager and get the public host name of the machine that has the HiveServer2 role. Identify this as HIVESERVER2_HOST

    2. Open the Web UI and click the Interpreter tab

    3. Change the Hive default.url option to: jdbc:hive2://{HIVESERVER2_HOST}:10000

Also check out our blog Installing and Configuring Apache Airflow here. To get the best data engineering solutions for your business, reach out to us at Clairvoyant.

Author
Robert Sanders

Director of Big Data and Cloud Engineering for Clairvoyant LLC | Marathon Runner | Triathlete | Endurance Athlete

Tags: Data Engineering

Fill in your Details