Steps to Install Apache Zeppelin on a Hadoop Cluster
Apache Zeppelin (https://zeppelin.incubator.apache.org/) is a web-based notebook that enables interactive data analytics. You can make data-driven, interactive, and collaborative documents with SQL, Scala, and more.
This document describes the steps you can take to install Apache Zeppelin on a CentOS 7 machine.
Steps
Note: Run all the commands as root.
Configure the Environment
Install Maven (If not already done)
Configure Maven (If not already done)
Note: If you log in as a different user or log out, these settings will be wiped out and you won't be able to run any mvn commands. To prevent this, you can append these export statements to the end of your ~/.bashrc file:
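One way to make that append safe to repeat is sketched below. The helper name append_line_once and the Maven path /usr/local/maven are assumptions for illustration, not values from this guide; substitute the export statements and install location you actually used.

```shell
# Append a line to a file only if that exact line is not already present,
# so running the setup twice does not duplicate the exports.
append_line_once() {
  target_file="$1"
  line="$2"
  grep -qxF -e "$line" "$target_file" 2>/dev/null || printf '%s\n' "$line" >> "$target_file"
}

# Hypothetical install location; replace with wherever Maven was unpacked.
MAVEN_HOME_GUESS="/usr/local/maven"

# Usage (commented out so that sourcing this snippet changes nothing):
# append_line_once "$HOME/.bashrc" "export M2_HOME=${MAVEN_HOME_GUESS}"
# append_line_once "$HOME/.bashrc" 'export PATH=$M2_HOME/bin:$PATH'
```

After appending, run `source ~/.bashrc` or open a new shell so the variables take effect.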
Install NodeJS
Note: Steps referenced from
https://nodejs.org/en/download/package-manager/
Install Dependencies
Note: Used for Zeppelin Web App
yum install -y bzip2 fontconfig
Install Apache Zeppelin
Select the version you would like to install
View the available releases and select the latest:
https://github.com/apache/zeppelin/releases
Override the {APACHE_ZEPPELIN_VERSION} placeholder with the value you would like to use.
Download Apache Zeppelin
Get Build Variable Values
Get Spark Version
Run the following command:
spark-submit --version
Override the {SPARK_VERSION} placeholder with this value.
Example: 1.6.0
Get Hadoop Version
Run the following command:
hadoop version
Override the {HADOOP_VERSION} placeholder with this value.
Example: 2.6.0-cdh5.9.0
Take this value and extract the major and minor version of Hadoop. Override the {SIMPLE_HADOOP_VERSION} placeholder with the result.
Example: 2.6
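The version lookup above can also be scripted. The following is a minimal sketch; the function names are made up for illustration, and it assumes the first line of the `hadoop version` output has the form "Hadoop <version>".

```shell
# Extract the version token from the first line of `hadoop version` output,
# which is assumed to look like "Hadoop 2.6.0-cdh5.9.0".
hadoop_version_from_output() {
  head -n 1 | awk '{print $2}'
}

# Reduce a full version string to "major.minor".
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Usage on a cluster node (commented out because it needs Hadoop installed):
# HADOOP_VERSION="$(hadoop version | hadoop_version_from_output)"
# SIMPLE_HADOOP_VERSION="$(major_minor "$HADOOP_VERSION")"

major_minor "2.6.0-cdh5.9.0"   # prints 2.6
```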
Build Apache Zeppelin
Update the placeholders below and run:
Note: this process will take a while
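As a rough illustration of how the three placeholders fit together, the sketch below assembles a Maven command and prints it without running it. The profile names (-Pspark-X.Y, -Phadoop-X.Y, -Pvendor-repo) are an assumption based on the build instructions shipped with Zeppelin 0.6-era releases; check the README of the release you selected, since profiles change between versions. The function name is hypothetical.

```shell
# Print (not run) a Maven build command for the given Spark and Hadoop versions.
zeppelin_build_command() {
  spark_version="$1"      # e.g. 1.6.0
  hadoop_version="$2"     # e.g. 2.6.0-cdh5.9.0
  # Profiles are assumed to be keyed by major.minor only.
  spark_profile="$(printf '%s\n' "$spark_version" | cut -d. -f1,2)"
  hadoop_profile="$(printf '%s\n' "$hadoop_version" | cut -d. -f1,2)"
  printf 'mvn clean package -Pspark-%s -Phadoop-%s -Dhadoop.version=%s -Pvendor-repo -DskipTests\n' \
    "$spark_profile" "$hadoop_profile" "$hadoop_version"
}

zeppelin_build_command "1.6.0" "2.6.0-cdh5.9.0"
```

Review the printed command before running it from the root of the Zeppelin source tree.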
Configure Apache Zeppelin
Base Zeppelin Configuration
Setup Conf
Setup Hive Conf
Edit zeppelin-env.sh
Uncomment export HADOOP_CONF_DIR
Set it to export HADOOP_CONF_DIR="/etc/hadoop/conf"
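The same edit can be made non-interactively. This is a sketch only: set_hadoop_conf_dir is a hypothetical helper, and the file location /opt/zeppelin/conf/zeppelin-env.sh in the usage comment is an assumption inferred from the /opt/zeppelin/bin path used elsewhere in this guide.

```shell
# Uncomment (or add) the HADOOP_CONF_DIR export in a zeppelin-env.sh file.
set_hadoop_conf_dir() {
  env_file="$1"
  conf_dir="$2"
  tmp_file="$(mktemp)"
  # Replace a commented or uncommented "export HADOOP_CONF_DIR" line.
  sed -E "s|^#?[[:space:]]*export HADOOP_CONF_DIR.*|export HADOOP_CONF_DIR=\"${conf_dir}\"|" \
    "$env_file" > "$tmp_file"
  # Append the export if the file had no such line at all.
  grep -q '^export HADOOP_CONF_DIR=' "$tmp_file" || \
    printf 'export HADOOP_CONF_DIR="%s"\n' "$conf_dir" >> "$tmp_file"
  # Write back in place so the original file's permissions are kept.
  cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}

# Usage (assumed path; commented out so nothing is modified by default):
# set_hadoop_conf_dir /opt/zeppelin/conf/zeppelin-env.sh /etc/hadoop/conf
```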
Starting/Stopping Apache Zeppelin
Start Zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh start
Restart Zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh restart
Stop Zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh stop
Viewing Web UI
Once the Zeppelin process is running, you can view the Web UI by opening a web browser and navigating to:
http://{HOST}:8080/
Note: Network rules will need to allow this communication
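To confirm from the command line that the Web UI is reachable, something like the following could be used. It is a sketch that assumes curl is installed; the function names, retry count, and sleep interval are arbitrary choices for illustration.

```shell
# Build the Web UI URL for a host, defaulting to port 8080.
zeppelin_url() {
  printf 'http://%s:%s/\n' "$1" "${2:-8080}"
}

# Poll the URL until it answers successfully or the attempts run out.
wait_for_zeppelin() {
  url="$(zeppelin_url "$1" "${2:-8080}")"
  attempts="${3:-30}"
  while [ "$attempts" -gt 0 ]; do
    if curl -s -f -o /dev/null "$url"; then
      echo "Zeppelin is answering at $url"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  echo "No response from $url" >&2
  return 1
}

# Usage (commented out because it needs a running Zeppelin):
# wait_for_zeppelin localhost
```

If the check succeeds on the Zeppelin host but fails from another machine, the network rules mentioned in the note are the likely cause.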
Runtime Apache Zeppelin Configuration
Further configuration may be needed for certain operations to work.
Configure Hive in Zeppelin
- Open Cloudera Manager and get the public host name of the machine that has the HiveServer2 role. Identify this as HIVESERVER2_HOST.
- Open the Web UI and click the Interpreter tab.
- Change the Hive default.url option to: jdbc:hive2://{HIVESERVER2_HOST}:10000
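Before entering the URL in the Interpreter tab, it can help to check that it works outside Zeppelin. The sketch below builds the URL from the host name; hive_jdbc_url is a hypothetical helper, and the commented beeline call assumes the beeline client is installed on the machine and that HiveServer2 accepts the connection without extra authentication settings.

```shell
# Build the HiveServer2 JDBC URL, defaulting to port 10000.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "$1" "${2:-10000}"
}

# Usage (commented out because it needs a reachable HiveServer2):
# HIVESERVER2_HOST="replace-with-your-host"
# beeline -u "$(hive_jdbc_url "$HIVESERVER2_HOST")" -e 'show databases;'
```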