
Hue 4.5.0 with HDP 3.1.4 — A Game of Latest Technologies

By Ajit Mote - January 21, 2020

Configure the latest open source free version (4.5.0) of Hue with the latest HDP (3.1.4) stack

Introduction

Everybody likes to explore and try out the latest and greatest technologies, but getting hold of them comes with both frustration and pleasure. In my case, I picked the latest Hue source code from the Hue git repository and attempted to make it work with the HDP 3.1.4 stack. Hortonworks does not officially support Hue, so it was quite challenging to start from scratch. This blog summarizes my learnings from this interesting and challenging task.

I planned this activity as follows:

  1. Get the latest Hue from gethue.com

  2. Install it

  3. Pick hue.ini from one of our other working setups and modify it with the new cluster's hostnames

  4. Start hue service

  5. Hand over the setup to end users for testing

  6. Go home in the evening and sleep tight

But as all administrators know, nothing ever goes entirely as planned, or without new learnings.

1) Get and install the latest Hue

yum install -y git
git clone https://github.com/cloudera/hue.git
sudo yum install -y ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi \
    cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel \
    libxslt-devel make mysql mysql-devel openldap-devel python-devel \
    sqlite-devel gmp-devel libtidy maven
cd hue
make apps
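
If the build succeeds, the Hue launcher script should be in place under build/env/bin. As a quick sanity check (help is a standard Django management command, so it should be available through Hue's wrapper):

# Confirm the launcher was built and list the available management commands
ls -l build/env/bin/hue
build/env/bin/hue help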

2) Create a hue user and database in MySQL/MariaDB

create database huedb;
create user 'hue'@'localhost' identified by 'PASSWORD';
grant all privileges on huedb.* to 'hue'@'localhost';
create user 'hue'@'%' identified by 'PASSWORD';
grant all privileges on huedb.* to 'hue'@'%';
create user 'hue'@'HOSTNAME_HUE_SERVER' identified by 'PASSWORD';
grant all privileges on huedb.* to 'hue'@'HOSTNAME_HUE_SERVER';
flush privileges;
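
Before moving on, it is worth confirming that the hue user can actually reach the new database from the Hue server (the hostname is the placeholder used above; you will be prompted for PASSWORD):

# From the Hue server, verify remote access to the new database
mysql -h <MYSQL_DB_HOSTNAME> -u hue -p -e "show databases;"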

3) Create the hue user on the hue server

useradd hue

4) Update the “desktop/conf/pseudo-distributed.ini” file to make Hue listen on port 8000 and use the MySQL/MariaDB database

a) Where the Hue process will listen for requests:

[desktop]
http_host=<HUE_SERVER_HOSTNAME>
http_port=8000

b) Use the remote MySQL/MariaDB database (huedb is the database created in step 2):

[[database]]
engine=mysql
host=<MYSQL_DB_HOSTNAME>
port=<MYSQL_DB_PORT>
user=hue
password=<hue-PASSWORD>
name=huedb

5) Configure Hue to load the existing data and create the necessary database tables

build/env/bin/hue syncdb --noinput
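
On these Django-based Hue builds, syncdb is typically followed by migrate to apply any remaining schema migrations; running it here should be safe:

build/env/bin/hue migrate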

6) Start the Hue process and Test

a. Start the Hue process:
build/env/bin/supervisor
b. Verify that all goes well:
i. Open the browser and point it to http://<HUE-HOSTNAME>:8000
ii. On the Hue system, check the output of the “netstat -tulpn | grep 8000” command. You should see that the Hue process is listening on port 8000 for connections.

If you get a login prompt, Hue is installed and configured properly.
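
A quick headless check from the Hue host can stand in for the browser test (expect an HTTP 200, or a redirect to the login page):

curl -s -o /dev/null -w "%{http_code}\n" http://<HUE-HOSTNAME>:8000/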

Hue Login screen

7) Configure Hue to access the Hadoop cluster's HDFS, YARN and Hive services

a. Yarn service

Open the Hue configuration file and update the configuration below:
[hadoop]
[[yarn_clusters]]
[[[default]]]
resourcemanager_host=<RM_HOST>
submit_to=True
resourcemanager_api_url=http://<RM_HOST>:8088
proxy_api_url=http://<RM_HOST>:8088
history_server_api_url=http://<RM_HOST>:19888

Note: Verify the port for each of the above services through Ambari and update them accordingly. With YARN HA, a request to any ResourceManager instance will be redirected to the active one.
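
If ResourceManager HA is enabled, the hue.ini template that ships with Hue also carries an [[[ha]]] sub-section under [[yarn_clusters]] where the standby ResourceManager can be declared. A minimal sketch, assuming a second RM at <RM_HOST2> (the logical name must match your YARN HA configuration):

[[[ha]]]
resourcemanager_host=<RM_HOST2>
resourcemanager_api_url=http://<RM_HOST2>:8088
logical_name=<RM_LOGICAL_NAME>
submit_to=True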

Job Browser

b. Hive service

For Hue to integrate with Hive:

a) I made the following configuration change in the Hive service:
hive.server2.transport.mode=http
The transport mode is binary by default.
b) Open the Hue configuration file and update the configuration below:
[beeswax]
hive_server_host=<HIVE_SERVER_HOSTNAME>
hive_server_port=10001
hive_discovery_hs2=true
hive_discovery_hiveserver2_znode=/hiveserver2
hive_conf_dir=/etc/hive/conf
server_conn_timeout=120
use_get_log_api=false
max_number_of_sessions=2
[zookeeper]
[[clusters]]
[[[default]]]
host_ports=<ZK_HOST1>:2181,<ZK_HOST2>:2181,<ZK_HOST3>:2181

You can also get the ZooKeeper connection string above from Ambari, under the Hive service.
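
To confirm that HiveServer2 has registered itself for discovery, you can list the znode from any cluster node (zookeeper-client ships with HDP's ZooKeeper package; hostnames are placeholders):

# List the HiveServer2 instances registered under the discovery znode
zookeeper-client -server <ZK_HOST1>:2181 ls /hiveserver2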

Hue Hive Query


c. HDFS service

In our cluster, HDFS high availability is configured, so we cannot just put a single NameNode entry in the Hue configuration: Hue would not redirect requests to the active NameNode. For this to work, you have to configure the Hadoop-HttpFS service.

[hadoop]
[[hdfs_clusters]]
[[[default]]]
fs_defaultfs=hdfs://<ADDRESS_OF_NAMENODE>
webhdfs_url=http://<HOSTNAME_hadoop-httpFS>:14000/webhdfs/v1/

You can get ADDRESS_OF_NAMENODE from the fs.defaultFS property in the Advanced core-site section in Ambari.

Hue File Browser

Below are the frustrating moments that might test your patience if not handled properly:

  1. Dependencies during the build from source. You may have to install a few dependent packages beyond the ones mentioned above.

  2. Make sure the “hue” user is created on the hue server system.

  3. Hive service transport mode. Initially, I spent a lot of time trying to make it work in binary mode, but had no luck.

  4. Install the Hadoop-HttpFS service if you have an HDFS HA setup.

  5. Enable user impersonation for hue. If the proxy-user configurations are not present, impersonation will not be allowed and connections from Hue will fail (see the sketch after this list).
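
For item 5, impersonation is granted through Hadoop's proxy-user settings. A minimal sketch, assuming the "hue" service user and a wide-open hosts/groups policy (tighten the wildcards for production):

# In core-site.xml (editable through Ambari):
hadoop.proxyuser.hue.hosts=*
hadoop.proxyuser.hue.groups=*
# In httpfs-site.xml, so HttpFS also lets hue impersonate end users:
httpfs.proxyuser.hue.hosts=*
httpfs.proxyuser.hue.groups=*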

As Hortonworks does not officially support Hue, the Hue process cannot be started automatically as part of the HDP stack, so I wrote systemd unit files for the Hue and Hadoop-HttpFS services.

d) Hue service

  1. Create hue.service in the /etc/systemd/system directory:

    [Unit]
    Description=Start Hue service
    After=network.target

    [Service]
    Type=simple
    Restart=always
    # supervisor must stay in the foreground for Type=simple,
    # so the -d (daemonize) flag is not used here
    ExecStart=/home/hue/hue/build/env/bin/supervisor

    [Install]
    WantedBy=multi-user.target
    
  2. systemctl daemon-reload

  3. systemctl enable hue.service

  4. Now you can start, stop and check the status of the hue service using the systemctl command, as shown below
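
    For example, to manage the Hue service:

    systemctl start hue.service
    systemctl status hue.service
    systemctl stop hue.service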

e) Hadoop-HttpFS service

  1. Create hadoop-httpfs.service in the /etc/systemd/system directory:

    [Unit]
    Description=Start Hadoop HttpFS service
    After=network.target

    [Service]
    Type=forking
    Restart=always
    User=hdfs
    # "hdfs --daemon start" forks into the background, hence Type=forking
    ExecStart=/bin/hdfs --daemon start httpfs

    [Install]
    WantedBy=multi-user.target
    
  2. systemctl daemon-reload

  3. systemctl enable hadoop-httpfs.service

  4. Now you can start, stop and check the status of the Hadoop-HttpFS service using the systemctl command, as shown below
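
    Once HttpFS is up, a WebHDFS call through it makes a good end-to-end check (LISTSTATUS is a standard WebHDFS operation; replace the hostname with your HttpFS node):

    systemctl start hadoop-httpfs.service
    # List the HDFS root directory through HttpFS on port 14000
    curl "http://<HOSTNAME_hadoop-httpFS>:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"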

To get the best data engineering solutions for your business, reach out to us at Clairvoyant.

Author
Ajit Mote

Tags: Data Engineering
