
Installing Apache Kudu on Cloudera’s Quickstart VM

By Robert Sanders - May 2, 2019

Steps to Install Apache Kudu on the Cloudera Quickstart VM

Cloudera, the vendor behind one of the leading Hadoop distributions, provides an easy-to-install virtual machine for getting started quickly on its platform. With it, you can get a single-node CDH cluster running inside a virtual environment. You can use this VM for personal learning, for rapidly building applications against a dedicated cluster, or for many other purposes.

Apache Kudu is a columnar storage manager developed for the Apache Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

The Cloudera Quickstart VM doesn’t come with Apache Kudu out of the box, but Kudu can be installed fairly easily.

Installation Steps

1. Download and Install the VM

a. Navigate to https://www.cloudera.com/downloads/quickstart_vms.html

b. Select the Platform you’d like the VM to run on and Download

c. Load the VM into your desired Platform

2. Configure the VM

Before starting the VM, set the following configurations:

— Set at least 8GB of RAM

— Set at least 2 CPUs
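If the VM runs under VirtualBox, these two settings can also be applied from the host's command line. The following is a minimal sketch under that assumption; the VM name is hypothetical (check `VBoxManage list vms` for the real one), and other platforms such as VMware have their own tooling.

```shell
# Assumption: the VM was imported into VirtualBox under this (hypothetical) name.
VM_NAME="${VM_NAME:-cloudera-quickstart-vm}"
RAM_GB=8
CPUS=2

# VBoxManage expects the memory size in megabytes.
ram_mb=$(( RAM_GB * 1024 ))

# Apply the settings; the VM must be powered off when this runs.
configure_vm() {
  VBoxManage modifyvm "$VM_NAME" --memory "$ram_mb" --cpus "$CPUS"
}
```

Run `configure_vm` once, before the first boot of the VM.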

3. Start up the VM

4. Start up Cloudera Manager (CM)

Once the VM starts up, go to the Desktop and run the “Launch Cloudera Express” script.

Note: This may take a while to run

Once complete, you should now be able to view the Cloudera Manager by opening up your web browser (within the VM) and navigating to:

http://quickstart.cloudera:7180

From your local machine, you can navigate to:

http://localhost:7180

Default Credentials: cloudera/cloudera
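Because the launch script can take a while, it may help to poll until the Cloudera Manager web server answers. The sketch below assumes `curl` is installed and that CM is reachable at the localhost URL given above; the function names and the 10-second interval are illustrative choices, not part of Cloudera's tooling.

```shell
CM_URL="${CM_URL:-http://localhost:7180}"

# Succeeds when the CM web server answers with an HTTP status below 400.
cm_is_up() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$CM_URL/") || return 1
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Polls every 10 seconds, up to the given number of attempts (default 60).
wait_for_cm() {
  tries="${1:-60}"
  while [ "$tries" -gt 0 ]; do
    cm_is_up && return 0
    tries=$(( tries - 1 ))
    sleep 10
  done
  return 1
}
```

For example, `wait_for_cm 30 && echo "Cloudera Manager is up"` waits up to roughly five minutes.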

5. Configure CM to use Parcels

Navigate to the Desktop and Execute the “Migrate to Parcels” script.

Note: This may take a while to run

You can validate that CM is now using parcels by logging into the Cloudera Manager Web UI. Right next to the cluster name, it should say: (CDH x.x.x, Parcels)

Note: All the services will be shut down by the migration, and you will need to restart every service on the cluster afterwards. We will do this in a later step.

6. Remove unneeded Services (optional)

Since you are dealing with a limited amount of resources on your single-node Cloudera Quickstart VM, I would recommend removing some services to ensure they don’t cause out-of-memory errors on the cluster.

I recommend removing the following:

Sqoop2

  • Go to Hue -> Configuration

  • Search for Sqoop

  • Set this to “none” and click Save Changes

  • Go back to the Cloudera Manager Home page

  • Click the arrow next to the service, and click Delete

Key-Value Store Indexer

  • Click the arrow next to the service, and click Delete

Solr

  • Go to Hue -> Configuration

  • Search for Solr

  • Set this to “none” and click Save Changes

  • Go back to the Cloudera Manager Home page

  • Click the arrow next to the service, and click Delete

HBase

  • Go to Hue -> Configuration

  • Search for HBase

  • Set HBase Service and HBase Thrift Server to “none” and click Save Changes

  • Go back to the Cloudera Manager Home page

  • Go to Impala -> Configuration

  • Search for HBase

  • Set this to “none” and click Save Changes

  • Go back to the Cloudera Manager Home page

  • Click the arrow next to the service, and click Delete

7. Restart Services

i. Restart the Cloudera Management Service

— Select Clusters > Cloudera Management Service

— Select Actions > Restart

ii. Restart the Cluster Services

— Select Clusters > Cloudera QuickStart

— Select Actions > Restart
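The same two restarts can, in principle, be triggered through the Cloudera Manager REST API instead of the web UI. This is a rough sketch that assumes the default cloudera/cloudera credentials, the localhost URL from step 4, and a cluster named "Cloudera QuickStart" as shown in the menu above; the helper names are made up for illustration. Both calls return immediately, and the restarts continue in the background.

```shell
CM_URL="${CM_URL:-http://localhost:7180}"
CM_AUTH="${CM_AUTH:-cloudera:cloudera}"
CLUSTER_NAME="${CLUSTER_NAME:-Cloudera QuickStart}"

# Percent-encode spaces so the cluster name is safe inside a URL path.
encode_spaces() {
  printf '%s' "$1" | sed 's/ /%20/g'
}

restart_all() {
  # Ask CM which API version it supports (a string of the form vNN).
  api=$(curl -sf -u "$CM_AUTH" "$CM_URL/api/version") || return 1
  # Restart the Cloudera Management Service, then the cluster services.
  curl -sf -u "$CM_AUTH" -X POST "$CM_URL/api/$api/cm/service/commands/restart" || return 1
  curl -sf -u "$CM_AUTH" -X POST \
    "$CM_URL/api/$api/clusters/$(encode_spaces "$CLUSTER_NAME")/commands/restart"
}
```

Calling `restart_all` prints the JSON command objects that CM returns; progress can still be followed in the web UI.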

8. Install Kudu Service

a. Log in to the Cloudera Manager Web UI

b. Click on the button next to the Cluster Name and select “Add Service”

c. Select “Kudu” and click “Continue”

d. Select whichever set of dependencies you would like and click “Continue”

e. Select the one instance available as the Master and the Tablet Server and click “Continue”

f. Set Configurations:

  • Kudu Master WAL Directory = /var/lib/kudu/master/wal

  • Kudu Master Data Directories = /var/lib/kudu/master/data

  • Kudu Tablet Server WAL Directory = /var/lib/kudu/tablet/wal

  • Kudu Tablet Server Data Directories = /var/lib/kudu/tablet/data

g. The service will now be added, and you will then be taken back to the CM home page

9. Update the Replication Factor of Kudu

  1. Go to Kudu -> Configuration.

  2. Change the default_num_replicas parameter to 1. (Kudu defaults to 3 replicas per tablet, which a single tablet server cannot host.)

  3. Restart the Kudu Service.
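After the restart, one way to check that the master and the tablet server are healthy is Kudu's `ksck` cluster check. The sketch below assumes the `kudu` command-line tool is on the PATH inside the VM and that the master listens on Kudu's default RPC port, 7051, on the quickstart.cloudera host; adjust `KUDU_MASTER` if your setup differs.

```shell
# Assumption: the Kudu master runs on the VM's default hostname and RPC port.
KUDU_MASTER="${KUDU_MASTER:-quickstart.cloudera:7051}"

# Runs Kudu's built-in consistency check against the master.
check_kudu() {
  kudu cluster ksck "$KUDU_MASTER"
}
```

Run `check_kudu` from a terminal inside the VM; a non-zero exit status indicates that the check found a problem.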

10. Enable Impala to use Kudu

a. Log in to the Cloudera Manager Web UI

b. Go to Impala -> Configuration

c. Search for Kudu

d. Set Kudu Service to “kudu” and click Save Changes

e. Go back to the Cloudera Manager Home page

f. Restart Hue and Impala by clicking the yellow restart button next to one of the services

Testing

Smoke Test

impala-shell -q 'CREATE TABLE kudu_test(id BIGINT, name STRING, PRIMARY KEY(id)) PARTITION BY HASH PARTITIONS 2 STORED AS KUDU;'

impala-shell -q 'INSERT INTO TABLE kudu_test VALUES (1, "jack"), (2, "jill"), (3, "bob");'

impala-shell -q 'SELECT * FROM kudu_test WHERE id=1;'

impala-shell -q 'DROP TABLE kudu_test;'
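The four commands above can be wrapped in a small script that stops at the first failure. This is only a sketch: the function names are illustrative, and the leading `DROP TABLE IF EXISTS` is an addition so that the test can be rerun after a partial failure.

```shell
# Assumption: impala-shell is on the PATH inside the VM.
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# Run one SQL statement through impala-shell.
run_sql() {
  "$IMPALA_SHELL" -q "$1"
}

# Create, populate, query, and drop a Kudu table; stop at the first error.
smoke_test() {
  run_sql 'DROP TABLE IF EXISTS kudu_test;' &&
  run_sql 'CREATE TABLE kudu_test(id BIGINT, name STRING, PRIMARY KEY(id)) PARTITION BY HASH PARTITIONS 2 STORED AS KUDU;' &&
  run_sql 'INSERT INTO TABLE kudu_test VALUES (1, "jack"), (2, "jill"), (3, "bob");' &&
  run_sql 'SELECT * FROM kudu_test WHERE id=1;' &&
  run_sql 'DROP TABLE kudu_test;'
}
```

For example, `smoke_test && echo "Kudu smoke test passed"` prints the confirmation only if every statement succeeded.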

Also, check out our blog post "Guide to Using Apache Kudu and Performance Comparison with HDFS". To get the best data engineering solutions for your business, reach out to us at Clairvoyant.

Author
Robert Sanders

Director of Big Data and Cloud Engineering for Clairvoyant LLC | Marathon Runner | Triathlete | Endurance Athlete

Tags: Data Engineering
