Steps to Install Apache Kudu on the Cloudera Quickstart VM
Cloudera, one of the leading distributions of Hadoop, provides an easy-to-install virtual machine for getting started quickly on their platform. With it, you can get a single-node CDH cluster running within a virtual environment. You can use this VM for personal learning, for rapidly building applications on a dedicated cluster, or for many other purposes.
Apache Kudu is a columnar storage manager developed for the Apache Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.
The Cloudera Quickstart VM doesn’t come with Apache Kudu right out of the box, but it can be installed fairly easily.
Installation Steps
1. Download and Install the VM
a. Navigate to https://www.cloudera.com/downloads/quickstart_vms.html
b. Select the Platform you’d like the VM to run on and Download
c. Load the VM into your desired Platform
2. Configure the VM
Before starting the VM, set the following configurations:
— Set at least 8GB of RAM
— Set at least 2 CPUs
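Once the VM has booted (step 3), you can confirm from a terminal inside the guest that the settings took effect. The sketch below is my own helper and not part of the Quickstart VM. The function names are hypothetical, and the 7.5 GB threshold is an assumption that allows for the memory the kernel reserves, since MemTotal usually reads slightly below the configured value.

```shell
# Hypothetical helper: succeeds when the given memory (in kB) and CPU count
# meet the minimums from this guide (roughly 8 GB of RAM and 2 CPUs).
meets_minimum() {
  local mem_kb=$1 cpus=$2
  # Assumed tolerance: 7.5 GB expressed in kB (7.5 * 1024 * 1024).
  local min_mem_kb=7864320
  [ "$mem_kb" -ge "$min_mem_kb" ] && [ "$cpus" -ge 2 ]
}

# Reads the values the guest OS actually sees and prints a verdict.
report_vm_resources() {
  local mem_kb cpus
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  cpus=$(nproc)
  if meets_minimum "$mem_kb" "$cpus"; then
    echo "OK: ${mem_kb} kB RAM, ${cpus} CPUs"
  else
    echo "WARNING: ${mem_kb} kB RAM, ${cpus} CPUs is below the recommended minimum"
  fi
}
```

Run report_vm_resources inside the VM. If it prints a warning, shut the VM down and raise the RAM or CPU allocation before continuing.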
3. Start up the VM
4. Start up Cloudera Manager (CM)
Once the VM starts up, navigate to the Desktop and run the “Launch Cloudera Express” script.
Note: This may take a while to run
Once complete, you should now be able to view the Cloudera Manager by opening up your web browser (within the VM) and navigating to:
http://quickstart.cloudera:7180
From your local machine (assuming port 7180 is forwarded from the VM to the host), you can navigate to:
http://localhost:7180
Default Credentials: cloudera/cloudera
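Because the launch script can take several minutes, it can be handy to poll the URL from a terminal instead of refreshing the browser. This is a minimal sketch, assuming curl is available inside the VM. The helper names wait_until and cm_is_up are hypothetical, and the retry counts in the usage note are arbitrary.

```shell
# Hypothetical helper: run a probe command up to N times, sleeping between
# attempts, and succeed as soon as the probe succeeds.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Probe: succeeds once the Cloudera Manager web server answers without an
# HTTP error status (curl --fail treats 4xx/5xx responses as failures).
cm_is_up() {
  curl --silent --fail --output /dev/null "http://quickstart.cloudera:7180"
}
```

For example, wait_until 60 10 cm_is_up && echo "Cloudera Manager is reachable" would check every 10 seconds for up to 10 minutes.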
5. Configure CM to use Parcels
Navigate to the Desktop and run the “Migrate to Parcels” script.
Note: This may take a while to run
You can validate that CM is now using parcels by logging into the Cloudera Manager Web UI. Right next to the cluster name, it should say: (CDH x.x.x, Parcels)
Note: The migration shuts down all services, so you will need to restart every service on the cluster afterwards. We will do this in a later step.
6. Remove unneeded Services (optional)
Because the single-node Cloudera Quickstart VM has a limited amount of resources, I recommend removing some services so they don’t cause out-of-memory errors on the cluster.
I recommend removing the following:
Sqoop2
- Go to Hue -> Configuration
- Search for Sqoop
- Set this to “none” and click Save Changes
- Go back to the Cloudera Manager Home page
- Click the arrow next to the service, and click Delete
Key-Value Store Indexer
- Click the arrow next to the service, and click Delete
Solr
- Go to Hue -> Configuration
- Search for Solr
- Set this to “none” and click Save Changes
- Go back to the Cloudera Manager Home page
- Click the arrow next to the service, and click Delete
HBase
- Go to Hue -> Configuration
- Search for HBase
- Set HBase Service and HBase Thrift Server to “none” and click Save Changes
- Go back to the Cloudera Manager Home page
- Go to Impala -> Configuration
- Search for HBase
- Set this to “none” and click Save Changes
- Go back to the Cloudera Manager Home page
- Click the arrow next to the service, and click Delete
7. Restart Services
i. Restart the Cloudera Management Service
— Select Clusters > Cloudera Management Service
— Select Actions > Restart
ii. Restart the Cluster Services
— Select Clusters > Cloudera QuickStart
— Select Actions > Restart
8. Install Kudu Service
a. Log in to the Cloudera Manager Web UI
b. Click on the button next to the Cluster Name and select “Add Service”
c. Select “Kudu” and click “Continue”
d. Select whichever set of dependencies you would like and click “Continue”
e. Select the one instance available as the Master and the Tablet Server and click “Continue”
f. Set Configurations:
- Kudu Master WAL Directory = /var/lib/kudu/master/wal
- Kudu Master Data Directories = /var/lib/kudu/master/data
- Kudu Tablet Server WAL Directory = /var/lib/kudu/tablet/wal
- Kudu Tablet Server Data Directories = /var/lib/kudu/tablet/data
The service will now be added and then you will be taken back to the CM home
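If adding the service fails because a directory is missing, or you simply want to inspect the layout, the four paths above can be created by hand. This is a sketch under assumptions: whether Cloudera Manager creates these directories for you may depend on the CM version, make_kudu_dirs is a hypothetical helper name, and the ownership change in the usage note assumes a kudu user and group exist on the VM.

```shell
# Hypothetical helper: create the WAL and data directories for the Kudu
# master and tablet server under a base path (default: /var/lib/kudu).
make_kudu_dirs() {
  local base=${1:-/var/lib/kudu}
  local role kind
  for role in master tablet; do
    for kind in wal data; do
      mkdir -p "$base/$role/$kind" || return 1
    done
  done
}
```

On the VM this would typically be run with root privileges, followed by something like chown -R kudu:kudu /var/lib/kudu so the Kudu processes can write to the directories.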
9. Update the Replication Factor of Kudu
- Go to Kudu -> Configuration.
- Change default_num_replicas parameter to 1.
- Restart the Kudu Service.
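The reason for this step is that Kudu defaults to three replicas per tablet, which a single tablet server cannot host. As a complement to the cluster-wide default, Impala also lets you set the replica count per table through the kudu.num_tablet_replicas table property. The sketch below only builds the statement; kudu_create_sql is a hypothetical helper name and the column layout is borrowed from the smoke test in this post.

```shell
# Hypothetical helper: print a CREATE TABLE statement for a Kudu-backed
# Impala table with an explicit number of tablet replicas.
# $1 = table name, $2 = replica count
kudu_create_sql() {
  printf "CREATE TABLE %s (id BIGINT, name STRING, PRIMARY KEY(id)) PARTITION BY HASH PARTITIONS 2 STORED AS KUDU TBLPROPERTIES ('kudu.num_tablet_replicas' = '%s');" "$1" "$2"
}
```

Once Impala is connected to Kudu (step 10), the statement could be run with impala-shell -q "$(kudu_create_sql my_table 1)", where my_table is a placeholder name.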
10. Enable Impala to use Kudu
a. Log in to the Cloudera Manager Web UI
b. Go to Impala -> Configuration
c. Search for Kudu
d. Set Kudu Service to “kudu” and click Save Changes
e. Go back to the Cloudera Manager Home page
f. Restart Hue and Impala by clicking the yellow restart button next to one of the services
Testing
Smoke Test
impala-shell -q 'CREATE TABLE kudu_test(id BIGINT, name STRING, PRIMARY KEY(id)) PARTITION BY HASH PARTITIONS 2 STORED AS KUDU;'
impala-shell -q 'INSERT INTO TABLE kudu_test VALUES (1, "jack"), (2, "jill"), (3, "bob");'
impala-shell -q 'SELECT * FROM kudu_test WHERE id=1;'
impala-shell -q 'DROP TABLE kudu_test;'
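If you expect to repeat the smoke test, the four commands can be wrapped in a function that stops at the first failure. This is a minimal sketch of my own: run_smoke_test and the IMPALA_SHELL override are hypothetical names, and the SQL is the same as above except that string literals use single quotes so the statements fit inside double-quoted shell strings.

```shell
# Command used to reach Impala; override it to add options such as
# "-i <host>:<port>" if impala-shell does not connect by default.
IMPALA_SHELL=${IMPALA_SHELL:-impala-shell}

# Hypothetical helper: create, fill, query and drop a small Kudu table.
# Returns non-zero as soon as any statement fails.
# $1 = table name (default: kudu_test)
run_smoke_test() {
  local table=${1:-kudu_test}
  $IMPALA_SHELL -q "CREATE TABLE $table(id BIGINT, name STRING, PRIMARY KEY(id)) PARTITION BY HASH PARTITIONS 2 STORED AS KUDU;" || return 1
  $IMPALA_SHELL -q "INSERT INTO TABLE $table VALUES (1, 'jack'), (2, 'jill'), (3, 'bob');" || return 1
  $IMPALA_SHELL -q "SELECT * FROM $table WHERE id=1;" || return 1
  $IMPALA_SHELL -q "DROP TABLE $table;" || return 1
}
```

Calling run_smoke_test && echo "Kudu smoke test passed" gives a single pass/fail signal. Note that if a middle statement fails, the table is left in place and has to be dropped manually.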