Steps to Install Apache Kafka on the Cloudera Quickstart VM
Cloudera, the vendor of one of the leading Hadoop distributions, provides an easy-to-install virtual machine for getting started quickly on its platform. With it, you can easily get a single-node CDH cluster running in a virtual environment. You can use this VM for personal learning, for rapidly building applications on a dedicated cluster, or for many other purposes.
Apache Kafka is an open-source stream-processing platform, written in Scala and Java, that was developed by LinkedIn and donated to the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a “massively scalable pub/sub message queue designed as a distributed transaction log,” making it highly valuable for enterprise infrastructures that process streaming data.
The Cloudera Quickstart VM doesn’t come with Apache Kafka right out of the box, but it can be installed fairly easily.
Installation Steps
1. Download and Install the VM
a. Navigate to https://www.cloudera.com/downloads/quickstart_vms.html
b. Select the Platform you’d like the VM to run on and Download
c. Load the VM into your desired Platform
2. Configure the VM
Before starting the VM, set the following configurations:
— Set at least 8GB of RAM
— Set at least 2 CPUs
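If the VM runs on VirtualBox, the same settings can be applied from the host's command line. The following is a minimal sketch under that assumption; the VM name `cloudera-quickstart-vm` is a hypothetical placeholder (use the name reported by `VBoxManage list vms`), and the `run` helper only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Assumption: the VM was imported into VirtualBox and is powered off.
# VM_NAME is a hypothetical placeholder; substitute your VM's actual name.
VM_NAME="${VM_NAME:-cloudera-quickstart-vm}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 8192 MB of RAM and 2 CPUs, the minimums listed above.
run VBoxManage modifyvm "$VM_NAME" --memory 8192 --cpus 2
```

Other platforms (VMware, KVM, Docker) expose the same settings through their own tools or settings dialogs.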
3. Start up the VM
4. Start up Cloudera Manager (CM)
Once the VM starts up, navigate to the Desktop and Execute the “Launch Cloudera Express” script.
Note: This may take a while to run
Once complete, you should now be able to view the Cloudera Manager by opening up your web browser (within the VM) and navigating to:
http://quickstart.cloudera:7180
From your local machine, assuming your VM platform forwards port 7180 to the host, you can navigate to:
http://localhost:7180
Default Credentials: cloudera/cloudera
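Because the launch script takes a while, it can help to poll Cloudera Manager until it responds instead of refreshing the browser. This is a minimal sketch, assuming `curl` is installed and using Cloudera Manager's `/api/version` endpoint with the default credentials above; the `wait_for_cm` helper name, the retry count, and the delay are illustrative choices rather than part of the procedure.

```shell
#!/usr/bin/env bash
CM_URL="${CM_URL:-http://quickstart.cloudera:7180}"
CM_AUTH="${CM_AUTH:-cloudera:cloudera}"   # default credentials from this guide

# Poll GET /api/version until Cloudera Manager answers or the attempts run out.
# Usage: wait_for_cm [attempts] [delay_seconds]
wait_for_cm() {
  local attempts="${1:-30}"
  local delay="${2:-10}"
  local i=0
  local version
  while [ "$i" -lt "$attempts" ]; do
    if version="$(curl -fsS --max-time 5 -u "$CM_AUTH" "$CM_URL/api/version" 2>/dev/null)"; then
      echo "Cloudera Manager is up (highest supported API version: $version)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Cloudera Manager did not respond at $CM_URL" >&2
  return 1
}
```

Calling `wait_for_cm 30 10` inside the VM waits up to roughly five minutes. From the host, set `CM_URL=http://localhost:7180` first.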
5. Configure CM to use Parcels
Navigate to the Desktop and Execute the “Migrate to Parcels” script.
Note: This may take a while to run
You can validate that CM is now using parcels by logging into the Cloudera Manager Web UI. Right next to the cluster name, it should say: (CDH x.x.x, Parcels)
Note: All services will be shut down after this, so you will need to restart the services on the cluster:
i. Restart the Cluster Services
— Select Clusters > Cloudera QuickStart
— Select Actions > Restart
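The same restart can also be triggered through the Cloudera Manager REST API rather than the Web UI. The sketch below assumes the cluster's API name matches the display name “Cloudera QuickStart” and that API version `v10` is available (check `GET /api/version` on your instance); both values, and the helper names, are assumptions to adjust.

```shell
#!/usr/bin/env bash
CM_URL="${CM_URL:-http://quickstart.cloudera:7180}"
CM_AUTH="${CM_AUTH:-cloudera:cloudera}"
API_VERSION="${API_VERSION:-v10}"                     # assumption; see GET /api/version
CLUSTER_NAME="${CLUSTER_NAME:-Cloudera QuickStart}"   # assumption; see GET /api/<version>/clusters

# Build the restart endpoint, percent-encoding spaces in the cluster name.
restart_url() {
  local encoded
  encoded="$(printf '%s' "$CLUSTER_NAME" | sed 's/ /%20/g')"
  echo "$CM_URL/api/$API_VERSION/clusters/$encoded/commands/restart"
}

# POST to the endpoint; Cloudera Manager replies with a JSON description
# of the restart command it started.
restart_cluster() {
  curl -fsS -X POST -u "$CM_AUTH" "$(restart_url)"
}
```

Running `restart_cluster` starts the restart asynchronously; progress is still easiest to follow in the Web UI.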
6. Select the Version of Kafka you want to Install
Navigate here to get a full list of the Kafka Versions that are available:
https://www.cloudera.com/documentation/kafka/latest/topics/kafka_packaging.html#concept_fzg_phl_br
Note: Apache Kafka 4.x is not supported on the latest version of the Quickstart VM. Please use Apache Kafka 3.x.
Select the Parcel URL
Copy the Parcel URL next to the version of Kafka that you want (To be referred to as PARCEL_URL in future sections)
7. Install Kafka Parcel
Complete Documentation on how to manage Parcels:
a. Log in to the Cloudera Manager Web UI
b. Navigate to Hosts -> Parcels
c. Click Configuration
d. Add the PARCEL_URL you found in the previous step to the list under Remote Parcel Repository URLs
e. Save Changes
f. You will be taken back to the Parcels page. Wait a few seconds and the version of Kafka that you entered should be added to the list.
g. Locate the Kafka parcel from the list
h. Under Actions, click Download and wait for it to download
i. Under Actions, click Distribute and wait for it to be distributed
j. Under Actions, click Activate and wait for it to be activated
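Once the repository URL has been saved, the Download, Distribute, and Activate steps can also be scripted against the Cloudera Manager REST API. This is a rough sketch, not a replacement for the UI steps: it assumes API version `v10`, a cluster named “Cloudera QuickStart”, and a parcel product called `KAFKA`, and it expects you to pass the full parcel version string exactly as shown on the Parcels page. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
CM_URL="${CM_URL:-http://quickstart.cloudera:7180}"
CM_AUTH="${CM_AUTH:-cloudera:cloudera}"
API_VERSION="${API_VERSION:-v10}"                       # assumption
CLUSTER_PATH="${CLUSTER_PATH:-Cloudera%20QuickStart}"   # URL-encoded cluster name (assumption)
PRODUCT="${PRODUCT:-KAFKA}"                             # parcel product name (assumption)

# $1 = full parcel version string as listed on the Parcels page
parcel_url() {
  echo "$CM_URL/api/$API_VERSION/clusters/$CLUSTER_PATH/parcels/products/$PRODUCT/versions/$1"
}

# Pull the "stage" value (for example DOWNLOADED) out of parcel JSON on stdin.
extract_stage() {
  sed -n 's/.*"stage" *: *"\([A-Z_]*\)".*/\1/p' | head -n 1
}

parcel_stage() {
  curl -fsS -u "$CM_AUTH" "$(parcel_url "$1")" | extract_stage
}

# Issue a parcel command, then poll until the parcel reaches the target stage.
# $1 = version, $2 = command, $3 = target stage
parcel_step() {
  local tries=0
  curl -fsS -X POST -u "$CM_AUTH" "$(parcel_url "$1")/commands/$2" >/dev/null || return 1
  until [ "$(parcel_stage "$1")" = "$3" ]; do
    tries=$((tries + 1))
    if [ "$tries" -ge 120 ]; then
      echo "Timed out waiting for stage $3" >&2
      return 1
    fi
    sleep 5
  done
}

# Download, distribute, and activate, in the same order as the UI steps.
install_parcel() {
  parcel_step "$1" startDownload DOWNLOADED &&
  parcel_step "$1" startDistribution DISTRIBUTED &&
  parcel_step "$1" activate ACTIVATED
}
```

Usage would look like `install_parcel "<version string from the Parcels page>"`.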
8. Install Kafka Service
a. Log in to the Cloudera Manager Web UI
b. Click on the button next to the Cluster Name and select “Add Service”
c. Select “Kafka” and click “Continue”
d. Select whichever set of dependencies you would like and click “Continue”
e. Select the one instance available as the Kafka Broker and Gateway and click “Continue”
f. Keep the default configurations and click Continue
Note: When you’re on the Configuration screen, ensure that the TLS/SSL file location fields, such as “Kafka Broker TLS/SSL Server JKS Keystore File Location”, are not auto-populated with values. These fields should be blank.
g. The service will now be added and then you will be taken back to the CM home
9. Configure Kafka Service
Note: You will see that the Broker goes down at first. This is due to some incorrect default configurations that cannot be changed until after the Kafka Service has been added.
a. Log in to the Cloudera Manager Web UI
b. Click on Kafka -> Configuration
c. Set Configurations:
- Java Heap Size of Broker (broker_max_heap_size) = “256”
- Advertised Host (advertised.host.name) = “quickstart.cloudera”
- Inter Broker Protocol = “PLAINTEXT”
d. Click Save Changes
e. On the top of the page, click on the Yellow Restart button
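After the restart, one quick way to confirm that the broker stayed up is to check that it accepts TCP connections on port 9092, the port the producer command in the Testing section connects to. A minimal sketch, assuming bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command; the helper names are illustrative.

```shell
#!/usr/bin/env bash
BROKER_HOST="${BROKER_HOST:-quickstart.cloudera}"
BROKER_PORT="${BROKER_PORT:-9092}"

# Return 0 if a TCP connection to host:port opens within 3 seconds.
# $1 = host, $2 = port
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report whether the broker's listener port is reachable.
check_broker() {
  if port_open "$BROKER_HOST" "$BROKER_PORT"; then
    echo "Kafka broker is accepting connections on $BROKER_HOST:$BROKER_PORT"
  else
    echo "Kafka broker is not reachable on $BROKER_HOST:$BROKER_PORT" >&2
    return 1
  fi
}
```

Run `check_broker` inside the VM. An open port only shows that the process is listening, so the smoke test is still worth running.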
Testing
Smoke Test
kafka-topics --zookeeper quickstart.cloudera:2181 --create --topic test --partitions 1 --replication-factor 1
kafka-topics --zookeeper quickstart.cloudera:2181 --list

# Run the consumer and producer in separate windows.
# Type in text to the producer and watch it appear in the consumer.
# ^C to quit.
kafka-console-consumer --zookeeper quickstart.cloudera:2181 --topic test
kafka-console-producer --broker-list quickstart.cloudera:9092 --topic test
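The interactive check above can also be run as a single non-interactive round trip. The following sketch assumes the `test` topic already exists and that the installed console consumer supports `--from-beginning` and `--timeout-ms` (both are present in the Kafka versions I am aware of, but verify with `kafka-console-consumer --help`); the `smoke_test` function name and the message format are illustrative.

```shell
#!/usr/bin/env bash
ZK="${ZK:-quickstart.cloudera:2181}"
BROKER="${BROKER:-quickstart.cloudera:9092}"
TOPIC="${TOPIC:-test}"

# Produce one uniquely tagged message, then read the topic back and look for it.
smoke_test() {
  local msg
  local received
  msg="smoke-$(date +%s)"

  # Send the message without opening an interactive prompt.
  echo "$msg" | kafka-console-producer --broker-list "$BROKER" --topic "$TOPIC" >/dev/null || return 1

  # Read from the start of the topic; exit once no message arrives for 10 seconds.
  received="$(kafka-console-consumer --zookeeper "$ZK" --topic "$TOPIC" \
    --from-beginning --timeout-ms 10000 2>/dev/null)" || true

  if printf '%s\n' "$received" | grep -qxF "$msg"; then
    echo "PASS"
  else
    echo "FAIL"
    return 1
  fi
}
```

Calling `smoke_test` prints `PASS` when the message makes it from the producer to the consumer, and `FAIL` otherwise.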