Installing Spark2 on Cloudera’s Quickstart VM

Steps to Install Spark2 on the Cloudera Quickstart VM

Cloudera, one of the leading distributions of Hadoop, provides an easy to install Virtual Machine for the purposes of getting started quickly on their platform. With this, someone can easily get a single node CDH cluster running within a Virtual Environment. Users could use this VM for their own personal learning, rapidly building applications on a dedicated cluster, or for many other purposes.

Out of the box, Cloudera is running Spark 1.6.x. It’s a very stable and reliable version of Spark. However, Spark 2.0 released in 2016 bringing with it exceptional improvements in features and performance. In this blog article, we’ll go through step by step, how you can get Spark2 installed on your Quickstart VM.

Pre-Installation Steps

You will need to install Java 8 on the Virtual Machine for Spark2 to be supported. You can follow these instructions: Upgrading to Java8 on the Cloudera Quickstart VM

Installation Steps

1. Download and Install the VM

a. Navigate to https://www.cloudera.com/downloads/cdp-private-cloud-trial.html

b. Select the Platform you’d like the VM to run on and Download

c. Load the VM into your desired Platform

2. Configure the VM

Before starting the VM, set the following configurations:

— Set at least 8GB of RAM

— Set at least 2 CPUs

3. Startup the VM

4. Startup Cloudera Manager (CM)

Once the VM starts up, navigate to the Desktop and Execute the “Launch Cloudera Express” script.

Note: This may take a while to run

Once complete, you should now be able to view the Cloudera Manager by opening up your web browser (within the VM) and navigating to:

http://quickstart.cloudera:7180

From your local machine, you can navigate to:

http://localhost:7180

Default Credentials: cloudera/cloudera

5. Configure CM to use Parcels

Navigate to the Desktop and Execute the “Migrate to Parcels” script.

Note: This may take a while to run

You can validate that CM is now using parcels by logging into the Cloudera Manager Web UI. Right next to the cluster name, it should say: (CDH x.x.x, Parcels)

Note: You will need to restart all the services on the cluster after this. You can do this by: Going to the Cloudera Manager Web UI, click on the button next to the Cluster Name and click Start.

6. Select the Version of Spark2 you want to Install

Navigate here to get a full list of the Spark Versions that are available:

CDS Powered by Apache Spark Version, Packaging, and Download Information | 2.3.x | Cloudera…

CDS 2.3.x Powered By Apache Spark 2.3.x | Other versions

www.cloudera.com

7. Install Spark2 CSD

a. Open a Command Line Terminal

b. Login as Root

$ sudo su

c. Navigate to the CSD Directory

$ cd /opt/cloudera/csd

d. Download the CSD (Replace CSD_URL with the URL you copied from step #6)

$ wget CSD_URL

e. Set Permissions and Ownership

$ chown cloudera-scm:cloudera-scm SPARK2_ON_YARN-x.x.x.clouderax.jar
$ chmod 644 SPARK2_ON_YARN-x.x.x.clouderax.jar

f. Restart CM Services

$ service cloudera-scm-server restart

g. Login to the Cloudera Manager Web UI

h. Restart the Cloudera Management Service

— Select Clusters > Cloudera Management Service

— Select Actions > Restart

i. Restart the Cluster Services

— Select Clusters > Cloudera QuickStart

— Select Actions > Restart

8. Install Spark2 Parcel

Complete Documentation on how to manage Parcels:

Parcels | 5.16.x | Cloudera Documentation

On the Parcels page in Cloudera Manager, you can manage parcel installation and activation and determine which parcel…

www.cloudera.com

a. Login to the Cloudera Manager Web UI

b. Navigate to Hosts -> Parcels

c. Locate the SPARK2 parcel from the list

d. Under Actions, click Download and wait for it to download

e. Under Actions, click Distribute and wait for it to be distributed

f. Under Actions, click Activate and wait for it to be activated

9. Install Spark2 Service

a. Login to the Cloudera Manager Web UI

b. Click on the button next to the Cluster Name and select “Add Service”

c. Select “Spark 2” and click “Continue”

d. Select whichever set of dependencies you would like and click “Continue”

e. Select the one instance available as the History Server and the Gateway and click “Continue”

f. Leave the default configurations as is and click “Continue”

g. The service will now be added and then you will be taken back to the CM home

h. Click on the blue button next to the Spark2 service and click “Restart Stale Services”

i. Ensure the “Re-deploy client configuration” is checked and click “Restart Now”

10. Setup Integration with Hive

a. Open a Command Line Terminal

b. Login as Root

$ sudo su

d. Create a symlink in the Spark2 Configurations to the hive-site.xml

$ ln -s /etc/hive/conf/hive-site.xml /etc/spark2/conf/hive-site.xml

Testing Smoke Test

MASTER=yarn /opt/cloudera/parcels/SPARK2/lib/spark2/bin/run-example SparkPi 100

Test Spark Shell

Start the Shell

$ spark2-shell

2. Execute processes

spark
spark.sql("select 1").show()

Test PySpark Shell

Start the Shell

$ pyspark2

2. Execute processes

spark
spark.sql("select 1").show()

Test Spark Submit

$ spark2-submit --class org.apache.spark.examples.SparkPi
/opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-
examples_2.11-*.jar

Test Integration with Hive

Start the Shell

$ spark2-shell

2. Execute processes

spark.sql("show databases").show()

Note: Should show at least one database: default

Documentation

Installing or Upgrading CDS Powered by Apache Spark | 2.3.x | Cloudera Documentation

Because CDS Powered by Apache Spark is only installable using the parcel mechanism, it can only be used on clusters…
Hope you found this post interesting and informative! Don’t forget to check out our Data Engineering solutions for all your business requirements.

Installing Spark2 on Cloudera’s Quickstart VM

Steps to Install Spark2 on the Cloudera Quickstart VM

Installation Steps

1. Download and Install the VM

2. Configure the VM

3. Startup the VM

4. Startup Cloudera Manager (CM)

5. Configure CM to use Parcels

6. Select the Version of Spark2 you want to Install

7. Install Spark2 CSD

8. Install Spark2 Parcel

9. Install Spark2 Service

10. Setup Integration with Hive

Testing Smoke Test

Test Spark Shell

Test PySpark Shell

Test Spark Submit

Test Integration with Hive

Documentation

Author

Fill in your Details

Related Blogs

The Impact of Data Engineering and Big Data on Education

BigQuery vs. Athena

Confluent Platform Deployment using Ansible Playbook

Partnerships

What We Offer

Know Us

Installing Spark2 on Cloudera’s Quickstart VM

Steps to Install Spark2 on the Cloudera Quickstart VM

Installation Steps

1. Download and Install the VM

2. Configure the VM

3. Startup the VM

4. Startup Cloudera Manager (CM)

5. Configure CM to use Parcels

6. Select the Version of Spark2 you want to Install

7. Install Spark2 CSD

8. Install Spark2 Parcel

9. Install Spark2 Service

10. Setup Integration with Hive

Testing Smoke Test

Test Spark Shell

Test PySpark Shell

Test Spark Submit

Test Integration with Hive

Documentation

Author

Fill in your Details

Related Blogs

The Impact of Data Engineering and Big Data on Education

BigQuery vs. Athena

Confluent Platform Deployment using Ansible Playbook

For More Blogs