Posted In: Apache, Spark

How to set up Spark on Windows

There are two ways to run Spark on Windows: in the spark-shell, or from Java code in Eclipse.

1. spark-shell – Software required

1. Install JDK 1.8
2. Download spark-2.2.0-bin-hadoop2.7.tgz
3. Download winutils.exe
4. Create /tmp/hive/ folder.
5. Add set commands to spark-shell.cmd
6. Run spark-shell

Step 1 – Install JDK and set JAVA_HOME

Step 2 – Download Spark and unzip in a folder

Step 3 – Download winutils.exe and put it in /path/bin/winutils.exe

Step 4 – Create a \tmp\hive folder on the drive you will run Spark from (here E:\tmp\hive) and make it writable by running winutils chmod on it:

E:\programs\winutils\bin\winutils.exe chmod 777 E:\tmp\hive

Step 5 – Open /path/spark/bin/spark-shell.cmd and add the following line:

set HADOOP_HOME=E:/programs/winutils

Step 6 – Run spark-shell

Verify the setup with the following command:

spark.range(1).withColumn("status", lit("Hello world!")).show(false)



2. Eclipse – Java code

1. Install JDK 1.8
2. Install Eclipse
3. Download winutils.exe
4. Add Maven entry for Spark
5. Run Java code

Step 1 – Install JDK and set JAVA_HOME

Step 2 – Install Eclipse

Step 3 – Download winutils.exe and put it in /path/bin/winutils.exe

Step 4 – Create a Maven project and add the following to pom.xml
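
The original pom.xml snippet is missing here; a minimal dependency set matching the spark-2.2.0-bin-hadoop2.7 download (Scala 2.11 build) would look like this:

```xml
<dependencies>
  <!-- Spark core and Spark SQL, version 2.2.0 to match the standalone download -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
</dependencies>
```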


Step 5 – Java code to count matching lines in a file

The example uses India stock market end-of-day (EOD) price data, available here.

package com.example;

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class App {
	public static void main(String[] args) {
		// Point Hadoop at the folder that contains bin\winutils.exe
		System.setProperty("hadoop.home.dir", "E:/programs/winutils");
		String logFile = "C:/Users/trupti/Downloads/cm24AUG2017bhav.csv";

		// master("local[*]") runs Spark in-process; without a master URL the
		// context fails with "A master URL must be set in your configuration"
		SparkSession spark = SparkSession.builder()
				.appName("Simple Application")
				.master("local[*]")
				.getOrCreate();

		// Read the CSV as plain lines of text
		Dataset<String> logData = spark.read().textFile(logFile);

		long num1 = logData.filter((FilterFunction<String>) s -> s.contains("RELIANCE")).count();
		long num2 = logData.filter((FilterFunction<String>) s -> s.endsWith("INE027A01015,")).count();

		System.out.println("Lines with RELIANCE: " + num1 + ", lines ending with INE027A01015,: " + num2);

		spark.stop();
	}
}
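
The two Spark filters boil down to simple per-line string predicates. Independent of Spark, the same counting logic can be sketched in plain Java (the sample rows below are made up for illustration, loosely following the bhavcopy CSV layout):

```java
import java.util.Arrays;
import java.util.List;

public class LineCounts {
	// Hypothetical sample rows; only the layout mimics the real bhavcopy file
	static final List<String> SAMPLE = Arrays.asList(
			"RELIANCE,EQ,1585.00,1603.00,INE002A01018,",
			"TCS,EQ,2490.00,2505.00,INE467B01029,",
			"RELCAPITAL,EQ,700.00,712.00,INE027A01015,");

	// Count lines containing the given substring
	static long countContaining(List<String> lines, String needle) {
		return lines.stream().filter(s -> s.contains(needle)).count();
	}

	// Count lines ending with the given suffix
	static long countEndingWith(List<String> lines, String suffix) {
		return lines.stream().filter(s -> s.endsWith(suffix)).count();
	}

	public static void main(String[] args) {
		System.out.println("Lines with RELIANCE: " + countContaining(SAMPLE, "RELIANCE"));
		System.out.println("Lines ending with INE027A01015,: " + countEndingWith(SAMPLE, "INE027A01015,"));
	}
}
```

Spark applies exactly these predicates, but distributes them across partitions of the file instead of streaming over an in-memory list.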

Common errors

Solution – Download winutils and set HADOOP_HOME

17/08/27 18:27:03 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
	at org.apache.hadoop.util.Shell.getQualifiedBinPath(
	at org.apache.hadoop.util.Shell.getWinUtilsPath(
	at org.apache.hadoop.util.Shell.(

Solution – Set a master URL when creating the context, e.g. sparkConf.setMaster("local[*]") before new JavaSparkContext(sparkConf), or SparkSession.builder().master("local[*]")

17/08/27 19:37:05 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.(SparkContext.scala:376)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)

Solution – Create \tmp\hive and make it writable: \path\bin\winutils.exe chmod 777 E:\tmp\hive

caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: 
java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------;

Posted on August 28th, 2017
