Posted In: Apache, Spark
How to set up Spark on Windows
There are two ways to run Spark on Windows: in the spark-shell, or through Java code in Eclipse.
1. spark-shell – Software required
1. Install JDK 1.8
2. Download spark-2.2.0-bin-hadoop2.7.tgz
3. Download winutils.exe
4. Create /tmp/hive/ folder.
5. Set HADOOP_HOME and run winutils chmod
6. Run spark-shell
Step 1 – Install JDK and set JAVA_HOME
Step 2 – Download Spark and unzip in a folder
Step 3 – Download winutils.exe and put it at /path/bin/winutils.exe (this /path folder is used as HADOOP_HOME, e.g. E:/programs/winutils)
Step 4 – Create the /tmp/hive/ folder (e.g. E:\tmp\hive). You also need to run winutils chmod on this folder, as shown in the next step.
Step 5 – Now open /path/spark/bin/spark-shell.cmd and add the following lines (adjust the paths to your winutils and tmp\hive locations)
set HADOOP_HOME=E:/programs/winutils
E:\programs\winutils\bin\winutils.exe chmod 777 E:\tmp\hive
Step 6 – Run spark-shell
Verify with the following:
spark.range(1).withColumn("status", lit("Hello world!")).show(false)
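If everything is set up correctly, this should print a one-row table similar to:
+---+------------+
|id |status      |
+---+------------+
|0  |Hello world!|
+---+------------+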
2. Eclipse – Java code
1. Install JDK 1.8
2. Install Eclipse
3. Download winutils.exe
4. Add Maven entry for Spark
5. Run Java code
Step 1 – Install JDK and set JAVA_HOME
Step 2 – Install Eclipse
Step 3 – Download winutils.exe and put it in /path/bin/winutils.exe
Step 4 – Create a Maven project and add the following to pom.xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
Step 5 – Java code to find counts from the file
The example uses India stock market EOD (end-of-day) price data, available here.
package com.example;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) {
        // Point Hadoop/Spark at the winutils folder
        System.setProperty("HADOOP_HOME", "E:/programs/winutils");
        System.setProperty("hadoop.home.dir", "E:/programs/winutils");

        String logFile = "C:/Users/trupti/Downloads/cm24AUG2017bhav.csv";

        // The master URL must be set; "local" runs Spark in-process
        SparkConf sparkConf = new SparkConf();
        sparkConf.setAppName("Hello Spark");
        sparkConf.setMaster("local");
        JavaSparkContext context = new JavaSparkContext(sparkConf);

        // getOrCreate() reuses the context created above
        SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();

        // Read the file as a Dataset of lines and count matching rows
        Dataset<String> logData = spark.read().textFile(logFile).cache();
        long num1 = logData.filter(s -> s.contains("RELIANCE")).count();
        long num2 = logData.filter(s -> s.endsWith("INE027A01015,")).count();

        System.out.println("Lines with RELIANCE: " + num1 + ", lines ending with INE027A01015,: " + num2);

        spark.stop();
        context.stop();
        context.close();
    }
}
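As a variation (not part of the original example), the same file can be read as a CSV with a header row and filtered on a column instead of raw text lines. This sketch assumes the bhavcopy file has a SYMBOL header column; it needs two extra imports (org.apache.spark.sql.Row and the static org.apache.spark.sql.functions.col) and would go inside the same main method, after the SparkSession is created:
// Hypothetical variation: treat the file as a CSV DataFrame and filter on the SYMBOL column
Dataset<Row> prices = spark.read().option("header", "true").csv(logFile);
long relianceRows = prices.filter(col("SYMBOL").equalTo("RELIANCE")).count();
System.out.println("Rows with SYMBOL=RELIANCE: " + relianceRows);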
Common errors
Solution (for the error below) – Download winutils.exe and set HADOOP_HOME / hadoop.home.dir; see the snippet after the stack trace.
17/08/27 18:27:03 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
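In Java code this means setting the two system properties before any Spark classes are initialized, as in the code in Step 5 of the Eclipse section above (the path is an example; point it at your own winutils folder):
// Must run before SparkConf/JavaSparkContext are created; example path
System.setProperty("HADOOP_HOME", "E:/programs/winutils");
System.setProperty("hadoop.home.dir", "E:/programs/winutils");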
Solution (for the error below) – Set the master URL and create the JavaSparkContext before building the SparkSession: JavaSparkContext context = new JavaSparkContext(sparkConf); a sketch follows the stack trace.
17/08/27 19:37:05 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
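A minimal sketch of the fix, based on the code in Step 5 above ("local" is just an example master for running on a single machine); either set the master on the SparkConf or directly on the SparkSession builder:
// Option 1 – via SparkConf, as in the full example
SparkConf sparkConf = new SparkConf().setAppName("Hello Spark").setMaster("local");
JavaSparkContext context = new JavaSparkContext(sparkConf);
// Option 2 – set the master on the SparkSession builder itself
SparkSession spark = SparkSession.builder().appName("Simple Application").master("local").getOrCreate();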
Solution (for the error below) – Create /tmp/hive and give it access: \path\bin\winutils.exe chmod 777 E:\tmp\hive
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------;