
Install Spark 2.3 Locally

Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.0 uses Scala 2.11.

Download Spark

Download a pre-built package from the Apache Spark downloads page: https://spark.apache.org/downloads.html
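
Once downloaded, unpack the archive and step into it. The commands below assume the pre-built spark-2.3.0-bin-hadoop2.7 package; adjust the file name to match your download:

tar -xzf spark-2.3.0-bin-hadoop2.7.tgz
cd spark-2.3.0-bin-hadoop2.7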

Java

All you need is a Java installation that is either on your system PATH or pointed to by the JAVA_HOME environment variable. Verify it with:

java -version
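
If java is not on your PATH, set JAVA_HOME instead. The path below is only an example for a typical Linux OpenJDK 8 install, so adjust it to your JDK location:

# Example JDK location; change to match your installation
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH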

Scala

Download the Scala binaries for Windows -- you will need Scala 2.11.x (not 2.10.x or 2.12.x) for Spark 2.3.

Test correct installation of scala:

scala -version

Set PATH for Scala if needed:

export PATH=$PATH:/usr/local/scala/bin

Test that Spark is properly installed:

./bin/spark-shell --master local[2]
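
If the shell starts and drops you at a scala> prompt, Spark is working. You can also run one of the examples bundled with the distribution as a quick smoke test:

# Estimate pi with 10 partitions using the bundled SparkPi example
./bin/run-example SparkPi 10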

On Windows, run Spark from CMD or PowerShell, not Git Bash.
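
From a CMD prompt, the equivalent invocation uses the launcher scripts in bin directly (no ./ prefix):

bin\spark-shell --master local[2]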

Error: Failed to locate the winutils binary in the hadoop binary path

  • The HADOOP_HOME environment variable (or the hadoop.home.dir system property) needs to be set properly.
  • This is a known Hadoop-on-Windows issue: winutils is not included in the Apache distribution.

You can fix this problem in two ways:

  • Install a full native Windows Hadoop build. The ASF does not currently release one, but builds are available externally.
  • Or: get the WINUTILS.EXE binary from a Hadoop redistribution. A GitHub repository hosts it for several Hadoop versions.

Then:

  • Set the environment variable %HADOOP_HOME% to point to the directory above the bin directory containing WINUTILS.EXE (see the sketch after this list).
  • Or: run the Java process with the hadoop.home.dir system property set to that home directory.
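
A minimal sketch of the first option from a CMD prompt, assuming winutils.exe has been copied to C:\hadoop\bin\winutils.exe (the path is only an example):

REM HADOOP_HOME must point at the directory ABOVE bin
REM (setx persists the variable; open a new terminal afterwards)
setx HADOOP_HOME C:\hadoop

Alternatively, pass the system property to the driver JVM when launching Spark:

bin\spark-shell --driver-java-options "-Dhadoop.home.dir=C:\hadoop"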

Further reading:

  • Explanation on the Hadoop wiki
  • Stack Overflow discussion of the winutils error
  • Windows binaries for some Hadoop versions (GitHub)

Run Spark on the local machine

To run Spark interactively in a Python interpreter, use bin/pyspark:

./bin/pyspark --master local[2]

Or submit a job with spark-submit; the bundled Pi-estimation example takes the number of partitions as an argument:

./bin/spark-submit examples/src/main/python/pi.py 10
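
spark-submit accepts the same --master option as the interactive shells; for example, to run the same job locally on four cores (the core count here is arbitrary):

./bin/spark-submit --master local[4] examples/src/main/python/pi.py 10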
