Install Spark 2.3 Locally

Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.0 uses Scala 2.11.

Download Spark

Spark only needs Java: either java must be on your system PATH, or the JAVA_HOME environment variable must point to a Java installation. Check the installed version:

java -version
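The two lookup options above can be sketched as a small shell function. This is only an illustration: find_java is a hypothetical helper name, not a Spark command, and checking JAVA_HOME before PATH is an assumption about the order you want.

```shell
# find_java is a hypothetical helper, not a Spark command.
# Assumption: JAVA_HOME takes precedence over whatever java is on PATH.
find_java() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
    printf '%s\n' "${JAVA_HOME}/bin/java"
  elif command -v java >/dev/null 2>&1; then
    command -v java
  else
    echo "No Java found: install Java 8+ or set JAVA_HOME" >&2
    return 1
  fi
}

# Print the resolved launcher, or a hint if none is found.
find_java || echo "Java is not available to Spark yet"
```

If the function prints a path, running that path with -version should report Java 8 or later.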

Download the Scala binaries for Windows. You will need Scala 2.11.x (not 2.10.x or 2.12.x) for Spark 2.3.

Test that Scala is installed correctly:

scala -version
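Since Spark 2.3 uses Scala 2.11, it can help to check the reported version automatically. The following is a minimal sketch, assuming the banner printed by scala -version contains the text "version X.Y.Z"; both function names are hypothetical helpers introduced for illustration.

```shell
# Both helpers are hypothetical names for illustration.

# Pull "X.Y.Z" out of a banner; assumes the banner contains "version X.Y.Z".
scala_version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed only for 2.11.x, the Scala line that Spark 2.3 uses.
is_spark23_scala() {
  case "$1" in
    2.11.*) return 0 ;;
    *)      return 1 ;;
  esac
}

if command -v scala >/dev/null 2>&1; then
  # Merge stderr into stdout in case the banner is written to stderr.
  version=$(scala_version_from_banner "$(scala -version 2>&1)")
  if is_spark23_scala "$version"; then
    echo "Scala $version matches Spark 2.3"
  else
    echo "Scala $version found, but Spark 2.3 expects 2.11.x"
  fi
fi
```

The check is skipped silently when scala is not on PATH, so it is safe to run before the PATH step.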

Set PATH for Scala if needed:

export PATH=$PATH:/usr/local/scala/bin

Test that Spark is properly installed:

./bin/spark-shell --master local[2]

On Windows, use CMD or PowerShell, not Git Bash.

HADOOP_HOME (or the Java system property hadoop.home.dir) must be set correctly.
Known Hadoop issue on Windows: winutils is not included in the Apache distribution.

You can fix this problem in two ways:

Install a full native Windows Hadoop version. The ASF does not currently release such a version; releases are available externally.
Or: get the WINUTILS.EXE binary from a Hadoop redistribution. A repository on GitHub provides it for some Hadoop versions.

Then:

Set the environment variable %HADOOP_HOME% to point to the directory above the BIN directory containing WINUTILS.EXE.
Or: run the Java process with the system property hadoop.home.dir set to that Hadoop home directory.

To run Spark interactively in a Python interpreter, use bin/pyspark:

./bin/pyspark --master local[2]

Or submit Spark jobs:

./bin/spark-submit examples/src/main/python/pi.py 10