Install Spark 2.3 Locally
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.0 uses Scala 2.11.
All you need is Java installed and on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
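A quick way to verify is to print the Java version; the JDK path below is only an example, adjust it to your install:

java -version
# If java is not found, point JAVA_HOME at your JDK instead (example path):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin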
Download the Scala binaries for Windows; you will need Scala 2.11.x (not 2.10.x or 2.12.x) for Spark 2.3.
Test that Scala is correctly installed:
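For example (the exact version string will vary with your 2.11.x release):

scala -version
# Expect output naming Scala 2.11.x, along the lines of:
# Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL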
Set PATH for Scala if needed:
export PATH=$PATH:/usr/local/scala/bin
Test that Spark is properly installed:
./bin/spark-shell --master local
On Windows, use CMD or PowerShell, not Git Bash.
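If the shell starts cleanly you will get a scala> prompt with a SparkSession pre-bound as spark. For a non-interactive sanity check, a one-liner can be piped in (a sketch, run from the Spark installation directory):

# The job should print 100 among the REPL output, then exit.
echo 'println(spark.range(100).count())' | ./bin/spark-shell --master local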
Error: Failure to locate the winutils binary in the hadoop binary path
- HADOOP_HOME (or the hadoop.home.dir system property) needs to be set properly.
- This is a known Hadoop-on-Windows issue: winutils is not included in the Apache distribution.
You can fix this problem in two ways:
- Install a full native Windows Hadoop build. The ASF does not currently release such a build; releases are available externally.
- Or: get just the WINUTILS.EXE binary from a Hadoop redistribution. There is a GitHub repository of these for some Hadoop versions.
Either way, then set the environment variable %HADOOP_HOME% to point to the directory above the bin dir containing WINUTILS.EXE, or run the Java process with the system property hadoop.home.dir set to that directory (see the sketch below).
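For example, in a Windows command prompt (a sketch; C:\hadoop is a placeholder for wherever you unpacked the files):

:: winutils.exe must sit in %HADOOP_HOME%\bin, i.e. C:\hadoop\bin\winutils.exe here
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin

Note that set only affects the current session; use the System Properties dialog to set the variable permanently.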
Run Spark on the local machine
To run Spark interactively in a Python interpreter, use
./bin/pyspark --master local
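pyspark pre-creates the same SparkSession, named spark; as with spark-shell, a quick non-interactive sanity check is to pipe in a one-liner (a sketch):

# Should print 100 and exit; drop the echo/pipe to stay at the interactive >>> prompt.
echo 'print(spark.range(100).count())' | ./bin/pyspark --master local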
Or submit a Spark job non-interactively with spark-submit:
./bin/spark-submit examples/src/main/python/pi.py 10
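The trailing 10 is the number of partitions used to parallelize the sampling in this Monte-Carlo estimate of Pi. Most of the console output is logging on stderr; the result is a single line on stdout, so a sketch to surface just that line:

./bin/spark-submit examples/src/main/python/pi.py 10 2>/dev/null | grep 'Pi is roughly'
# Pi is roughly 3.14... (digits vary from run to run)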