install.spark {SparkR} | R Documentation |
install.spark
Downloads and installs Spark to a local directory if it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is returned. The Spark version we use is the same as the SparkR version. Users can specify a desired Hadoop version, the remote mirror site, and the directory where the package is installed locally.

hadoopVersion | Version of Hadoop to install. Default is '2.7'. It can take other version numbers in the format 'x.y' where x and y are integers. If hadoopVersion = 'without', the 'Hadoop free' build is installed. See 'Hadoop Free' Build for more information. Other patched version names can also be used, e.g. 'cdh4'. |
mirrorUrl | base URL of the repositories to use. The directory layout should followApache mirrors. |
localDir | a local directory where Spark is installed. The directory contains version-specific folders of Spark packages. Default is the path to the cache directory: |
overwrite | If TRUE, download and overwrite the existing tar file in localDir and force re-install Spark (in case the local directory or file is corrupted). |
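A minimal usage sketch in R. The mirror URL and local directory below are illustrative assumptions, not defaults; both arguments may be omitted to use the default mirror and cache directory.

```r
library(SparkR)

# Install the Spark distribution that matches the loaded SparkR version.
# mirrorUrl and localDir are illustrative values, not package defaults.
install.spark(hadoopVersion = "2.7",
              mirrorUrl = "http://apache.osuosl.org/spark",
              localDir = "~/spark",
              overwrite = FALSE)
```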
The remote file path is inferred from mirrorUrl and hadoopVersion. mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder named after the Spark version (that corresponds to SparkR), and then the tar filename. The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from http://apache.osuosl.org has path: http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz. For hadoopVersion = 'without', [Hadoop version] in the filename is then without-hadoop.
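The naming scheme above can be sketched as a small shell snippet; the mirror, Spark version, and Hadoop version are the example values from the text:

```shell
# Compose the download URL the way the doc describes:
# [mirrorUrl]/spark-[version]/spark-[version]-bin-[Hadoop version].tgz
MIRROR_URL="http://apache.osuosl.org/spark"   # example mirror from the text
SPARK_VERSION="2.0.0"                          # matches the SparkR version
HADOOP_VERSION="2.7"

if [ "$HADOOP_VERSION" = "without" ]; then
  SUFFIX="without-hadoop"                      # Hadoop-free build
else
  SUFFIX="hadoop${HADOOP_VERSION}"
fi

URL="${MIRROR_URL}/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-${SUFFIX}.tgz"
echo "$URL"
```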
To reduce Spark's console logging, edit the log4j configuration from the terminal. For a Homebrew install of apache-spark 1.3.0, Spark lives in /usr/local/Cellar/apache-spark/1.3.0 and its configuration directory is /usr/local/Cellar/apache-spark/1.3.0/libexec/conf. In the log4j.properties file there, replace INFO with WARN.
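Based on Spark's bundled log4j.properties.template, the line to change looks like the following (the exact template contents may vary by release):

```properties
# Default template logs everything to the console at INFO:
#   log4j.rootCategory=INFO, console
# Change INFO to WARN to quiet the informational messages:
log4j.rootCategory=WARN, console
```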
To create an IPython profile for Spark, see the reference on configuring IPython notebook support for PySpark: place 00-pyspark-setup.py in ~/.ipython/profile_spark/startup/.
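An illustrative sketch of what such a 00-pyspark-setup.py startup file might contain; only the filename and location come from the notes above, and the fallback path is an assumption based on the Homebrew layout mentioned earlier:

```python
# Sketch of ~/.ipython/profile_spark/startup/00-pyspark-setup.py
# (assumed contents; adjust paths for your Spark installation).
import os
import sys

# Locate the Spark installation; the fallback is a Homebrew-style path.
spark_home = os.environ.get(
    "SPARK_HOME", "/usr/local/Cellar/apache-spark/1.3.0/libexec")

# Make the PySpark modules importable from notebooks using this profile.
sys.path.insert(0, os.path.join(spark_home, "python"))
```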