Installation
Choose the correct version
Make sure the connector matches both the Spark version and the Scala version of your setup. See Compatibility to determine which connector version is suitable.
The examples below show the installation process using the latest 5.4 connector.
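The coordinates used on this page follow a visible pattern. The artifact name ends with the Scala binary version (`_2.12`), and the version string ends with the Spark major version (`_for_spark_3`).

The following Python sketch assembles a coordinate from the versions of your setup. It assumes that pattern holds for the release you pick. It only formats a string and does not check that the combination is actually published, so confirm it against Compatibility first. `connector_coordinate` is a hypothetical helper, not part of the connector.

```python
def connector_coordinate(connector_version: str, scala_version: str, spark_version: str) -> str:
    """Assemble a Maven coordinate for the Neo4j Spark connector.

    Assumed pattern (taken from the coordinates on this page):
    org.neo4j:neo4j-connector-apache-spark_<scala binary>:<connector>_for_spark_<spark major>
    """
    # Scala binary version keeps only major.minor: "2.12.20" -> "2.12"
    scala_binary = ".".join(scala_version.split(".")[:2])
    # Spark major version keeps only the first component: "3.5.8" -> "3"
    spark_major = spark_version.split(".")[0]
    artifact = f"neo4j-connector-apache-spark_{scala_binary}"
    return f"org.neo4j:{artifact}:{connector_version}_for_spark_{spark_major}"
```

For example, `connector_coordinate("5.4.3", "2.12.20", "3.5.8")` returns `org.neo4j:neo4j-connector-apache-spark_2.12:5.4.3_for_spark_3`, which is the coordinate used in the commands below.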
Usage with the Spark shell
Starting from version 5.4.0, the connector is no longer published to Spark Packages.
The connector is available via Maven Central:
$SPARK_HOME/bin/spark-shell --packages org.neo4j:neo4j-connector-apache-spark_2.12:5.4.3_for_spark_3
$SPARK_HOME/bin/pyspark --packages org.neo4j:neo4j-connector-apache-spark_2.12:5.4.3_for_spark_3
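You may start Spark from a plain Python script instead of the `pyspark` launcher. In that case, the `spark.jars.packages` configuration property plays the same role as `--packages`.

Below is a minimal sketch. It assumes PySpark is installed (for example with `pip install pyspark`) and that Maven Central is reachable. `packages_conf` and `build_session` are hypothetical helper names.

```python
# Maven coordinate copied from the --packages examples above.
NEO4J_CONNECTOR = "org.neo4j:neo4j-connector-apache-spark_2.12:5.4.3_for_spark_3"


def packages_conf(*coordinates: str) -> dict:
    """Spark settings with the same effect as --packages on the command line."""
    # spark.jars.packages takes a comma-separated list of Maven coordinates.
    return {"spark.jars.packages": ",".join(coordinates)}


def build_session(app_name: str = "neo4j-app"):
    """Create a SparkSession that resolves the connector from Maven Central."""
    # Imported here so packages_conf can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in packages_conf(NEO4J_CONNECTOR).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Call `build_session()` at the start of your script. The property is read when the Spark JVM starts. Setting it on an already running session, for example inside the `pyspark` shell, has no effect.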
Alternatively, you can download the connector JAR file from the Neo4j Connector Page or from the GitHub releases page and run the following command to launch a Spark interactive shell with the connector included:
$SPARK_HOME/bin/spark-shell --jars neo4j-spark-connector-5.4.3-s_2.12.jar
$SPARK_HOME/bin/pyspark --jars neo4j-spark-connector-5.4.3-s_2.12.jar
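When you use a downloaded JAR, the Scala version is part of the file name (`-s_2.12` in the commands above). This makes a mismatch with your Spark distribution easy to spot before launching.

The small Python sketch below assumes the file name follows the `neo4j-spark-connector-<version>-s_<scala>.jar` pattern shown above. `check_jar` is a hypothetical helper, not part of the connector.

```python
import re

# Assumed file-name pattern, taken from the --jars examples above:
# neo4j-spark-connector-<connector version>-s_<Scala binary version>.jar
JAR_PATTERN = re.compile(r"^neo4j-spark-connector-(\d+\.\d+\.\d+)-s_(\d+\.\d+)\.jar$")


def check_jar(filename: str, cluster_scala_binary: str) -> str:
    """Return the connector version if the JAR targets the cluster's Scala version."""
    match = JAR_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized connector JAR name: {filename}")
    connector_version, jar_scala = match.groups()
    # A JAR built for another Scala binary version will not load correctly.
    if jar_scala != cluster_scala_binary:
        raise ValueError(
            f"{filename} is built for Scala {jar_scala}, "
            f"but the cluster runs Scala {cluster_scala_binary}"
        )
    return connector_version
```

For example, `check_jar("neo4j-spark-connector-5.4.3-s_2.12.jar", "2.12")` returns `"5.4.3"`, while passing `"2.13"` raises a `ValueError`.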
Self-contained applications
For non-Python applications:
- Include the connector in your application using the application’s build tool.
- Package the application.
- Use spark-submit to run the application.
For Python applications, run spark-submit directly.
As with spark-shell, you can run spark-submit with the connector resolved from Maven Central (--packages) or with a local JAR file (--jars).
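The two launch options map directly onto `spark-submit` arguments. The following sketch builds the argument list for a Python application in either form. `spark_submit_command` and the application file name are hypothetical, and `spark_home` stands for your `$SPARK_HOME`.

```python
from typing import List, Optional


def spark_submit_command(
    spark_home: str,
    app: str,
    package: Optional[str] = None,
    jar: Optional[str] = None,
) -> List[str]:
    """Build a spark-submit invocation that adds the connector via --packages or --jars."""
    # Exactly one source for the connector: Maven Central or a local JAR.
    if (package is None) == (jar is None):
        raise ValueError("pass exactly one of package or jar")
    cmd = [f"{spark_home}/bin/spark-submit"]
    if package is not None:
        cmd += ["--packages", package]  # resolved from Maven Central
    else:
        cmd += ["--jars", jar]  # local connector JAR file
    cmd.append(app)  # the Python application comes after the options
    return cmd
```

Pass the resulting list to `subprocess.run(cmd, check=True)` to launch the job.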
See the Quickstart for code examples.
build.sbt
name := "Spark App"
version := "1.0"
scalaVersion := "2.12.20"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.8"
libraryDependencies += "org.neo4j" %% "neo4j-connector-apache-spark" % "5.4.3_for_spark_3"
If you use the sbt-spark-package plugin, add the following to your build.sbt instead:
spDependencies += "org.neo4j/neo4j-connector-apache-spark_2.12:5.4.3_for_spark_3"
pom.xml
<project>
  <groupId>org.neo4j</groupId>
  <artifactId>spark-app</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Spark App</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.5.8</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.neo4j</groupId>
      <artifactId>neo4j-connector-apache-spark_2.12</artifactId>
      <version>5.4.3_for_spark_3</version>
    </dependency>
  </dependencies>
</project>