Migration guide
This guide covers the upgrade boundaries of the 5.x line that contain breaking changes. For the complete list of changes per release, see Changelog.
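| Upgrade | Breaking change |
|---|---|
| 5.0 → 5.1 | Minimum Spark version raised to 3.3. |
| 5.1 → 5.2 | Minimum Spark version raised to 3.4. |
| 5.2 → 5.3 | schema.optimization.type deprecated in favor of three granular options. |
| 5.3 → 5.4 | Default type conversion changed for timestamps, intervals, and byte arrays; relationship.properties map is now strict. |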
5.0 → 5.1
What changed: the baseline Apache Spark version was raised from 3.2 to 3.3 to support push-down v2 filters and LIMIT optimizations. Connector builds for Spark 3.2 are no longer compatible with 5.1.0. If you need to stay on Spark 3.3, connector version 5.1.0 is the latest available — see the compatibility table.
How to migrate:
- Upgrade your Spark cluster to 3.3 or later.
- Use the 5.1.0 artifact built for Spark 3:
  org.neo4j:neo4j-connector-apache-spark_2.12:5.1.0_for_spark_3
- If you cannot upgrade Spark, stay on the 5.0.x line.
See the 5.1.0 release notes for the full list of changes.
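One way to pull the artifact into a PySpark job is Spark's standard spark.jars.packages setting. The sketch below is illustrative only: the helper names and the application name are made up for this example, and pyspark is imported lazily so the configuration can be inspected on a machine without Spark installed.

```python
# Maven coordinate copied from the migration steps above.
CONNECTOR_5_1 = "org.neo4j:neo4j-connector-apache-spark_2.12:5.1.0_for_spark_3"


def connector_conf(coordinate: str = CONNECTOR_5_1) -> dict:
    """Return the Spark setting that resolves the connector from Maven."""
    return {"spark.jars.packages": coordinate}


def build_session(app_name: str = "neo4j-5-1-upgrade"):
    """Create a SparkSession with the connector on the classpath.

    app_name is a hypothetical placeholder. pyspark is imported inside the
    function so the rest of this module works where pyspark is missing.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in connector_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Passing the same coordinate to spark-submit with --packages has the same effect; which route fits depends on how your jobs are launched.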
5.1 → 5.2
What changed: the baseline Apache Spark version was raised from 3.3 to 3.4 to support the push-down Top N optimization. Connector builds for Spark 3.3 are no longer compatible with 5.2.0.
How to migrate:
- Upgrade your Spark cluster to 3.4 or later.
- Use the 5.2.0 (or later) artifact built for Spark 3:
  org.neo4j:neo4j-connector-apache-spark_2.12:5.2.0_for_spark_3
- If you cannot upgrade Spark to 3.4, stay on 5.1.0.
See the 5.2.0 release notes for the full list of changes.
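The two Spark baselines described so far can be captured in a small lookup. This is a sketch, not part of the connector: the function names are hypothetical, and it only encodes the coordinates this guide names explicitly (5.1.0 for Spark 3.3, 5.2.0 as the first release for Spark 3.4 and later 3.x versions).

```python
from typing import Optional, Tuple

_COORDINATE = "org.neo4j:neo4j-connector-apache-spark_2.12:{version}_for_spark_3"


def parse_major_minor(spark_version: str) -> Tuple[int, int]:
    """Turn a version string such as '3.4.1' into (3, 4)."""
    major, minor = spark_version.split(".")[:2]
    return int(major), int(minor)


def connector_for(spark_version: str) -> Optional[str]:
    """Return the connector coordinate this guide names for a Spark version.

    Returns None when the guide names no artifact: Spark older than 3.3
    (the guide only says to stay on the 5.0.x line) and any non-3.x Spark.
    """
    major, minor = parse_major_minor(spark_version)
    if major != 3:
        return None
    if minor >= 4:
        # 5.2.0 is the first release for this baseline; later ones may apply.
        return _COORDINATE.format(version="5.2.0")
    if minor == 3:
        return _COORDINATE.format(version="5.1.0")
    return None
```

With a live session, connector_for(spark.version) gives the coordinate to check against what the cluster currently loads.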
5.2 → 5.3
What changed: the single schema.optimization.type option was deprecated in 5.3.0 in favor of three granular options that decouple index, key, and constraint creation.
How to migrate: replace schema.optimization.type according to what you were optimizing:
| Before (5.2.x) | After (5.3.0+) |
|---|---|
| | |
| | |
| | (relationship key constraints) |
| | (type / existence constraints) |
The deprecated option keeps working for backward compatibility, but you should migrate to the granular options. See Schema optimization for the full reference and examples.
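Because the deprecated option still works, it is easy to miss during an upgrade. A minimal sketch of an audit helper follows, assuming job options are kept in plain dictionaries. The helper name, the sample options, and the sample value INDEX are illustrative and not taken from this guide.

```python
# Options this guide marks as deprecated, with a short piece of advice.
DEPRECATED_OPTIONS = {
    "schema.optimization.type": "deprecated in 5.3.0; use the granular schema optimization options",
}


def find_deprecated_options(options: dict) -> list:
    """Return (option, advice) pairs for deprecated keys present in options."""
    return [
        (key, DEPRECATED_OPTIONS[key])
        for key in options
        if key in DEPRECATED_OPTIONS
    ]


# Hypothetical write options for one job.
write_options = {
    "labels": ":Person",
    "node.keys": "id",
    "schema.optimization.type": "INDEX",  # example value only
}

for option, advice in find_deprecated_options(write_options):
    print(f"{option}: {advice}")
```

Running a check like this over every job's options before the upgrade gives a concrete list of places to update.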
5.3 → 5.4
5.4.0 contains two breaking changes: the default type conversion and the relationship.properties map semantics.
Type conversion default
What changed: the default type-conversion logic changed. Starting with 5.4.0, timestamps, intervals, and byte arrays are processed differently from previous releases.
How to migrate:
- If you rely on the pre-5.4.0 behavior, set the type.conversion option to legacy on your read and write operations:

      df = spark.read \
          .format("org.neo4j.spark.DataSource") \
          .option("type.conversion", "legacy") \
          .option("labels", "Person") \
          .load()

- If you want the new behavior (recommended), leave type.conversion at its default value, default, and review Data type mapping to confirm how your timestamp, interval, and byte-array columns are now mapped.
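Since the option has to be set on both reads and writes, a small wrapper can keep the two in sync. The sketch below is hypothetical and relies only on the fact that Spark's DataFrameReader and DataFrameWriter both expose option(key, value) and return themselves. The accepted values are the two named in this section.

```python
def with_type_conversion(reader_or_writer, mode: str = "legacy"):
    """Set type.conversion on any object exposing Spark's .option(key, value)."""
    if mode not in ("legacy", "default"):
        raise ValueError(f"unsupported type.conversion value: {mode!r}")
    return reader_or_writer.option("type.conversion", mode)


# Intended use with a live session (not executed here):
#   reader = with_type_conversion(spark.read.format("org.neo4j.spark.DataSource"))
#   writer = with_type_conversion(df.write.format("org.neo4j.spark.DataSource"))
```

Keeping the mode in one place also makes the later switch to the new behavior a single-line change.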
Strict relationship.properties map
What changed: when relationship.save.strategy is keys and the relationship.properties map is set, only the fields listed in the map are written as relationship properties.
Before 5.4.0, properties that were not listed in the map were also written, using their original names.
How to migrate:
- If you want only the mapped properties to be written (the new behavior), no change is required.
- If you relied on the previous behavior where unmapped fields were written too, add those fields explicitly to the relationship.properties map, or leave the option unset so that all unmapped fields are written.
See Writer options for the relationship.properties reference.
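The difference between the old and new semantics can be modeled in plain Python. This is a toy model of the behavior described above, not the connector's implementation. The function names and the column names are hypothetical, and candidate_columns stands for whichever DataFrame columns are eligible to become relationship properties.

```python
def written_relationship_properties(candidate_columns, properties_map, strict=True):
    """Model which columns end up as relationship properties.

    properties_map maps a column name to a relationship property name.
    strict=True models 5.4.0 and later; strict=False models earlier releases,
    where unmapped columns were also written under their original names.
    """
    written = {}
    for column in candidate_columns:
        if column in properties_map:
            written[column] = properties_map[column]
        elif not strict:
            written[column] = column
    return written


def complete_map(candidate_columns, properties_map):
    """Add every unmapped column under its own name to restore the old result."""
    completed = dict(properties_map)
    for column in candidate_columns:
        completed.setdefault(column, column)
    return completed


columns = ["quantity", "note"]
mapping = {"quantity": "qty"}

new_behavior = written_relationship_properties(columns, mapping, strict=True)
old_behavior = written_relationship_properties(columns, mapping, strict=False)
restored = written_relationship_properties(columns, complete_map(columns, mapping))
```

In this model new_behavior contains only the quantity column, while restored matches old_behavior, which mirrors the first migration option: list the previously unmapped fields explicitly.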
See the 5.4.0 release notes and Data type mapping for details.