Create and store file embeddings
Introduced in 2025.05

Overview

The ai.file.embedBatch procedure reads text from a local or remote file and generates vector embeddings for its content, optionally splitting it into chunks. It returns one row per chunk, containing the chunk's index, its text, and its embedding vector.

CALL ai.file.embedBatch(
  'https://example.com/large-document.txt', // (1)
  'OpenAI', // (2)
  { token: $openaiToken, model: 'text-embedding-3-small' }, // (3)
  1000 // (4)
) YIELD index, resource, vector
MERGE (n:Chunk {id: index})
SET n.text = resource, n.embedding = vector

1	The path or URL of the source file. Both local files and remote URLs (`http`, `https`, or `ftp`) are supported. Local paths are resolved relative to the Neo4j import directory.
2	Identifier of the AI provider to use. See Embeddings → Providers for supported options.
3	Provider-specific configuration.
4	The maximum token count for each chunk.

Accessing files with ai.file.embedBatch requires load privileges.
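
As a rough sketch of how that access might be granted, the statements below create a role and give it the `LOAD` privilege on all data sources. The role name `fileEmbedder` is a placeholder that this page does not define, and the sketch assumes a Neo4j edition and version that support role-based access control and the `LOAD` privilege. Privilege commands are run against the `system` database.

```cypher
// Assumption: `fileEmbedder` is a hypothetical role name used only for illustration.
CREATE ROLE fileEmbedder IF NOT EXISTS;

// Allow members of the role to load data from any source (local files and remote URLs).
GRANT LOAD ON ALL DATA TO fileEmbedder;
```

A narrower grant may be preferable in production; this broad form only keeps the example short.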

Signature for `ai.file.embedBatch()`
Procedure	Introduced in 2025.05
Syntax	`ai.file.embedBatch(file, provider, configuration = {}, tokenCountLimit = null) :: (index, resource, vector)`
Description	Embed a given file in batches of resources as vectors using the named provider.
Inputs	Name	Type	Description
	`file`	`STRING`	The path or URL of the file to read.
	`provider`	`STRING`	Identifier of the AI provider to use. See Embeddings → Providers for supported options.
	`configuration`	`MAP`	Provider-specific configuration. Use `CALL ai.text.embed.providers()` to find the configuration needed for each provider.
	`tokenCountLimit`	`INTEGER`	The maximum token count for each chunk. If `null` (default), chunking is not applied.
Returns	Name	Type	Description
	`index`	`INTEGER`	The index of the corresponding chunk within the list of resources.
	`resource`	`STRING`	The original resource element (the chunk text).
	`vector`	`VECTOR`	The generated vector embedding for the resource.

Chunking behavior

If tokenCountLimit is null, the entire file content is embedded as a single resource.
If tokenCountLimit is provided, the file content is chunked into a list of resources, each within the token count limit.
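
To see how a given limit affects the output, one option is to yield only `index` and `resource` and inspect the size of each chunk. The sketch below reuses the `document.txt` file and the `$openaiToken` parameter from the other examples on this page. The limit of `500` is an arbitrary value chosen for illustration.

```cypher
// Assumption: 500 is an illustrative limit, not a recommended value.
CALL ai.file.embedBatch(
  'file:///document.txt',
  'OpenAI',
  { token: $openaiToken, model: 'text-embedding-3-small' },
  500
) YIELD index, resource
// size() on a STRING returns its length in characters, not tokens.
RETURN index, size(resource) AS characters
ORDER BY index
```

Omitting `vector` from `YIELD` only hides that column. It should not be assumed to skip the embedding request to the provider.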

Examples

Import local files
Not available on Aura

You can store files on the database server and access them by using the file:/// scheme. By default, paths are resolved relative to the Neo4j import directory.

Example 1. Embed text from a local file

document.txt

Neo4j is a graph database management system.
It is designed to store and process large-scale graphs.
Graph databases are well-suited for highly connected data.

Query

CALL ai.file.embedBatch(
  'file:///document.txt',
  'OpenAI',
  { token: $openaiToken, model: 'text-embedding-3-small' }
) YIELD index, resource, vector
MERGE (c:Chunk {file: 'document.txt', index: index})
SET c.text = resource, c.embedding = vector
RETURN index, resource, vector

Result
index	resource	vector
`0`	`'Neo4j is a graph database management system.'`	`[0.0052, -0.0393, …]`
`1`	`'It is designed to store and process large-scale graphs.'`	`[0.0123, -0.0045, …]`
`2`	`'Graph databases are well-suited for highly connected data.'`	`[0.0089, 0.0212, …]`
3 rows Added 3 nodes, Set 6 properties, Added 3 labels
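
Once chunks are stored this way, a vector index is one way to search them by similarity. The following is a minimal sketch, not part of the procedure itself. The index name `chunkEmbeddings` and the `$queryVector` parameter are hypothetical. The dimension `1536` assumes the default output size of `text-embedding-3-small`, and `$queryVector` is assumed to be an embedding produced by the same model.

```cypher
// Assumption: `chunkEmbeddings` is a hypothetical index name.
// 1536 matches the default dimension of text-embedding-3-small.
CREATE VECTOR INDEX chunkEmbeddings IF NOT EXISTS
FOR (c:Chunk) ON c.embedding
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}};

// Return the 3 stored chunks closest to the query vector.
CALL db.index.vector.queryNodes('chunkEmbeddings', 3, $queryVector)
YIELD node, score
RETURN node.index AS index, node.text AS text, score
```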

Configuration settings for file URLs

dbms.security.allow_csv_import_from_file_urls: Whether file:/// URLs are allowed.
server.directories.import: The path relative to which file:/// URLs are parsed.
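
To check the current values of these two settings on a running server, a query along the following lines may help. It is a sketch that assumes the executing user is allowed to run `SHOW SETTINGS`.

```cypher
// List only the two settings that affect file:/// URLs.
SHOW SETTINGS YIELD name, value
WHERE name IN [
  'dbms.security.allow_csv_import_from_file_urls',
  'server.directories.import'
]
RETURN name, value
```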

Import from a remote location

ai.file.embedBatch can embed text from a file hosted at a remote location. It supports accessing files via HTTPS, HTTP, and FTP (with or without credentials). It also follows redirects, except those that change the protocol (for security reasons).

Example 2. Embed text from a remote file via HTTPS

https://example.com/document.txt

Neo4j GenAI plugin can read this file.
You can generate embeddings from it.

Query

CALL ai.file.embedBatch(
  'https://example.com/document.txt',
  'OpenAI',
  { token: $openaiToken, model: 'text-embedding-3-small' }
) YIELD index, resource, vector
RETURN index, resource, vector

Result
index	resource	vector
`0`	`'Neo4j GenAI plugin can read this file.\nYou can generate embeddings from it.'`	`[0.0052, -0.0393, …]`
1 row

Example 3. Embed text from a remote file via FTP using credentials

ftp://<username>:<password>@<domain>/documents/file.txt

This is a file hosted on an FTP server.

Query

CALL ai.file.embedBatch(
  'ftp://<username>:<password>@<domain>/documents/file.txt',
  'OpenAI',
  { token: $openaiToken, model: 'text-embedding-3-small' }
) YIELD index, resource, vector
RETURN index, resource, vector

Result
index	resource	vector
`0`	`'This is a file hosted on an FTP server.'`	`[0.0052, -0.0393, …]`
1 row
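
The remote examples above only return the embeddings. To store them as well, one possible pattern is to link each chunk to a node that represents the source file. In the sketch below, the `$fileUrl` parameter, the `Document` label, and the `HAS_CHUNK` relationship type are hypothetical names chosen for illustration. The limit of `1000` is taken from the overview example.

```cypher
// Assumptions: $fileUrl, :Document, and :HAS_CHUNK are illustrative names.
CALL ai.file.embedBatch(
  $fileUrl,
  'OpenAI',
  { token: $openaiToken, model: 'text-embedding-3-small' },
  1000
) YIELD index, resource, vector
// One node per source file, keyed by its URL.
MERGE (d:Document {url: $fileUrl})
// One chunk per index under that document; re-running updates existing chunks.
MERGE (d)-[:HAS_CHUNK]->(c:Chunk {index: index})
SET c.text = resource, c.embedding = vector
RETURN count(c) AS chunksStored
```

Keying chunks by their document and index keeps chunks from different files apart, which a bare `index` key would not.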
