Create and store file embeddings
Introduced in 2025.05
Overview
The ai.file.embedBatch procedure reads text from a local or remote file and generates vector embeddings for its content, optionally splitting it into chunks. It returns one row per chunk, containing the chunk's index, its text content, and its embedding vector.
CALL ai.file.embedBatch(
'https://example.com/large-document.txt', (1)
'OpenAI', (2)
{ token: $openaiToken, model: 'text-embedding-3-small' }, (3)
1000 (4)
) YIELD index, resource, vector
MERGE (n:Chunk {id: index})
SET n.text = resource, n.embedding = vector
1. The path or URL of the source file. Both local and remote URLs (http, https, or ftp) are supported. Local paths are resolved relative to the Neo4j import directory.
2. Identifier of the AI provider to use. If the given model is not supported, OpenAI’s ada model is used.
3. Provider-specific configuration.
4. The maximum token count for each chunk.
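Stored embeddings are typically queried through a vector index. A minimal sketch, assuming the `Chunk` label and `embedding` property from the example above, and 1536 dimensions (the output size of OpenAI's text-embedding-3-small model); the index name `chunkEmbeddings` is arbitrary:

```cypher
// Create a vector index over the embedding property of Chunk nodes.
// 1536 dimensions match text-embedding-3-small; adjust for other models.
CREATE VECTOR INDEX chunkEmbeddings IF NOT EXISTS
FOR (c:Chunk) ON c.embedding
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}
```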
Syntax

CALL ai.file.embedBatch(<path>, <provider>, <configuration>, <maxTokens>)
YIELD index, resource, vector

Description

Embed a given file in batches of resources as vectors using the named provider.
Inputs

| Name | Type | Description |
|---|---|---|
| `<path>` | `STRING` | The path or URL of the file to read. |
| `<provider>` | `STRING` | Identifier of the AI provider to use. See Embeddings → Providers for supported options. |
| `<configuration>` | `MAP` | Provider-specific configuration. |
| `<maxTokens>` | `INTEGER` | The maximum token count limit for each chunk. |
Returns

| Name | Type | Description |
|---|---|---|
| `index` | `INTEGER` | The index of the corresponding chunk within the list of resources. |
| `resource` | `STRING` | The original resource element (the chunk text). |
| `vector` | `LIST<FLOAT>` | The generated vector embedding for the resource. |
Chunking behavior

If a maximum token count is passed as the fourth argument, the file content is split into chunks of at most that many tokens before embedding, and the procedure returns one row per chunk.
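Because each returned row carries the chunk's index, consecutive chunks can be linked after import to preserve document order. A sketch, assuming the chunks were stored with an `id` property as in the first example:

```cypher
// Link each chunk to the next one by consecutive index.
MATCH (a:Chunk), (b:Chunk)
WHERE b.id = a.id + 1
MERGE (a)-[:NEXT]->(b)
```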
Examples
Import local files
Not available on Aura
You can store files on the database server and access them by using the file:/// scheme.
By default, paths are resolved relative to the Neo4j import directory.
The following document.txt is stored in the import directory:

Neo4j is a graph database management system.
It is designed to store and process large-scale graphs.
Graph databases are well-suited for highly connected data.
CALL ai.file.embedBatch(
'file:///document.txt',
'OpenAI',
{ token: $openaiToken, model: 'text-embedding-3-small' }
) YIELD index, resource, vector
MERGE (c:Chunk {file: 'document.txt', index: index})
SET c.text = resource, c.embedding = vector
RETURN index, resource, vector
| index | resource | vector |
|---|---|---|
| 0 | Neo4j is a graph database management system. | … |
| 1 | It is designed to store and process large-scale graphs. | … |
| 2 | Graph databases are well-suited for highly connected data. | … |

3 rows. Added 3 nodes, set 6 properties, added 3 labels.
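With a vector index in place over `Chunk.embedding` (named `chunkEmbeddings` here for illustration), the stored chunks can be searched by similarity using Neo4j's `db.index.vector.queryNodes` procedure, given a query embedding passed as a parameter:

```cypher
// Find the 3 chunks most similar to the query embedding.
// $queryEmbedding is assumed to be a list of floats with the
// same dimensionality as the stored vectors.
CALL db.index.vector.queryNodes('chunkEmbeddings', 3, $queryEmbedding)
YIELD node, score
RETURN node.text AS text, score
```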
See also: Configuration settings for file URLs.
Import from a remote location
ai.file.embedBatch can embed text from a file hosted at a remote location.
It supports accessing files via HTTPS, HTTP, and FTP (with or without credentials).
It also follows redirects, except those that change the protocol (for security reasons).
Suppose a file at https://example.com/document.txt contains:

Neo4j GenAI plugin can read this file.
You can generate embeddings from it.
CALL ai.file.embedBatch(
'https://example.com/document.txt',
'OpenAI',
{ token: $openaiToken, model: 'text-embedding-3-small' }
) YIELD index, resource, vector
RETURN index, resource, vector
| index | resource | vector |
|---|---|---|
| 0 | Neo4j GenAI plugin can read this file. You can generate embeddings from it. | … |

1 row.
A file hosted on an FTP server can be read the same way. Suppose it contains:

This is a file hosted on an FTP server.
CALL ai.file.embedBatch(
'ftp://<username>:<password>@<domain>/documents/file.txt',
'OpenAI',
{ token: $openaiToken, model: 'text-embedding-3-small' }
) YIELD index, resource, vector
RETURN index, resource, vector
| index | resource | vector |
|---|---|---|
| 0 | This is a file hosted on an FTP server. | … |

1 row.
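Credentials embedded in the URL can end up in query logs, so it is safer to pass the whole URL as a query parameter instead of writing it inline. A sketch, assuming a `$fileUrl` parameter holding the full ftp:// URL, credentials included:

```cypher
// The URL (with credentials) is supplied as a parameter,
// keeping it out of the query text itself.
CALL ai.file.embedBatch(
  $fileUrl,
  'OpenAI',
  { token: $openaiToken, model: 'text-embedding-3-small' }
) YIELD index, resource, vector
RETURN index, resource, vector
```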