Core Concepts
Chunking
Large text inputs are automatically split into chunks sized for the embedding model before they are embedded.
Why chunking matters
- Embedding models have token limits
- Smaller chunks improve retrieval precision
- For long documents, retrieval returns only the relevant sections
How it works
- Text is split on semantic boundaries (sentences, paragraphs)
- Chunks are sized for the embedding model (~512 tokens)
- Overlap preserves context across chunk boundaries
- Each chunk is embedded and stored with a reference to the parent memory
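The steps above can be illustrated with a minimal Python sketch. This is not Databaset's actual implementation: the `Chunk` class, the `chunk_text` function, the `parent_id` field, and the `overlap_sentences` parameter are all hypothetical names chosen for illustration. The sketch also assumes a naive regex sentence splitter and uses whitespace-separated words as a rough stand-in for model tokens.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    parent_id: str  # hypothetical reference back to the parent memory
    index: int      # position of the chunk within the parent
    text: str


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def chunk_text(text: str, parent_id: str, max_tokens: int = 512,
               overlap_sentences: int = 1) -> list[Chunk]:
    chunks: list[Chunk] = []
    current: list[str] = []
    fresh = 0  # sentences in `current` that are not carried-over overlap

    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        # Close the chunk once adding the next sentence would exceed the budget.
        # A single sentence longer than max_tokens is kept whole, not split.
        if fresh > 0 and count_tokens(candidate) > max_tokens:
            chunks.append(Chunk(parent_id, len(chunks), " ".join(current)))
            # Carry the trailing sentences forward so context spans the boundary.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            fresh = 0
        current.append(sentence)
        fresh += 1

    if fresh > 0:
        chunks.append(Chunk(parent_id, len(chunks), " ".join(current)))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
    for chunk in chunk_text(sample, parent_id="mem-1", max_tokens=6):
        print(chunk.index, chunk.parent_id, chunk.text)
```

In this sketch each chunk repeats the last sentence of the previous one, which is one simple way to get overlap. A production system would more likely count tokens with the embedding model's own tokenizer and measure overlap in tokens rather than sentences.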
You don't configure this
Chunking is fully automatic. Send text of any length and Databaset handles the rest.