Core Concepts

Chunking

Large text inputs are automatically split into smaller chunks, sized for the embedding model, before they are embedded.

Why chunking matters

  • Embedding models have token limits, so long text must be split to fit
  • Smaller chunks improve retrieval precision
  • Queries against long documents return only the relevant sections, not the whole document

How it works

  1. Text is split on semantic boundaries (sentences, paragraphs)
  2. Chunks are sized for the embedding model (~512 tokens)
  3. Overlap preserves context across chunk boundaries
  4. Each chunk is embedded and stored with a reference to the parent memory
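
The steps above can be illustrated with a minimal sketch. This is not Databaset's actual implementation: the names Chunk, split_sentences, count_tokens and chunk_text are hypothetical, the sentence splitter is a naive regex, token counts are approximated by whitespace-separated words, and overlap is measured in whole sentences. A real pipeline would use the embedding model's own tokenizer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chunk:
    parent_id: str  # reference back to the parent memory (hypothetical field)
    index: int      # position of this chunk within the parent
    text: str


def split_sentences(text: str) -> list[str]:
    # Naive boundary detection: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


def chunk_text(
    text: str,
    parent_id: str,
    max_tokens: int = 512,
    overlap_sentences: int = 1,
) -> list[Chunk]:
    chunks: list[Chunk] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        if current and count_tokens(" ".join(current + [sentence])) > max_tokens:
            # The next sentence would overflow the budget: close the current chunk.
            chunks.append(Chunk(parent_id, len(chunks), " ".join(current)))
            # Carry the last sentence(s) forward so context spans the boundary.
            tail = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # Drop the overlap if it leaves no room for the next sentence.
            if count_tokens(" ".join(tail + [sentence])) > max_tokens:
                tail = []
            current = tail
        # A single sentence longer than max_tokens becomes its own oversized chunk.
        current.append(sentence)
    if current:
        chunks.append(Chunk(parent_id, len(chunks), " ".join(current)))
    return chunks


# Tiny budget so the split is visible on a short input.
demo = chunk_text(
    "One two three. Four five six. Seven eight nine.",
    parent_id="mem-1",
    max_tokens=6,
)
for chunk in demo:
    print(chunk.index, chunk.parent_id, chunk.text)
```

Each resulting chunk would then be embedded and stored alongside its parent_id, so a search hit on a chunk can be traced back to the memory it came from.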

You don't configure this

Chunking is fully automatic. Send text of any length and Databaset handles the rest.