Context Chunker & Token Counter

Split large files, transcripts, or code repositories into overlapping chunks for RAG or prompt context injection.

Under the Hood: Recursive Document Splitting

When building applications with **Retrieval-Augmented Generation (RAG)**, large text sources must be split into manageable pieces. Naive fixed-size character slicing cuts sentences, and even words, in half, which loses contextual meaning.
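To make the problem concrete, here is a minimal sketch of fixed-size slicing. The function name `naive_slices` and the sample text are illustrative assumptions, not part of this tool.

```python
def naive_slices(text: str, size: int) -> list[str]:
    """Cut text into fixed-size character windows, ignoring every boundary."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample text, used only for illustration.
text = "Refunds are issued within 30 days. Contact support to start a claim."
pieces = naive_slices(text, 20)

# The first window ends in the middle of the word "within".
print(repr(pieces[0]))  # 'Refunds are issued w'
```

No text is lost overall, but each piece on its own may begin or end mid-word, which is what a boundary-aware splitter tries to avoid.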

How Recursive Chunking Works

Our chunking engine tries an ordered hierarchy of separators, from coarsest to finest, to keep each chunk semantically coherent:

  1. Double Newlines (`\n\n`): Tries to split text into distinct paragraphs first.
  2. Single Newline (`\n`): Respects line breaks and structured lists.
  3. Sentences (`. `): Avoids severing phrases mid-thought.
  4. Space (` `): Final fallback to prevent splitting individual words.
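The separator hierarchy above can be sketched roughly as follows. This is a simplified illustration, not this tool's actual implementation: it assumes chunk size is measured in characters rather than tokens, and `recursive_split` and `SEPARATORS` are hypothetical names.

```python
# Separators ordered from coarsest to finest, mirroring the list above.
SEPARATORS = ["\n\n", "\n", ". ", " "]


def recursive_split(text: str, max_len: int, separators=SEPARATORS) -> list[str]:
    """Split text into chunks of at most max_len characters (an assumption;
    a token-based limit would work the same way with a different length function)."""
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No boundary left to try: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, max_len, finer)

    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            current = candidate  # keep merging small parts into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_len:
            current = part
        else:
            # This part is still too large: retry with finer separators.
            chunks.extend(recursive_split(part, max_len, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "Alpha beta gamma. Delta epsilon.\n\nZeta eta theta iota kappa lambda."
chunks = recursive_split(sample, max_len=20)
print(chunks)
```

In this sketch the separator at a chunk boundary is dropped (for example, the ". " after "gamma"), which keeps the code short; a fuller version might reattach it to the preceding chunk.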

The Importance of Context Overlap

By introducing an **Overlap** (e.g., 200 tokens), each chunk begins with the final tokens of the chunk before it. A passage that straddles a chunk boundary, as long as it is shorter than the overlap, therefore appears intact in at least one chunk, so semantic search over a vector database can still retrieve its full context when an AI agent queries it.
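As a rough sketch of the idea, the snippet below prepends the tail of each chunk to the chunk that follows it. It assumes whitespace-separated words as a stand-in for real tokens, and `add_overlap` is a hypothetical helper rather than this tool's API.

```python
def add_overlap(chunks: list[str], overlap: int) -> list[str]:
    """Prefix each chunk (except the first) with the last `overlap` words
    of the previous chunk. Words approximate tokens here (an assumption)."""
    if overlap <= 0:
        return list(chunks)
    result = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            result.append(chunk)
            continue
        tail = chunks[i - 1].split()[-overlap:]
        result.append(" ".join(tail + [chunk]))
    return result


base = ["the cat sat on the mat", "and then it slept"]
overlapped = add_overlap(base, overlap=2)
print(overlapped)
# ['the cat sat on the mat', 'the mat and then it slept']
```

Note that overlapped chunks are longer than the originals, so a real pipeline would likely reserve part of the chunk-size budget for the overlap.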