Chunking in LLMs
- Enfa Rose George
- Mar 19
- 5 min read

This is a working draft
If you're reading this, you already know what LLMs are. You’ve likely worked with them, noticed chunking in a pipeline, looked it up, and landed here. So, we’ll skip the usual introductions and dive straight into chunking.
What Problem Does Chunking Solve?
One limitation of working with LLMs is that the maximum number of tokens a model can process in a single input, also known as its context window size, is limited. For example, GPT-4 has a context window of 8,192 tokens, while GPT-4o's context window is significantly larger at 128k tokens.
Context window size matters because it determines how much information a model can retain and process at once. The larger the window, the more context the model has access to, leading to better responses and improved performance on long-form tasks.
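To see the limit in practice, here is a minimal sketch of counting tokens before sending text to a model. It assumes the tiktoken library and the cl100k_base encoding used by GPT-4-era models; the 8,192-token limit simply mirrors the GPT-4 figure above.

```python
import tiktoken

CONTEXT_WINDOW = 8192  # tokens, matching the GPT-4 figure above

def fits_in_context(text: str, limit: int = CONTEXT_WINDOW) -> bool:
    # cl100k_base is the tokenizer used by GPT-4-era models (an assumption here;
    # swap in the encoding for whatever model you actually call).
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(text))
    return n_tokens <= limit
```

If the count exceeds the window, the text has to be shortened or split before the model can see it.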
If the context window is limited, how do we build applications that need more context than it can hold? We employ a simple and effective strategy called chunking.
What is Chunking?
Chunking is the process of breaking large bodies of text into smaller, more manageable pieces called chunks. These chunks serve as fundamental units of information that an LLM can process more efficiently. Instead of feeding the model an overwhelming amount of text, chunking ensures that each piece remains digestible, relevant, and easy to retrieve when needed.
Why is Chunking Useful?
Chunking isn’t just a workaround for token limits—it actually improves how LLMs process, retrieve, and generate information. Here’s why it matters:
Works Around Context Window Limits – If your input is too long for an LLM to handle at once, chunking ensures the text is broken down into pieces that fit within the model’s constraints.
More Accurate Responses – In Retrieval-Augmented Generation (RAG) systems, semantically meaningful chunks make it easier for retrieval models to pinpoint the most relevant information. Instead of sifting through an entire document, the model focuses on smaller, more precise sections—leading to sharper, more context-aware responses.
Faster Processing & Retrieval – Smaller, well-structured chunks allow both LLMs and search systems to process and retrieve information faster. Instead of dealing with bloated, unwieldy inputs, models work with concise, optimized text segments, reducing processing time and improving efficiency.
Chunking Strategies
Why do you need strategies for Chunking?
Chunking is only effective if done correctly. Poor chunking can reduce retrieval accuracy and degrade application performance.
For example, if you chunk text using a fixed size without considering context, you might end up splitting sentences or breaking apart related information. This leads to fragmented understanding, making it harder for retrieval systems to return meaningful results. Instead of getting complete, relevant answers, you’ll end up with incomplete or disjointed chunks—which defeats the purpose.
To maximize the performance, efficiency, and reliability of LLMs, particularly in applications involving large datasets and complex information retrieval tasks, effective chunking strategies are needed.
Common Strategies for Chunking
Below are some of the most common strategies for chunking.
1. Fixed-Size Chunking
Fixed-size chunking, the approach from the earlier example, involves partitioning text into evenly sized chunks of a predetermined length, typically defined by a fixed number of characters, words, or, most commonly in the context of LLMs, tokens.
Advantages:
Easy to implement: Requires no complex linguistic analysis.
Uniform chunk sizes: Simplifies storage and retrieval.
Disadvantages:
Can split sentences or semantic units: This can lead to loss of meaning and make retrieval less effective.
Ignores text structure: Since it doesn’t adapt to natural breaks (like sentences or paragraphs), it can disrupt coherence.
One way to partially counteract the loss of semantic coherence is by using chunk overlap. This means that a certain number of tokens (or characters) from the end of one chunk are repeated at the beginning of the next. While this doesn’t fully solve the issue, it helps maintain context across chunks.
Since finding the optimal chunk size depends on balancing context preservation with LLM constraints, some experimentation is usually required to get it right.
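To make this concrete, here is a minimal sketch of fixed-size token chunking with overlap, again assuming the tiktoken library and the cl100k_base encoding; the chunk size and overlap values are illustrative defaults, not recommendations.

```python
import tiktoken

def fixed_size_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    # Encode to tokens, slide a window of `chunk_size` tokens, and step forward
    # by (chunk_size - overlap) so consecutive chunks share `overlap` tokens.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))  # decode back to text for storage/retrieval
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With these illustrative values, the last 64 tokens of each chunk reappear at the start of the next, which is exactly the overlap trick described above for preserving context across split points.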
2. Recursive Chunking
Recursive chunking systematically breaks down text in a hierarchical manner, continuously splitting it into smaller chunks until each piece reaches an optimal size for processing. This method is particularly useful for structured documents, where a logical chunking strategy could involve splitting the text first by headings, then by subsections, then paragraphs, and so on.
Advantages:
Structural Integrity: Maintains the natural organization of the document, ensuring a logical flow of information.
Scalability: Ideal for processing large, complex documents, as it breaks them down into manageable units without losing context.
Disadvantages:
Complex Implementation: Requires an understanding of the document's structural markers, which can vary across different texts.
Potential Overhead: Multiple recursive operations may increase computational time and resource usage.
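Below is a simplified, library-free sketch of the idea: try coarse separators first (blank lines between paragraphs) and fall back to finer ones only when a piece is still too large. Production code would more likely use something like LangChain's RecursiveCharacterTextSplitter, which implements the same pattern; the separators and the character-based size limit here are illustrative assumptions.

```python
SEPARATORS = ["\n\n", "\n", ". ", " "]  # paragraphs -> lines -> sentences -> words

def recursive_chunks(text: str, max_chars: int = 1000, separators=SEPARATORS) -> list[str]:
    # Base case: the text already fits, or there is no finer separator left to try.
    if len(text) <= max_chars or not separators:
        return [text]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if len(piece) > max_chars:
            # This piece alone is too big: flush the buffer, then recurse on it
            # with the next, finer separator.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_chunks(piece, max_chars, finer))
        elif not current or len(current) + len(sep) + len(piece) <= max_chars:
            current = f"{current}{sep}{piece}" if current else piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

# Example usage: recursive_chunks(open("report.txt").read(), max_chars=1000)
```

The hierarchy falls out of the separator list: whole paragraphs are kept intact whenever they fit, and only oversized pieces get split at progressively finer boundaries.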
3. Agentic Chunking
As the name suggests, agentic chunking uses an LLM to perform the chunking itself. Rather than splitting the text based on a fixed parameter, the LLM is prompted to analyze the content and find semantically meaningful boundaries, such as topic transitions.
Advantages:
Enhanced Semantic Coherence: By identifying and preserving natural topic boundaries as identified by the LLM, agentic chunking ensures that each segment maintains its contextual integrity, leading to more coherent and relevant outputs.
Real-Time Adaptability: Agentic chunking can dynamically respond to changing user needs, adjusting the segmentation based on real-time interactions and feedback.
Disadvantages:
Resource Intensive: The process may demand significant computational resources, potentially affecting efficiency compared to simpler chunking methods.
Additional Training and Tuning: Effective implementation may require further training and fine-tuning of the LLM to accurately identify semantic boundaries, adding to the development effort.
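As a rough sketch of what this can look like in code: the LLM is asked to mark the paragraph indices where the topic changes, and the document is cut at those boundaries. The `call_llm(prompt) -> str` helper is hypothetical, standing in for whatever chat-completion client you use, and the prompt wording is purely illustrative.

```python
import json

def agentic_chunks(text: str, call_llm) -> list[str]:
    # Number each paragraph so the model can refer to positions by index.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(paragraphs))
    prompt = (
        "The document below is split into numbered paragraphs. Return a JSON "
        "list of the indices where a NEW topic begins (always include 0).\n\n"
        + numbered
    )
    # call_llm is a hypothetical helper wrapping your chat-completion client.
    boundaries = sorted(set(json.loads(call_llm(prompt))))
    boundaries.append(len(paragraphs))
    # Re-join the paragraphs between consecutive topic boundaries.
    return [
        "\n\n".join(paragraphs[start:end])
        for start, end in zip(boundaries, boundaries[1:])
    ]
```

In practice you would also validate the model's output (it may not return clean JSON) and process long documents in windows that themselves fit the model's context.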
Factors Influencing the Choice of a Chunking Strategy
Choosing a chunking strategy depends on several factors, including:
Text Structure – Long documents (books, reports) may need hierarchical chunking, while shorter content (tweets, posts) works with simpler methods.
Language Model Constraints – Consider the model’s context window and optimal chunk sizes for best performance.
Query Complexity – Simple queries work well with smaller chunks, while complex queries may need larger ones for context retention.
RAG Considerations – If using retrieval-augmented generation, consider how the retrieved chunks will be assembled and used in the prompt.
Computational Resources – Semantic chunking and other advanced methods demand significant computational resources, which may impact storage and query latency.
Domain Specificity – Legal, medical, or technical texts may need custom chunking to preserve structure and meaning.
This blog is a work in progress. Given the unpredictability of grad school, I’d rather publish my latest draft than wait for it to feel “perfect.” Too many drafts have been stalled by perfectionism, so I’m choosing progress over polish. Expect ongoing updates, refinements, and improvements over time. My next update will add more chunking strategies, perhaps more coding examples, and a summary of advice on enterprise-level considerations when choosing a chunking strategy. It would be nice to have a neat comparison table at the end as well.
References
OpenAI. (2025). Models. OpenAI. https://platform.openai.com/docs/models [Accessed on 03-19-2025]
Google Cloud. (2024, December 19). The prompt: What are long context windows and why do they matter. Google Cloud. https://cloud.google.com/transform/the-prompt-what-are-long-context-windows-and-why-do-they-matter [Accessed on 03-19-2025]
Wijaya, C. Y. (2025, March 10). 9 chunking strategies to improve RAG performance. Non-Brand Data. https://www.nb-data.com/p/9-chunking-strategis-to-improve-rag [Accessed on 03-19-2025]
Gutowska, A. (2025, January 20). Implement RAG chunking strategies with LangChain and watsonx.ai. IBM. https://www.ibm.com/think/tutorials/chunking-strategies-for-rag-with-langchain-watsonx-ai [Accessed on 03-19-2025]
MyScale. (2024, November 11). Chunking strategies for optimizing large language models (LLMs). MyScale. https://myscale.com/blog/chunking-strategies-for-optimizing-llms [Accessed on 03-19-2025]
Richards, D. (2024, June 4). The definitive guide to document chunking for AI applications. Rag About It. https://ragaboutit.com/the-definitive-guide-to-document-chunking-for-ai-applications/ [Accessed on 03-19-2025]
Disclosure of AI Use
This blog was written with the assistance of LLM tools. Specifically, during the content planning phase, I used Gemini Deep Research and OpenAI GPT-4o with search to find relevant sources to learn the topic from. I view them as akin to Google Search, but smarter and more efficient.
While AI tools such as ChatGPT and Grammarly helped refine language, enhance clarity, and improve readability, the core structure, presentation, and final decisions are entirely mine. All content was written after I learned the topic by visiting the original sources.