Light RAG Experiment Report
Overcoming GraphRAG’s Limitations with LightRAG: A Direct Token Usage Comparison
Microsoft’s graphRAG consumes significantly more tokens than traditional RAG for tasks like Knowledge Graph construction and query processing (e.g., 610,000 tokens). To address these limitations, new architectures have emerged, and the one I tested this time is LightRAG.
What caught my attention was a claim in the LightRAG paper: "While GraphRAG consumes 610,000 tokens, LightRAG can perform searches with fewer than 100 tokens." However, the term "search tokens" felt vague to me, so I decided to implement it myself and compare token usage directly. (Source: LightRAG GitHub)
Initial Indexing: Setting the Baseline
This token consumption reflects the initial indexing process and serves as a key baseline for understanding LightRAG’s core behavior.
Token Usage Summary
Metric | Value |
---|---|
Document Token Count | 2,412 tokens |
Embedding Input Tokens | 4,175 |
Embedding Model Requests | 3 |
ChatCompletion Input Tokens | 21,485 |
ChatCompletion Output Tokens | 3,244 |
ChatCompletion Model Requests | 6 |
Document Details
Metric | Value |
---|---|
Document Size | 11kb |
Query | "What are the top themes in this story?" (15 tokens) |
Extracted Entities | 19 entities (first try), 20 entities (second try) |
Extracted Relationships | 23 relationships (first try), 22 relationships (second try) |
Execution Time | 26.1074 seconds |
Note: A second try was performed with slightly different results.
Query Result Comparison: Query modes
LightRAG offers two query processing modes: Mix Mode (combining vector search and Knowledge Graph) and Local Mode (entity-centric search). Below is the token usage for the same query in each mode:
Query Token Usage by Mode
input query 15token
Check point
GetKeywords Input Default Input Format Consumes about 405 tokens
Real Query Input, system instruct and full context are mostly used as input, except for input query 15token.
For example, in naive, 2870 was used as input context, and in MIx, 9129 was used as input context
Metric | Naive | Local | Global | Hybrid | Mix |
---|---|---|---|---|---|
Embedding Tokens | 12 | 12 | 18 | 120 | 26 |
Embedding Requests | 1 | 1 | 1 | 2 | 3 |
GetKeywords Input | N/A | 🔵427 |
🔵426 |
🔵426 | 🔵471 |
GetKeywords Output | N/A | 42 | 47 | 47 | 45 |
Real Query Input | 🔴2,879 |
🔴4,349 |
🔴5,143 |
🔴6,538 | 🔴9,144 |
Real Query Output | 307 | 288 | 332 | 370 | 460 |
Cache Status | Enabled | Enabled | Enabled | Enabled | Only first step |
Cache Mechanism | vector similarity |
Single entity | Multiple Entities | Local + Global | Vector + Graph integration |
Update Token Usage Comparison (Including Initial Indexing)
Caution: Some data may lack precision due to unaccounted execution order; use it only as a trend reference.
I was particularly interested in how LightRAG handles updates to existing data, as it claims to support incremental updates. However, my tests revealed that adding new documents and updating existing ones behave differently. Here are the results:
Document A: Sequential Update Analysis
Process 1 Input : Instruction + full text used
Process 1 Output: get entity and relationship information of chunk
Process 2 Input: Previous conversation + how to extract and rest data
Process 2 Output: All entity and relationship data
This is repeated until enitity and realationship are extracted from a chunk of all docs.
Metric | Initial Indexing | +17 Tokens Update | +17 +209 Tokens Update |
---|---|---|---|
Embedding Input | 4,175 | 4,826 | 5,242 |
Embedding Requests | 3 | 3 | 3 |
Chat Input | 21,485 | 5,828 | 6,300 |
Chat Output | 3,244 | 750 | 1,072 |
Chat Requests | 6 | 2 | 2 |
Process 1 Total | N/A | 2,890 | 3,153 |
Process 1 Input | N/A | 🔵2,509 | 🔵2,718 |
Process 1 Output | N/A | 381 |
435 |
Process 2 Total | N/A | 3,690 | 4,221 |
Process 2 Input | N/A | 🔴3,319 |
🔴3,582 |
Process 2 Output | N/A | 371 |
639 |
Regardless of how much content is added, the entire document is embedded, and the conversation repeats until entities and relations are extracted from all tokens.
Document B: New Document Addition (104 tokens)
Process 1 Input : Instruction + full text used
Process 1 Output: get entity and relationship information of chunk
Process 2 Input: Previous conversation + how to extract and rest data
Process 2 Output: All entity and relationship data
This is repeated until enitity and realationship are extracted from a chunk of all docs.
Metric | Value |
---|---|
Embedding Input | 330 |
Embedding Requests | 2 |
Chat Input | 3,300 |
Chat Output | 570 |
Chat Requests | 1 |
Process 1 Total | 🔵2,871 |
Process 1 Input | 2,384 |
Process 1 Output | 487 |
Process 2 Total | 🔴3,871 |
Process 2 Input | 3,300 |
Process 2 Output | 571 (Entity + relationship) |
When adding a new document, only the tokens from the newly added document are considered, independent of existing documents.
Key Findings from Single Document Update:
- Embedding Input Token Analysis: LightRAG appears to re-embed the entire file during updates, not just the added tokens
- ChatCompletion Input Token Analysis: During incremental updates, input tokens scale efficiently with added tokens and generally, instructions or context are included in the input as a prefix.
- ChatCompletion Output Token Analysis: Output remains stable or decreases regardless of input size, possibly due to simplification or deduplication of existing relationships
Conclusion
LightRAG drastically reduces token usage compared to GraphRAG, enhancing efficiency with incremental updates and caching. However, its high token consumption during initial indexing and full re-embedding during updates remain limitations. If it evolves to support partial embedding or maximizes cache usage, LightRAG could become an even lighter, more optimized model for small-scale updates.