Knowledge Indexing (RAG)
The system employs a Retrieval-Augmented Generation (RAG) pipeline to ground trading agents in domain-specific expertise. This ensures that recommendations from Technical, Fundamental, or Sentiment agents are backed by established research and curated trading strategies rather than relying solely on the LLM's base knowledge.
Knowledge Directory Structure
The KnowledgeIndexer expects a structured knowledge/ directory at the project root. Files placed in these subdirectories are automatically routed to the corresponding agent's vector store:
| Folder | Target Domain | Description |
| :--- | :--- | :--- |
| /knowledge/technical_analysis | technical | Chart patterns, indicator definitions, and signal logic. |
| /knowledge/sentiment | sentiment | Market psychology, Fear & Greed interpretations, and social signals. |
| /knowledge/fundamental | fundamental | On-chain metrics, network health, and valuation models. |
| /knowledge/risk_management | risk | Position sizing rules and stop-loss/take-profit strategies. |
| /knowledge/papers | papers | Academic research and whitepapers relevant to all domains. |
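The routing in the table above can be pictured as a simple folder-to-domain lookup. The sketch below is illustrative only: `FOLDER_TO_DOMAIN` and `domain_for` are hypothetical names, and the actual KnowledgeIndexer may resolve domains differently.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping that mirrors the table; not taken from the real indexer.
FOLDER_TO_DOMAIN = {
    "technical_analysis": "technical",
    "sentiment": "sentiment",
    "fundamental": "fundamental",
    "risk_management": "risk",
    "papers": "papers",
}


def domain_for(file_path: Path, knowledge_dir: Path) -> Optional[str]:
    """Return the target domain for a file, or None if it is not in a known subfolder."""
    try:
        relative = file_path.relative_to(knowledge_dir)
    except ValueError:
        return None  # the file is not under the knowledge directory
    if len(relative.parts) < 2:
        return None  # the file sits directly in knowledge/, with no subfolder
    return FOLDER_TO_DOMAIN.get(relative.parts[0])


root = Path("knowledge")
print(domain_for(root / "technical_analysis" / "rsi.md", root))
print(domain_for(root / "notes.md", root))
```

Under this sketch, files outside the five listed folders are skipped rather than assigned a default domain.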
Supported Formats
The indexer accepts the following formats and works best with concise, curated content:
- Markdown (.md): Preferred format. The indexer uses Markdown headers (#, ##) to intelligently split content into semantic sections.
- Plain Text (.txt): Supported for simple documentation.
- PDF (.pdf): Supported via PyPDF2 (optional), though the system favors curated Markdown summaries to ensure higher signal-to-noise ratios for trading decisions.
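A minimal sketch of how loading by file extension might look, with PyPDF2 treated as an optional import. The function name `load_text` is hypothetical, and the PDF branch assumes PyPDF2 2.0 or later, where `PdfReader` and `extract_text()` are available. The real indexer's loader may differ.

```python
import tempfile
from pathlib import Path


def load_text(path: Path) -> str:
    """Read a knowledge file as plain text, chosen by its extension."""
    suffix = path.suffix.lower()
    if suffix in {".md", ".txt"}:
        return path.read_text(encoding="utf-8")
    if suffix == ".pdf":
        try:
            from PyPDF2 import PdfReader  # optional dependency, imported lazily
        except ImportError as exc:
            raise RuntimeError("PDF support requires PyPDF2 to be installed") from exc
        reader = PdfReader(str(path))
        # extract_text() may return an empty value for image-only pages
        return "\n\n".join((page.extract_text() or "") for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix}")


# Usage: write a small Markdown note to a temporary folder and load it back.
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "rsi.md"
    note.write_text("## RSI\nMomentum oscillator.", encoding="utf-8")
    print(load_text(note))
```

Importing PyPDF2 only inside the PDF branch keeps Markdown and plain-text indexing working when the optional package is absent.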
Using the Knowledge Indexer
The KnowledgeIndexer class handles the loading, chunking, and upserting of documents into the vector database.
Initializing and Indexing All Files
To process the entire knowledge/ directory:
from agents.rag.indexer import KnowledgeIndexer
# Initialize with the path to your knowledge directory
indexer = KnowledgeIndexer(knowledge_dir="./knowledge")
# Index all domains at once
stats = indexer.index_all()
for domain, count in stats.items():
    print(f"Domain '{domain}': Indexed {count} chunks.")
Indexing a Specific Domain
If you only want to update a specific area (e.g., after adding new technical indicators):
from pathlib import Path
from agents.rag.indexer import KnowledgeIndexer
indexer = KnowledgeIndexer()
folder_path = Path("./knowledge/technical_analysis")
count = indexer.index_domain(domain="technical", folder_path=folder_path)
print(f"Indexed {count} technical analysis chunks.")
Processing Logic
Chunking Strategy
To maintain context, the indexer employs a hierarchical splitting strategy:
- Header Splitting: It first splits documents by Markdown headers (e.g., ## Strategy Name).
- Paragraph Splitting: If a section exceeds the target chunk_size (default: 500 characters), it further subdivides the text by paragraphs.
- Semantic Metadata: Each chunk is tagged with its source filename and domain, allowing agents to cite their sources during the reasoning phase.
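The three steps above can be sketched as follows. This is an approximation written for illustration, not the indexer's actual code: `Chunk` and `split_markdown` are hypothetical names, and only the 500-character default comes from the description above.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    source: str  # filename the chunk came from
    domain: str  # e.g. "technical"


def split_markdown(text: str, source: str, domain: str, chunk_size: int = 500) -> List[Chunk]:
    # Step 1: split immediately before every Markdown header line (#, ##, ...).
    sections = re.split(r"(?m)^(?=#{1,6}\s)", text)
    chunks: List[Chunk] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= chunk_size:
            pieces = [section]
        else:
            # Step 2: oversized sections are subdivided on blank lines.
            # A single paragraph longer than chunk_size is kept whole here.
            pieces = [p.strip() for p in re.split(r"\n\s*\n", section) if p.strip()]
        # Step 3: tag every piece with its source filename and domain.
        chunks.extend(Chunk(p, source, domain) for p in pieces)
    return chunks


doc = "## RSI\nShort note.\n\n## MACD\n" + "A" * 300 + "\n\n" + "B" * 300
for chunk in split_markdown(doc, "indicators.md", "technical"):
    print(chunk.source, chunk.domain, len(chunk.text))
```

In this sketch the header stays attached to the first paragraph of its section, so later paragraphs of a long section lose their header context. A production chunker might repeat the header in each piece.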
Vector Storage
Once processed, chunks are converted into embeddings and stored in domain-specific vector stores. When an agent (e.g., the Technical Agent) receives a query, it searches only its own domain's vector store and retrieves the top-K most relevant chunks before formulating a response.
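To make the per-domain top-K lookup concrete, here is a toy in-memory sketch. The embedding model and vector database are not named in this document, so word-count vectors and cosine similarity stand in for them. `DomainStores`, `embed`, and `cosine` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class DomainStores:
    """One in-memory list of (embedding, text) pairs per domain."""

    def __init__(self) -> None:
        self._stores: Dict[str, List[Tuple[Counter, str]]] = {}

    def add(self, domain: str, text: str) -> None:
        self._stores.setdefault(domain, []).append((embed(text), text))

    def top_k(self, domain: str, query: str, k: int = 3) -> List[str]:
        # Only the requested domain's store is searched.
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self._stores.get(domain, [])]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


stores = DomainStores()
stores.add("technical", "RSI above 70 suggests overbought conditions.")
stores.add("technical", "MACD crossovers can signal momentum shifts.")
stores.add("risk", "Size each position as a fixed fraction of equity.")
print(stores.top_k("technical", "Is RSI overbought?", k=1))
```

Keeping a separate store per domain means a technical query never surfaces risk or sentiment chunks, which matches the per-agent isolation described above.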
Best Practices for Knowledge Content
For optimal retrieval performance:
- Use Descriptive Headers: Use H2/H3 tags in Markdown to define clear topics.
- Keep Chunks Focused: Write concise summaries of strategies rather than long, rambling prose.
- Citations: Include source names or dates within the text to assist agents in temporal reasoning.
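One way to check content against these guidelines before indexing is a small lint pass. The sketch below is an assumption-laden illustration: `lint_knowledge_file` is a hypothetical helper that is not part of the indexer, and it reuses the 500-character chunk size only as a rough threshold for overly long paragraphs.

```python
import re
from typing import List


def lint_knowledge_file(text: str, chunk_size: int = 500) -> List[str]:
    """Return human-readable warnings for content that may retrieve poorly."""
    warnings: List[str] = []
    # Guideline: use H2/H3 headers to define clear topics.
    if not re.search(r"(?m)^#{2,3}\s+\S", text):
        warnings.append("no H2/H3 headers found")
    # Guideline: keep chunks focused; flag paragraphs longer than one chunk.
    for i, para in enumerate(re.split(r"\n\s*\n", text.strip()), start=1):
        if len(para) > chunk_size:
            warnings.append(f"paragraph {i} is {len(para)} characters (over {chunk_size})")
    return warnings


print(lint_knowledge_file("## Breakout Strategy\nEnter on a confirmed close above resistance."))
print(lint_knowledge_file("x" * 600))
```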