RAG & Knowledge Management
Overview
The system implements a Retrieval-Augmented Generation (RAG) architecture to ground trading agent decisions in high-quality market theory, research papers, and domain-specific knowledge. Unlike standard LLM interactions, this system retrieves relevant context from a curated local knowledge base before generating trading recommendations.
Knowledge is partitioned into specific domains to ensure that agents (e.g., the Technical Agent or Risk Agent) only query information relevant to their specific role.
Knowledge Architecture
Knowledge is organized within the /knowledge directory, mapped to specific analysis domains:
| Domain | Folder | Content Type |
| :--- | :--- | :--- |
| Technical | technical_analysis | Indicators, chart patterns, and technical strategies. |
| Sentiment | sentiment | Market psychology, Fear & Greed interpretations, and social signals. |
| Fundamental | fundamental | On-chain metrics, valuation models, and network health. |
| Risk | risk_management | Position sizing, stop-loss theories, and volatility management. |
| Papers | papers | Curated summaries of academic research and institutional reports. |
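The domain-to-folder mapping above can be sketched as a small lookup table. This is illustrative only: `KNOWLEDGE_DOMAINS` and `domain_folder` are hypothetical names, not identifiers from the codebase, and the authoritative mapping lives inside the indexer itself.

```python
from pathlib import Path

# Hypothetical mirror of the table above; the real mapping is owned by the indexer.
KNOWLEDGE_DOMAINS = {
    "technical": "technical_analysis",
    "sentiment": "sentiment",
    "fundamental": "fundamental",
    "risk": "risk_management",
    "papers": "papers",
}

def domain_folder(base: Path, domain: str) -> Path:
    """Resolve the on-disk folder for a knowledge domain."""
    return base / KNOWLEDGE_DOMAINS[domain]
```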
Curated vs. Raw Data
The system prioritizes curated Markdown (.md) summaries over raw PDFs. While the KnowledgeIndexer includes structural support for various formats, PDF indexing is intentionally bypassed for research papers to ensure high-quality, actionable content. This prevents the "noise" often found in academic PDFs (like methodology tables or citations) from diluting the trading context.
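The curation policy described above amounts to a suffix filter at indexing time. A minimal sketch, assuming a helper named `should_index` (hypothetical; the actual filtering logic is internal to the KnowledgeIndexer):

```python
from pathlib import Path

# Suffix policy sketch: curated Markdown only; raw PDFs are intentionally bypassed.
CURATED_SUFFIXES = {".md"}
BYPASSED_SUFFIXES = {".pdf"}

def should_index(path: Path) -> bool:
    """Return True only for curated summary files."""
    suffix = path.suffix.lower()
    if suffix in BYPASSED_SUFFIXES:
        return False  # research-paper PDFs are skipped to avoid low-signal chunks
    return suffix in CURATED_SUFFIXES
```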
Indexing Knowledge
The KnowledgeIndexer class is the primary interface for processing files into the vector store. It handles text chunking, metadata tagging, and domain partitioning.
Basic Usage
To index the entire knowledge base:
```python
from agents.rag.indexer import KnowledgeIndexer

# Initialize indexer (defaults to the project's knowledge/ directory)
indexer = KnowledgeIndexer()

# Process and store all documents
results = indexer.index_all()
for domain, count in results.items():
    print(f"Domain: {domain} | Chunks Indexed: {count}")
```
Chunking Strategy
Documents are not split arbitrarily. The system employs a semantic-aware chunking strategy:
- Header Splitting: Content is first split by Markdown headers (`#`, `##`) to keep related concepts together.
- Paragraph Refinement: If a section exceeds the `chunk_size` (default 500 characters), it is further split by paragraphs.
- Metadata Tagging: Each chunk is stored with its source filename and domain to allow for precise citations in agent reasoning.
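The two-stage split above can be sketched as follows. This is a simplified stand-in (`chunk_markdown` is a hypothetical name, and metadata tagging is omitted), not the KnowledgeIndexer's actual implementation:

```python
import re

def chunk_markdown(text: str, chunk_size: int = 500) -> list[str]:
    """Split by # / ## headers first, then by paragraphs if a section is too long."""
    # Zero-width lookahead keeps each header attached to its own body.
    sections = re.split(r"(?m)^(?=#{1,2} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= chunk_size:
            chunks.append(section)
            continue
        # Paragraph refinement: pack paragraphs greedily up to chunk_size.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > chunk_size:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)  # a single oversize paragraph is kept whole here
    return chunks
```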
Data Connectors & Real-Time Context
In addition to static knowledge, the system integrates real-time news as dynamic context for the Sentiment Agent.
NewsAPI Connector
The NewsAPIConnector provides a bridge to external market news. It implements a fallback mechanism to ensure the system remains functional even without API keys:
- RSS Feeds: Fetches latest news from CoinTelegraph and CoinDesk.
- Google News: Uses Google News RSS for general crypto-market coverage.
- NewsAPI: (Optional) If an `API_KEY` is provided, fetches high-resolution headline data.
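The fallback chain above can be expressed as trying each source in priority order until one returns articles. A minimal sketch, assuming the sources are passed in as callables (`fetch_with_fallback` and `Fetcher` are illustrative names, not the connector's real API):

```python
from typing import Callable

# Each fetcher takes a limit and returns a list of article dicts.
Fetcher = Callable[[int], list[dict]]

def fetch_with_fallback(fetchers: list[Fetcher], limit: int = 5) -> list[dict]:
    """Try each news source in priority order; return the first non-empty result."""
    for fetch in fetchers:
        try:
            articles = fetch(limit)
        except Exception:
            continue  # source unavailable (e.g. missing API key); try the next one
        if articles:
            return articles[:limit]
    return []  # every source failed; the system stays functional with no news
```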
```python
from data_connectors.newsapi_connector import NewsAPIConnector

connector = NewsAPIConnector()
news = connector.get_bitcoin_news(limit=5)
for article in news:
    print(f"Source: {article['source']} - Title: {article['title']}")
```
Knowledge Retrieval in Agents
When the TradingOrchestrator triggers an analysis, each agent utilizes its domain-specific vector store.
Retrieval Interface
Agents use the retrieved context to populate the reasoning field of their AgentRecommendation schema. This ensures that every BUY/SELL/HOLD decision is backed by:
- Static Knowledge: Theory from the vector store (e.g., "According to the Wyckoff theory in our database...").
- Dynamic Context: Real-time news and market data.
Schema Enforcement
The structured output (defined in agents/schemas.py) requires agents to list key_factors. These factors are often derived directly from the RAG retrieval process, ensuring transparency in the decision-making pipeline.
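The shape of this structured output might look roughly like the dataclass below. This is an illustrative approximation only; the authoritative schema is defined in `agents/schemas.py`, and the field names beyond `reasoning` and `key_factors` (which the text names) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Literal

# Illustrative shape only; see agents/schemas.py for the real definition.
@dataclass
class AgentRecommendation:
    action: Literal["BUY", "SELL", "HOLD"]
    confidence: float                 # assumed 0.0 - 1.0
    reasoning: str                    # grounded in retrieved static + dynamic context
    key_factors: list[str] = field(default_factory=list)  # often RAG-derived
```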
Configuration
To point the indexer to a custom directory or modify indexing behavior:
```python
from pathlib import Path

from agents.rag.indexer import KnowledgeIndexer

indexer = KnowledgeIndexer(knowledge_dir="/path/to/custom/knowledge")

# Index only a specific domain
indexer.index_domain(domain="technical", folder_path=Path("./custom_tech_data"))
```
Note: Ensure `chromadb` or your chosen vector database is initialized before running the indexer. The storage location is managed via `agents/rag/vector_store.py`.