RAG & Knowledge Management
Overview
The system implements a Retrieval-Augmented Generation (RAG) architecture to ground trading agent decisions in high-quality market theory, research papers, and domain-specific knowledge. Unlike standard LLM interactions, this system retrieves relevant context from a curated local knowledge base before generating trading recommendations.
Knowledge is partitioned into specific domains to ensure that agents (e.g., the Technical Agent or Risk Agent) only query information relevant to their specific role.
Knowledge Architecture
Knowledge is organized within the /knowledge directory, mapped to specific analysis domains:
| Domain | Folder | Content Type |
| :--- | :--- | :--- |
| Technical | technical_analysis | Indicators, chart patterns, and technical strategies. |
| Sentiment | sentiment | Market psychology, Fear & Greed interpretations, and social signals. |
| Fundamental | fundamental | On-chain metrics, valuation models, and network health. |
| Risk | risk_management | Position sizing, stop-loss theories, and volatility management. |
| Papers | papers | Curated summaries of academic research and institutional reports. |
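The domain-to-folder mapping above can be sketched as a small lookup table. This is illustrative only: `KNOWLEDGE_DOMAINS` and `domain_folder` are hypothetical names, not identifiers from the codebase, and the authoritative mapping lives inside the indexer itself.

```python
from pathlib import Path

# Hypothetical mirror of the table above; the real mapping is owned by the indexer.
KNOWLEDGE_DOMAINS = {
    "technical": "technical_analysis",
    "sentiment": "sentiment",
    "fundamental": "fundamental",
    "risk": "risk_management",
    "papers": "papers",
}

def domain_folder(base: Path, domain: str) -> Path:
    """Resolve the on-disk folder for a knowledge domain."""
    return base / KNOWLEDGE_DOMAINS[domain]
```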
Curated vs. Raw Data
The system prioritizes curated Markdown (.md) summaries over raw PDFs. While the KnowledgeIndexer includes structural support for various formats, PDF indexing is intentionally bypassed for research papers to ensure high-quality, actionable content. This prevents the "noise" often found in academic PDFs (like methodology tables or citations) from diluting the trading context.
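The curation policy described above amounts to a suffix filter at indexing time. A minimal sketch, assuming a helper named `should_index` (hypothetical; the actual filtering logic is internal to the KnowledgeIndexer):

```python
from pathlib import Path

# Suffix policy sketch: curated Markdown only; raw PDFs are intentionally bypassed.
CURATED_SUFFIXES = {".md"}
BYPASSED_SUFFIXES = {".pdf"}

def should_index(path: Path) -> bool:
    """Return True only for curated summary files."""
    suffix = path.suffix.lower()
    if suffix in BYPASSED_SUFFIXES:
        return False  # research-paper PDFs are skipped to avoid low-signal chunks
    return suffix in CURATED_SUFFIXES
```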
Indexing Knowledge
The KnowledgeIndexer class is the primary interface for processing files into the vector store. It handles text chunking, metadata tagging, and domain partitioning.
Basic Usage
To index the entire knowledge base:
```python
from agents.rag.indexer import KnowledgeIndexer

# Initialize indexer (defaults to the project's knowledge/ directory)
indexer = KnowledgeIndexer()

# Process and store all documents
results = indexer.index_all()
for domain, count in results.items():
    print(f"Domain: {domain} | Chunks Indexed: {count}")
```
Chunking Strategy
Documents are not split arbitrarily. The system employs a semantic-aware chunking strategy:
- Header Splitting: Content is first split by Markdown headers (`#`, `##`) to keep related concepts together.
- Paragraph Refinement: If a section exceeds the `chunk_size` (default 500 characters), it is further split by paragraphs.
- Metadata Tagging: Each chunk is stored with its source filename and domain to allow for precise citations in agent reasoning.
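The two-stage split above can be sketched as follows. This is a simplified stand-in (`chunk_markdown` is a hypothetical name, and metadata tagging is omitted), not the KnowledgeIndexer's actual implementation:

```python
import re

def chunk_markdown(text: str, chunk_size: int = 500) -> list[str]:
    """Split by # / ## headers first, then by paragraphs if a section is too long."""
    # Zero-width lookahead keeps each header attached to its own body.
    sections = re.split(r"(?m)^(?=#{1,2} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= chunk_size:
            chunks.append(section)
            continue
        # Paragraph refinement: pack paragraphs greedily up to chunk_size.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > chunk_size:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)  # a single oversize paragraph is kept whole here
    return chunks
```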
Data Connectors & Real-Time Context
In addition to static knowledge, the system integrates real-time news as dynamic context for the Sentiment Agent.
NewsAPI Connector
The NewsAPIConnector provides a bridge to external market news. It implements a fallback mechanism to ensure the system remains functional even without API keys:
- RSS Feeds: Fetches latest news from CoinTelegraph and CoinDesk.
- Google News: Uses Google News RSS for general crypto-market coverage.
- NewsAPI: (Optional) If an `API_KEY` is provided, fetches high-resolution headline data.
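The fallback chain above can be expressed as trying each source in priority order until one returns articles. A minimal sketch, assuming the sources are passed in as callables (`fetch_with_fallback` and `Fetcher` are illustrative names, not the connector's real API):

```python
from typing import Callable

# Each fetcher takes a limit and returns a list of article dicts.
Fetcher = Callable[[int], list[dict]]

def fetch_with_fallback(fetchers: list[Fetcher], limit: int = 5) -> list[dict]:
    """Try each news source in priority order; return the first non-empty result."""
    for fetch in fetchers:
        try:
            articles = fetch(limit)
        except Exception:
            continue  # source unavailable (e.g. missing API key); try the next one
        if articles:
            return articles[:limit]
    return []  # every source failed; the system stays functional with no news
```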
```python
from data_connectors.newsapi_connector import NewsAPIConnector

connector = NewsAPIConnector()
news = connector.get_bitcoin_news(limit=5)
for article in news:
    print(f"Source: {article['source']} - Title: {article['title']}")
```
Knowledge Retrieval in Agents
When the TradingOrchestrator triggers an analysis, each agent utilizes its domain-specific vector store.
Retrieval Interface
Agents use the retrieved context to populate the reasoning field of their AgentRecommendation schema. This ensures that every BUY/SELL/HOLD decision is backed by:
- Static Knowledge: Theory from the vector store (e.g., "According to the Wyckoff theory in our database...").
- Dynamic Context: Real-time news and market data.
Schema Enforcement
The structured output (defined in agents/schemas.py) requires agents to list key_factors. These factors are often derived directly from the RAG retrieval process, ensuring transparency in the decision-making pipeline.
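The shape of this structured output might look roughly like the dataclass below. This is an illustrative approximation only; the authoritative schema is defined in `agents/schemas.py`, and the field names beyond `reasoning` and `key_factors` (which the text names) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Literal

# Illustrative shape only; see agents/schemas.py for the real definition.
@dataclass
class AgentRecommendation:
    action: Literal["BUY", "SELL", "HOLD"]
    confidence: float                 # assumed 0.0 - 1.0
    reasoning: str                    # grounded in retrieved static + dynamic context
    key_factors: list[str] = field(default_factory=list)  # often RAG-derived
```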
Configuration
To point the indexer to a custom directory or modify indexing behavior:
```python
from pathlib import Path

from agents.rag.indexer import KnowledgeIndexer

indexer = KnowledgeIndexer(knowledge_dir="/path/to/custom/knowledge")

# Index only a specific domain
indexer.index_domain(domain="technical", folder_path=Path("./custom_tech_data"))
```
Note: Ensure `chromadb` or your chosen vector database is initialized before running the indexer. The storage location is managed via `agents/rag/vector_store.py`.