# Vector Store & Retrieval
The system uses a Retrieval-Augmented Generation (RAG) architecture to give trading agents domain-specific knowledge. By indexing curated research papers, technical analysis guides, and risk management frameworks, agents can ground their recommendations in established financial theory and historical context.
## Knowledge Indexing
The `KnowledgeIndexer` is the primary interface for processing raw text and markdown files into the semantic vector store. It organizes information into specific domains to ensure that specialized agents (e.g., the Technical Agent) only retrieve relevant context.
### Supported Domains
Knowledge is categorized into five distinct domains, each corresponding to a specialized trading agent:
| Domain | Folder | Content Type |
| :--- | :--- | :--- |
| `technical` | `technical_analysis` | Indicators, chart patterns, and signal processing. |
| `sentiment` | `sentiment` | Market psychology, Fear & Greed interpretations, and social signals. |
| `fundamental` | `fundamental` | Valuation models, network health metrics, and on-chain data. |
| `risk` | `risk_management` | Position sizing, Kelly Criterion, and volatility modeling. |
| `papers` | `papers` | Peer-reviewed research and institutional whitepapers. |
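The domain-to-folder mapping above can be sketched as a simple lookup. The dictionary and the `domain_path` helper below are illustrative, not part of the library's API:

```python
from pathlib import Path

# Hypothetical mapping mirroring the table above; names are illustrative.
DOMAIN_FOLDERS = {
    "technical": "technical_analysis",
    "sentiment": "sentiment",
    "fundamental": "fundamental",
    "risk": "risk_management",
    "papers": "papers",
}

def domain_path(knowledge_dir: str, domain: str) -> Path:
    """Resolve the on-disk folder for a knowledge domain."""
    try:
        return Path(knowledge_dir) / DOMAIN_FOLDERS[domain]
    except KeyError:
        raise ValueError(f"Unknown domain: {domain!r}") from None
```

For example, `domain_path("./knowledge", "risk")` resolves to `./knowledge/risk_management`.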
### The Indexing Workflow
The indexer processes files in three stages:
- **Parsing:** Loads `.md` and `.txt` files. While PDF support is available via `PyPDF2`, the system favors curated markdown summaries to ensure higher-quality, actionable data retrieval.
- **Chunking:** Content is split on Markdown headers (`##`) and paragraphs to maintain semantic coherence. Chunks are targeted at ~500 characters.
- **Vectorization:** Each chunk is converted into an embedding and stored with metadata (source filename and chunk index) for easy citation.
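The chunking stage above can be sketched as follows. The `chunk_markdown` function and the packing logic are an assumption based on the description (header-first splitting, ~500-character target), not the library's actual implementation:

```python
import re

CHUNK_TARGET = 500  # approximate chunk size in characters, per the workflow above

def chunk_markdown(content: str) -> list[str]:
    """Split on '#'/'##' headers first, then paragraphs, packing to ~CHUNK_TARGET chars."""
    chunks = []
    # Header-aware split: keep each section's header attached to its body.
    for section in re.split(r'\n(?=##?\s)', content):
        buf = ""
        for para in section.split("\n\n"):
            # Flush the buffer before it grows well past the target size.
            if buf and len(buf) + len(para) > CHUNK_TARGET:
                chunks.append(buf.strip())
                buf = ""
            buf += para + "\n\n"
        if buf.strip():
            chunks.append(buf.strip())
    return chunks
```

Because sections are split before packing, a header is never separated from the paragraphs that follow it.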
### Usage Example
#### Running a Full Index
To populate the vector store with all available knowledge from the `/knowledge` directory:
```python
from agents.rag.indexer import KnowledgeIndexer

# Initialize the indexer pointing to your knowledge base
indexer = KnowledgeIndexer(knowledge_dir="./knowledge")

# Index all domains
results = indexer.index_all()

for domain, count in results.items():
    print(f"Domain '{domain}': Indexed {count} chunks.")
```
### Strategic Chunking Logic
The indexer uses a header-aware splitting strategy. This ensures that a section describing "RSI Divergence" isn't cut in half, which would lose the semantic meaning during retrieval.
```python
import re

# Internal logic: splits by headers first, then paragraphs
sections = re.split(r'\n(?=##?\s)', content)
```
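A quick, self-contained demonstration of that split (the sample text is illustrative):

```python
import re

content = (
    "## RSI Divergence\n"
    "Price makes a lower low while RSI makes a higher low.\n"
    "## Moving Averages\n"
    "Golden cross: 50-day MA crosses above the 200-day MA.\n"
)

# The lookahead keeps the '##' with the section that follows it,
# so each header stays attached to its own body text.
sections = re.split(r'\n(?=##?\s)', content)
for s in sections:
    print(s.splitlines()[0])
```

Each resulting section begins with its header, so "RSI Divergence" and its explanation stay in the same chunk.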
## Retrieval Architecture
Agents do not query the entire database at once. Instead, they use domain-isolated retrieval to reduce "noise" in the prompt context.
- **Technical Agent:** Retrieves from the `technical` and `papers` stores.
- **Risk Agent:** Retrieves from the `risk` and `papers` stores.
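The agent-to-store isolation above can be sketched as a lookup table. The dictionary, agent keys, and `stores_for` helper are illustrative assumptions, not part of the library:

```python
# Hypothetical mapping of agents to the domain stores they may query.
AGENT_STORES = {
    "technical_agent": ["technical", "papers"],
    "risk_agent": ["risk", "papers"],
}

def stores_for(agent: str) -> list[str]:
    """Return the isolated set of domain stores an agent is allowed to query."""
    return AGENT_STORES.get(agent, [])
```

Restricting each agent to its own stores keeps irrelevant context out of its prompt.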
### Retrieval Integration
The retrieval process is handled within the orchestrator. When an agent receives a query, it invokes the vector store to find the top-$k$ most similar documents to the current market condition.
```python
from agents.rag.vector_store import get_vector_store

# Access the technical analysis vector store
store = get_vector_store("technical")

# Perform a similarity search
docs = store.similarity_search("What are the implications of RSI over 70?", k=3)
```
## Configuration
The vector store behavior is governed by the following environment variables:
- `VECTOR_STORE_PATH`: Directory where the local database (ChromaDB/FAISS) is persisted.
- `EMBEDDING_MODEL`: The model used to generate vectors (defaults to OpenAI- or Ollama-based embeddings depending on the backend).
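A minimal sketch of reading these variables at startup. The fallback values shown are assumptions for illustration, not documented defaults:

```python
import os

# Fallbacks below are assumptions; consult your deployment for real defaults.
vector_store_path = os.getenv("VECTOR_STORE_PATH", "./vector_store")
embedding_model = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
```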
**Note on PDF Support:** If you wish to index raw PDFs, ensure `PyPDF2` is installed. However, for best results, it is recommended to convert PDFs to Markdown to maintain structural integrity during chunking.