Historical Data Collection
The system relies on a multi-source data collection strategy to provide agents with a comprehensive view of historical market conditions. This includes real-time and historical news feeds, as well as a vectorized knowledge base of research papers and technical documentation used for Retrieval-Augmented Generation (RAG).
News Data Connectors
The NewsAPIConnector is the primary utility for gathering market sentiment and event data. It is designed with a fallback mechanism to ensure data availability even without paid API keys.
NewsAPIConnector Interface
The connector interacts with NewsAPI.org and various RSS feeds (Google News, CoinTelegraph, CoinDesk) to compile news datasets.
from data_connectors.newsapi_connector import NewsAPIConnector
# Initialize connector (Optional API key for NewsAPI)
connector = NewsAPIConnector(api_key="your_newsapi_key")
# Fetch recent Bitcoin/Crypto news
articles = connector.get_bitcoin_news(limit=20)
Key Features:
- Source Fallback: Automatically tries CoinTelegraph and CoinDesk RSS feeds if NewsAPI is unavailable or the API key is missing.
- Data Cleaning: Automatically strips HTML tags from news descriptions and truncates text to ensure compatibility with LLM context windows.
- Structure: Returns a list of dictionaries containing title, description, url, published_at, and source.
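The data-cleaning step can be sketched as follows. The regex-based tag stripping and the 500-character truncation limit are illustrative assumptions, not the connector's actual internals:

```python
import re

def clean_description(raw_html: str, max_len: int = 500) -> str:
    """Strip HTML tags and truncate, mirroring the connector's cleaning step.

    The tag-stripping regex and the character cap are illustrative
    assumptions about how the connector prepares text for LLM context.
    """
    text = re.sub(r"<[^>]+>", "", raw_html)   # drop HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text[:max_len]
```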
Knowledge Indexing (RAG)
To provide the agents with domain-specific expertise, the system includes a KnowledgeIndexer. This utility processes static documents (Markdown, Text, PDFs) and prepares them for the vector store.
Directory Structure
The indexer expects the knowledge/ directory to be organized by agent domains:
- knowledge/technical_analysis/: Chart patterns, indicator math.
- knowledge/sentiment/: Market psychology, social media impact papers.
- knowledge/fundamental/: On-chain metrics, network health docs.
- knowledge/risk_management/: Position sizing and volatility theory.
- knowledge/papers/: General research PDFs.
Usage
The KnowledgeIndexer chunks large documents and assigns metadata based on the folder structure to ensure that a Technical Agent only retrieves technical documents.
from agents.rag.indexer import KnowledgeIndexer
indexer = KnowledgeIndexer(knowledge_dir="./knowledge")
# Index all documents into their respective vector stores
results = indexer.index_all()
print(f"Indexed chunks: {results}")
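The folder-to-domain metadata assignment described above might look like the following sketch. The mapping table and the metadata key are assumptions for illustration; the real indexer's internals may differ:

```python
from pathlib import Path

# Illustrative mapping from knowledge/ subfolder to agent domain tag.
# The folder names match the documented layout; the domain labels are
# assumptions about what the indexer attaches as metadata.
DOMAIN_FOLDERS = {
    "technical_analysis": "technical",
    "sentiment": "sentiment",
    "fundamental": "fundamental",
    "risk_management": "risk",
    "papers": "general",
}

def domain_for(path: Path) -> str:
    """Derive the agent domain tag from a document's parent folder."""
    return DOMAIN_FOLDERS.get(path.parent.name, "general")
```

Tagging chunks this way is what lets a Technical Agent's retrieval query filter the vector store down to technical documents only.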
Supported Formats:
- Markdown (.md): Preferred format; split by headers.
- Text (.txt): Split by paragraph.
- PDF (.pdf): Supported via PyPDF2 (note: curated Markdown summaries are recommended over raw PDFs for higher reasoning quality).
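The header-based splitting for Markdown can be sketched as below. This is a simplified illustration; the real chunker may also enforce size limits and chunk overlap:

```python
import re

def split_markdown_by_headers(text: str) -> list[str]:
    """Split a Markdown document into chunks at level-1 and level-2
    headers. A minimal sketch of the header-based splitting applied
    to .md files, not the indexer's actual implementation.
    """
    chunks, current = [], []
    for line in text.splitlines():
        # Start a new chunk whenever a "# " or "## " header begins
        if re.match(r"^#{1,2} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks
```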
Historical Trade Logs
During backtesting, the system generates detailed JSON reports stored in the logs/ directory. These logs serve as historical data for the Dashboard and for performance auditing.
Viewing Historical Trades
You can inspect the results of a collection run or backtest using the view_trades.py utility:
# View the most recent backtest log
python view_trades.py
# View a specific report
python view_trades.py logs/backtest_report_20231027.json
Data Schemas
Historical data is standardized using Pydantic models to ensure consistency across the orchestrator and the dashboard.
| Model | Purpose |
| :--- | :--- |
| AgentRecommendation | Base schema for all historical agent decisions. |
| TechnicalAnalysis | Includes metadata for RSI, MACD, and Support/Resistance levels. |
| SentimentAnalysis | Includes Fear & Greed Index interpretations. |
| RiskMetadata | Stores calculated ATR stop-losses and position sizes. |
These schemas ensure that when data is "collected" during a backtest, it is stored with the full context of the agent's reasoning, not just the final trade action.
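As a rough illustration of the base schema's shape, here is a minimal stand-in using stdlib dataclasses rather than Pydantic; the field names and types are assumptions for illustration, not the actual model definition:

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecommendation:
    """Stand-in sketch for the Pydantic base schema. Field names are
    illustrative assumptions, not the real model."""
    agent: str            # e.g. "technical", "sentiment", "risk"
    action: str           # e.g. "BUY", "SELL", "HOLD"
    confidence: float     # agent's confidence in the recommendation
    reasoning: str = ""   # full reasoning context, not just the action
    metadata: dict = field(default_factory=dict)  # e.g. RSI/MACD values
```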