AlphaLens is a multi-agent AI system that automates professional equity research for retail investors. Given a stock ticker symbol, the system executes a pipeline of six specialized AI agents — each replicating a distinct analyst role — and produces a structured investment research report with citations, risk flags, valuation estimates, and a cross-verification layer, all within 90 seconds and at zero monetary cost using free-tier APIs.
The system implements two of the assignment's core components: Prompt Engineering, through carefully structured, context-aware prompts that drive each agent's reasoning and produce reliable structured JSON output; and Retrieval-Augmented Generation (RAG), through a full pipeline that ingests SEC 10-K and 10-Q filings, chunks and embeds them with Google's Gemini Embedding model, stores vectors in ChromaDB, and retrieves relevant excerpts to ground all narrative claims in primary source material.
The system is built on LangGraph for orchestration, Google Gemini 2.5 Flash for all language model calls, ChromaDB for vector storage, and Streamlit for the frontend. Data is sourced from SEC EDGAR, Alpha Vantage, yfinance, and the Federal Reserve's FRED API — all publicly available with no licensing cost.
Equity research is one of the most information-dense tasks in finance. A professional analyst preparing a research report on a single company will spend 8 or more hours reading annual filings (often 200+ pages), cross-checking financial statements, building valuation models, assessing macroeconomic context, and writing a structured report with citations. This workflow is effectively inaccessible to retail investors.
Existing solutions fall into two categories: (1) static financial data dashboards that show numbers but provide no interpretation, and (2) general-purpose LLM chatbots that can discuss finance but hallucinate figures and lack grounding in primary source documents. Neither solves the core problem.
AlphaLens addresses this gap by combining structured data APIs with a RAG pipeline over actual SEC filings, coordinated through a multi-agent state machine that mirrors the professional research workflow. The result is a system that not only produces a human-readable report but also explicitly measures confidence and flags divergences between management narrative and financial data.
Assignment Alignment
AlphaLens implements Prompt Engineering (systematic prompting strategies, context management, structured output, graceful error handling) and Retrieval-Augmented Generation (knowledge base construction, vector storage and retrieval, document chunking, ranking and filtering). It falls into the Research Synthesis Tool application type.
The system is organized into four layers: a data layer (API clients for external data sources), an agent layer (six specialized LangGraph nodes), an orchestration layer (LangGraph StateGraph managing parallel and sequential execution), and a presentation layer (Streamlit frontend with custom HTML/CSS components and Plotly charts).
All agents share a single AlphaLensState TypedDict that is passed through the graph. The TypedDict uses Annotated fields with reducer functions (a custom _merge_metadata and the built-in operator.add) on fields written by parallel branches, preventing race conditions when both rag_citation and quant_analysis attempt to update metadata and error_log simultaneously.
```python
import operator
from typing import Annotated, TypedDict

class AlphaLensState(TypedDict):
    ticker: str
    financial_data: dict      # Agent 1 output — structured financials
    rag_chunks: list[dict]    # Agent 2 output — filing excerpts + metadata
    quant_results: dict       # Agent 3 output — DCF, RSI, MACD, earnings
    risk_flags: list[dict]    # Agent 4 output — severity-classified flags
    verification: dict        # Agent 5 output — divergences, confidence
    report: dict              # Agent 6 output — full structured report
    metadata: Annotated[dict, _merge_metadata]  # parallel-safe merge
    error_log: Annotated[list, operator.add]    # parallel-safe append
    chat_history: list[dict]
```
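The _merge_metadata reducer referenced above performs a dict merge rather than a replace (as described under the parallel state-merging challenge later in this report). A minimal sketch, assuming a shallow merge with special handling for nested dicts:

```python
def _merge_metadata(existing: dict, incoming: dict) -> dict:
    """Reducer for AlphaLensState.metadata: merge parallel branches' writes
    instead of letting one branch overwrite the other."""
    merged = dict(existing or {})
    for key, value in (incoming or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Deep-merge nested dicts such as agent_latencies.
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged
```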
The pipeline has three execution phases:
1. data_fusion runs first, acquiring all structured financial data and SEC filing URLs. No other agent can start until this completes, as all downstream agents depend on its output.
2. rag_citation and quant_analysis execute in parallel. RAG Citation downloads and processes the SEC filing while Quant Analysis runs the DCF model and technical indicators. LangGraph's add_edge([A, B], C) syntax creates a fan-in at risk_scanner that waits for both branches.
3. risk_scanner → verify → report_synthesis run sequentially, each building on all prior outputs.

A sketch of this wiring appears below.
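A minimal sketch of how the graph might be assembled with LangGraph (node callables are hypothetical placeholders; node names match the pipeline above):

```python
from langgraph.graph import StateGraph, START, END

# Node callables (data_fusion_node, etc.) are hypothetical placeholders here;
# each would take and return a partial AlphaLensState.
graph = StateGraph(AlphaLensState)
graph.add_node("data_fusion", data_fusion_node)
graph.add_node("rag_citation", rag_citation_node)
graph.add_node("quant_analysis", quant_analysis_node)
graph.add_node("risk_scanner", risk_scanner_node)
graph.add_node("verify", verify_node)  # "verification" would collide with a state key
graph.add_node("report_synthesis", report_synthesis_node)

graph.add_edge(START, "data_fusion")
# Fan-out into the two parallel branches.
graph.add_edge("data_fusion", "rag_citation")
graph.add_edge("data_fusion", "quant_analysis")
# Fan-in: risk_scanner waits for both branches to finish.
graph.add_edge(["rag_citation", "quant_analysis"], "risk_scanner")
graph.add_edge("risk_scanner", "verify")
graph.add_edge("verify", "report_synthesis")
graph.add_edge("report_synthesis", END)
app = graph.compile()
```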
Each agent was designed to mirror a specific role in a professional equity research team.
Agent 1: Data Fusion (replaces: Junior Analyst)
Makes parallel API calls to SEC EDGAR, Alpha Vantage, yfinance, and FRED. Normalizes all data into a unified financial_data dict. Implements graceful fallback: if Alpha Vantage is rate-limited, yfinance provides backup fundamentals with a reduced confidence flag. Never crashes — always returns a sources_status dict documenting what succeeded and what failed.
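A sketch of the fallback pattern (helper names and the exception type are hypothetical):

```python
class RateLimitError(Exception):
    """Hypothetical exception raised by the Alpha Vantage client on quota exhaustion."""

def fetch_fundamentals(ticker: str) -> tuple[dict, dict]:
    """Try Alpha Vantage first; fall back to yfinance at reduced confidence.
    Always returns a sources_status dict alongside the data."""
    sources_status = {}
    try:
        data = alpha_vantage_fundamentals(ticker)  # hypothetical client call
        sources_status["alpha_vantage"] = "ok"
        return data, sources_status
    except RateLimitError:
        sources_status["alpha_vantage"] = "rate_limited"
    data = yfinance_fundamentals(ticker)           # hypothetical fallback client
    sources_status["yfinance"] = "ok (reduced confidence)"
    return data, sources_status
```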
Agent 2: RAG Citation (replaces: Research Associate)
Downloads the 10-K/10-Q HTML from EDGAR, parses it into sections (MD&A, Risk Factors, Financial Statements, Notes), chunks each section into 500-token overlapping windows, embeds them via Gemini Embedding, stores vectors in ChromaDB, and retrieves the top-k most relevant chunks per query dimension. Returns chunks with full citation metadata: section name, estimated page number, and source file.
Agent 3: Quant Analysis (replaces: Quantitative Analyst)
Runs three quantitative models: (1) a 3-stage DCF with bear/base/bull scenarios and WACC estimation, always reported at MEDIUM confidence to communicate model uncertainty; (2) technical indicators — RSI(14), MACD(12/26/9) — computed on 6 months of daily price data from yfinance; (3) earnings surprise analysis comparing actual EPS to consensus estimate.
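As an illustration of the technical-indicator leg, RSI(14) and MACD(12/26/9) can be computed from a daily close series with pandas. A minimal sketch, assuming Wilder-style exponential smoothing for RSI; the project's implementation may differ:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder smoothing (alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line, signal line, and histogram from exponential moving averages."""
    macd_line = (close.ewm(span=fast, adjust=False).mean()
                 - close.ewm(span=slow, adjust=False).mean())
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line, macd_line - signal_line
```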
Agent 4: Risk Scanner (replaces: Risk / Compliance Officer)
Implements a two-stage risk detection pipeline: first, a fast keyword/regex pattern match identifies candidate risk phrases (going concern, material weakness, auditor change, revenue recognition change, related-party, covenant violations, customer concentration). Second, Gemini classifies each candidate by severity (HIGH/MEDIUM/LOW) and generates a description with a citation link. The two-stage approach avoids expensive LLM calls on every chunk.
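A sketch of the first stage (the regex patterns shown are illustrative, not the project's exact list):

```python
import re

# Stage 1: fast, deterministic candidate detection.
RISK_PATTERNS = {
    "going_concern": re.compile(r"going concern", re.IGNORECASE),
    "material_weakness": re.compile(r"material weakness", re.IGNORECASE),
    "auditor_change": re.compile(r"change\s+(?:in|of)\s+auditors?", re.IGNORECASE),
    "covenant_violation": re.compile(r"covenant\s+(violation|breach|default)", re.IGNORECASE),
    "customer_concentration": re.compile(r"customer concentration", re.IGNORECASE),
}

def find_candidates(chunks: list[dict]) -> list[dict]:
    """Return only chunks matching a risk pattern; only these reach the LLM
    for stage-2 severity classification."""
    candidates = []
    for chunk in chunks:
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(chunk["text"]):
                candidates.append({"label": label, "chunk": chunk})
    return candidates
```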
Agent 5: Verification (replaces: Senior Analyst)
Cross-references all outputs from Agents 1–4. Extracts quantitative claims from financial data, extracts narrative claims from management's filing text, and prompts Gemini to identify divergences between them — cases where management's forward-looking statements are inconsistent with the actual historical numbers. Outputs a divergences list and per-section confidence_scores (HIGH/MEDIUM/LOW) that drive the overall report confidence rating.
Agent 6: Report Synthesis (replaces: Publishing Editor)
Generates five report sections iteratively — Executive Summary, Financial Health, Risk Flags, Valuation, Verification Verdict — using one focused Gemini call per section rather than a single large call. This produces more coherent output and keeps each prompt within safe token limits. Each section includes inline citations and a confidence badge. The overall report confidence is computed as the minimum confidence across all sections, adjusted by the verification divergence count.
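A sketch of the per-section loop (build_section_prompt and generate are hypothetical wrappers around the prompt builder and the Gemini client):

```python
SECTIONS = ["Executive Summary", "Financial Health", "Risk Flags",
            "Valuation", "Verification Verdict"]

def synthesize_report(state: dict, generate) -> dict:
    """One focused LLM call per report section, rather than a single large call."""
    report = {}
    for section in SECTIONS:
        prompt = build_section_prompt(section, state)  # hypothetical helper
        report[section] = generate(prompt)
    return report
```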
AlphaLens uses four distinct prompting strategies across its agents, each chosen for the specific requirements of that agent's task:
| Strategy | Agent(s) | Purpose |
|---|---|---|
| Structured JSON Output | Verification, Risk Scanner | Prompts specify the exact JSON schema the model must return, including field names and allowed enum values (HIGH/MEDIUM/LOW). This enables programmatic parsing of LLM output without additional extraction logic. |
| Iterative Section Prompting | Report Synthesis | Each of the five report sections receives its own focused prompt. Separating concerns produces more coherent output and prevents the model from conflating content across sections, as would occur with a single large prompt. |
| Grounded Evidence Prompting | Verification, RAG Citation | The prompt explicitly provides both the quantitative data and the relevant filing excerpts, then instructs the model to cross-reference them. The model is prohibited from making claims not supported by the provided context, reducing hallucination. |
| Two-Stage Classification | Risk Scanner | Pattern matching identifies candidate risks first (fast, deterministic). Only confirmed candidates are passed to the LLM for severity classification. This reduces API calls by 60–80% compared to passing every chunk to the LLM. |
Each agent's prompt is assembled programmatically from modular sub-sections rather than using static template strings. The Verification Agent's _build_prompt() function, for example, composes four independently rendered blocks — a financial data summary, numbered filing excerpts, quant results, and risk flag counts — and measures the total character length before sending. If the assembled prompt exceeds a safe threshold, filing excerpts are truncated to stay within the model's output token budget.
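A sketch of this assembly pattern (the renderer helpers and character budget are hypothetical; the real _build_prompt may differ):

```python
MAX_PROMPT_CHARS = 24_000  # illustrative budget leaving output-token headroom

def _build_prompt(financials: dict, excerpts: list[dict],
                  quant: dict, risk_flags: list[dict]) -> str:
    """Compose the Verification prompt from independently rendered blocks,
    truncating the filing excerpts if the total exceeds the budget."""
    blocks = [
        render_financial_summary(financials),  # hypothetical renderers
        render_numbered_excerpts(excerpts),
        render_quant_results(quant),
        render_risk_flag_counts(risk_flags),
    ]
    prompt = "\n\n".join(blocks)
    if len(prompt) > MAX_PROMPT_CHARS:
        # Excerpts are the largest block, so shrink them first.
        overflow = len(prompt) - MAX_PROMPT_CHARS
        blocks[1] = blocks[1][: max(0, len(blocks[1]) - overflow)]
        prompt = "\n\n".join(blocks)
    return prompt
```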
The follow-up chat feature uses a conversation history injection pattern: the full pipeline state (financial metrics, report sections, risk flags, RAG chunks, and verification divergences) is injected as the first user turn, followed by a model acknowledgment turn, and then the actual conversation history. This gives the model the full research context on every turn without re-running the pipeline.
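A sketch of the injection pattern using Gemini-style role/parts turns (render_pipeline_context is a hypothetical serializer of the pipeline state):

```python
def build_chat_history(pipeline_state: dict, turns: list[dict]) -> list[dict]:
    """Inject the full research context as the first user turn, followed by a
    model acknowledgment, then the actual conversation so far."""
    context = render_pipeline_context(pipeline_state)  # hypothetical serializer
    return [
        {"role": "user", "parts": [f"Research context for this session:\n{context}"]},
        {"role": "model", "parts": ["Understood. I will answer using only this context."]},
        *turns,  # real user/model turns accumulated so far
    ]
```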
LLM outputs are inherently non-deterministic. AlphaLens implements a multi-layer recovery strategy for every structured output call:

1. Gemini frequently wraps JSON responses in ```json markdown blocks. The parser strips these before attempting to parse.
2. If parsing still fails (for example, a response truncated at the output token limit), the parser truncates at the last valid closing bracket and retries.
3. As a final fallback, a regex extracts any salvageable partial structure, and safe defaults are returned so the pipeline never crashes on malformed output.

The RAG pipeline transforms raw SEC filing HTML into a searchable vector knowledge base, retrieves the most relevant excerpts for each analysis dimension, and injects those excerpts as grounding context into downstream LLM prompts. This is what distinguishes AlphaLens from a system that merely summarizes financial numbers — the narrative analysis is grounded in primary source documents.
The EdgarClient queries the SEC EDGAR full-text search API to locate 10-K and 10-Q filings for a given ticker. It retrieves the filing index, extracts the primary document URL, and downloads the HTML. The EDGAR API enforces a rate limit of 10 requests per second; the system uses a token bucket rate limiter (implemented in src/utils/rate_limiter.py) to stay compliant.
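A token bucket along these lines would satisfy the 10 requests/second constraint (a sketch; the actual src/utils/rate_limiter.py implementation may differ):

```python
import threading
import time

class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float | None = None):
        self.rate = rate
        self.capacity = capacity or rate
        self.tokens = self.capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens < 1:
                time.sleep((1 - self.tokens) / self.rate)  # wait for a token
                self.tokens = 0
            else:
                self.tokens -= 1

edgar_limiter = TokenBucket(rate=10)  # SEC EDGAR: 10 requests/second
```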
The FilingParser (in src/data/filing_parser.py) uses BeautifulSoup to parse the downloaded HTML and extract named sections. It identifies section boundaries using SEC standard item markers (Item 1A. Risk Factors, Item 7. Management's Discussion and Analysis, Item 8. Financial Statements, etc.) and stores each section with its estimated page number and character offset. This metadata is preserved through the entire pipeline for citation generation.
Text is split using LangChain's RecursiveCharacterTextSplitter, configured with token-accurate counting via the tiktoken library using the cl100k_base tokenizer. This is important because the embedding model has a fixed input token limit — character-based splitting would produce chunks of inconsistent actual token length.
| Parameter | Value | Rationale |
|---|---|---|
| Chunk Size | 500 tokens | Balances specificity (small enough for precise retrieval) with context (large enough to contain complete sentences and their surrounding context) |
| Chunk Overlap | 100 tokens | Prevents key information that spans a chunk boundary from being lost; 20% overlap is standard for financial text |
| Splitter Hierarchy | Paragraph → sentence → word | RecursiveCharacterTextSplitter tries to break at natural boundaries before resorting to mid-sentence breaks |
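With the parameters above, the splitter construction might look like this sketch, assuming LangChain's from_tiktoken_encoder constructor:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

def make_splitter() -> RecursiveCharacterTextSplitter:
    # Token-accurate sizing via tiktoken's cl100k_base, so chunks respect
    # the embedding model's input token limit rather than a character count.
    return RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        encoding_name="cl100k_base",
        chunk_size=500,      # tokens per chunk
        chunk_overlap=100,   # 20% overlap across chunk boundaries
        separators=["\n\n", "\n", ". ", " ", ""],  # paragraph, sentence, word
    )

chunks = make_splitter().split_text("...section text from the FilingParser...")
```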
Each chunk is stored with a metadata dict containing: ticker, filing_type (10-K or 10-Q), filing_date, section_name, chunk_index (within its section), estimated_page, and source_file. This metadata enables citation generation — when the report references a claim, it can link directly to the section and approximate page number in the original filing.
Chunks are embedded using Google's gemini-embedding-001 model, which produces 3072-dimensional dense vectors. Two distinct task types are used to optimize retrieval quality:
- RETRIEVAL_DOCUMENT — used when embedding filing chunks for storage. Optimizes the representation for being retrieved.
- RETRIEVAL_QUERY — used when embedding the query string at retrieval time. Optimizes for matching against stored document vectors.
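Assuming the google-genai SDK, the two task types might be applied as follows (a sketch, not the project's exact client wrapper):

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def embed_chunks(texts: list[str]) -> list[list[float]]:
    """Embed filing chunks for storage (RETRIEVAL_DOCUMENT task type)."""
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=texts,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
    )
    return [e.values for e in result.embeddings]

def embed_query(query: str) -> list[float]:
    """Embed a retrieval query (RETRIEVAL_QUERY task type)."""
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
    )
    return result.embeddings[0].values
```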
Vectors are stored in ChromaDB using a PersistentClient backed by a local .chroma/ directory. Each collection is namespaced by ticker symbol and filing date, so re-running the same ticker reuses cached embeddings rather than re-computing them (a significant cost and latency saving). The similarity metric is cosine distance, which is standard for dense text embeddings.
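A minimal sketch of the storage setup (the collection naming shown is illustrative):

```python
import chromadb

def get_collection(ticker: str, filing_date: str):
    """Open (or create) the per-ticker, per-filing collection; reuse means
    re-running the same ticker hits cached embeddings instead of re-computing."""
    client = chromadb.PersistentClient(path=".chroma")
    return client.get_or_create_collection(
        name=f"{ticker}_{filing_date}",
        metadata={"hnsw:space": "cosine"},  # cosine distance for dense embeddings
    )

# Usage sketch:
# col = get_collection("AAPL", "2024-11-01")
# col.add(ids=chunk_ids, embeddings=vectors, documents=texts, metadatas=metas)
# res = col.query(query_embeddings=[query_vec], n_results=5)
```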
Batching is applied during embedding: chunks are grouped into batches of 20 (the API's limit) with exponential backoff on rate-limit errors. A per-call rate limiter ensures the system stays within the free-tier limits.
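A sketch of the batching loop with exponential backoff (reusing embed_chunks from the embedding sketch above; retry limits illustrative):

```python
import time

BATCH_SIZE = 20  # embedding API batch limit

def embed_all(texts: list[str], max_retries: int = 3) -> list[list[float]]:
    vectors = []
    for i in range(0, len(texts), BATCH_SIZE):
        batch = texts[i : i + BATCH_SIZE]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_chunks(batch))  # from the sketch above
                break
            except Exception:  # e.g. a 429 rate-limit error
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
        else:
            raise RuntimeError("embedding failed after retries")
    return vectors
```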
At retrieval time, the FilingRetriever executes multiple focused queries rather than a single broad query, with each query targeting a distinct analysis dimension.
Each query retrieves top-k = 5 chunks. Results are de-duplicated by chunk ID and re-ranked by cosine similarity score. The final retrieval set (up to 25 unique chunks) is passed to the Risk Scanner and Verification agents as grounding context.
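A sketch of the multi-query retrieval with de-duplication and re-ranking (reusing embed_query from the embedding sketch above; ChromaDB reports cosine distance, where lower means more similar):

```python
def retrieve_grounding(queries: list[str], collection, k: int = 5) -> list[dict]:
    """Run one focused query per analysis dimension, de-duplicate by chunk ID,
    and re-rank the merged results by similarity."""
    seen: dict[str, dict] = {}
    for q in queries:
        res = collection.query(
            query_embeddings=[embed_query(q)], n_results=k,
            include=["documents", "metadatas", "distances"],
        )
        for cid, doc, meta, dist in zip(res["ids"][0], res["documents"][0],
                                        res["metadatas"][0], res["distances"][0]):
            # Keep the best (lowest) distance seen for each unique chunk.
            if cid not in seen or dist < seen[cid]["distance"]:
                seen[cid] = {"id": cid, "text": doc, "metadata": meta, "distance": dist}
    return sorted(seen.values(), key=lambda c: c["distance"])
```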
Retrieved chunks are also stored in AlphaLensState["rag_chunks"] and injected into the follow-up chat context, enabling the chat widget to answer qualitative questions (e.g., "what drove revenue growth?") from the actual filing text rather than the LLM's training data.
| Source | Data Provided | Rate Limit | Auth | Fallback |
|---|---|---|---|---|
| SEC EDGAR | 10-K / 10-Q filing HTML, accession numbers, filing dates | 10 req/sec | None required | Skip RAG; report as "Filing unavailable" |
| Alpha Vantage | Income statement, balance sheet, cash flow (annual + quarterly) | 25 req/day | Free API key | yfinance fundamentals at reduced confidence |
| yfinance | Live price, 52-week high/low, P/E, beta, historical OHLCV | ~2 req/sec | None required | Partial data with sources_status flag |
| FRED (Federal Reserve) | Fed Funds Rate, GDP, CPI, Unemployment Rate | 120 req/min | Free API key | Omit macro section |
| Google Gemini 2.5 Flash | LLM inference, embeddings | 15 RPM (free tier) | API key | Exponential backoff, up to 3 retries |
| ChromaDB | Local vector storage and retrieval | Local I/O | None | Re-embed on corruption |
The system tracks the status of every data source in a sources_status dict within the pipeline metadata. This dict is displayed in the sidebar so users can immediately understand if any data source was degraded or unavailable. Confidence scores are adjusted accordingly.
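The dict's shape might look like this (illustrative values):

```python
sources_status = {
    "sec_edgar": "ok",
    "alpha_vantage": "rate_limited (fallback: yfinance)",
    "yfinance": "ok",
    "fred": "ok",
}
```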
The frontend is built in Streamlit but uses no default Streamlit styling. All visible UI elements are rendered as custom HTML via st.markdown(..., unsafe_allow_html=True). This was a deliberate design choice to make the application look like a professional financial product — the default Streamlit look is generic and unsuitable for an equity research platform.
The UI system is built around a dark color palette (#0A0A0F primary background, #3B82F6 accent blue) and uses Inter as the primary typeface. Four Plotly charts are rendered with custom dark-theme templates matching the application's color system.
Agent execution progress is displayed using a custom animated HTML component with CSS pulse animation for running agents. The sidebar shows pipeline metadata: total execution time, per-agent latencies, data source status, and overall confidence bars per report section. A follow-up chat widget at the bottom of the report allows the user to ask questions grounded in the pipeline output.
A custom evaluation framework is implemented in src/eval/. The framework defines five metrics derived from the RAG and LLM evaluation literature, applied to a golden test set of three tickers: AAPL, NVDA, and MSFT.
| Metric | Definition | Threshold |
|---|---|---|
| Retrieval Precision@k | Among the top-k retrieved chunks, the fraction that contain at least one keyword from the relevant keyword set for the query | ≥ 0.60 |
| Retrieval Recall@k | Among the required source sections (MD&A, Risk Factors, Financial Statements), the fraction that appear in the top-k results | ≥ 0.80 |
| Faithfulness | For each monetary claim (dollar amount) in the report text, whether that figure appears (within 5%) in at least one retrieved chunk. Measured as fraction of claims grounded. | ≥ 0.90 |
| Numerical Accuracy | For each key financial metric (revenue, net income, gross margin, EPS, D/E ratio), whether the pipeline's value matches the golden ground truth within ±15% | All within ±15% |
| Earnings Surprise Accuracy | Whether the beat/miss direction (actual EPS vs. consensus estimate) is correct. This is a binary check — correct direction or not. | 1.00 (exact) |
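As one illustration, the faithfulness metric can be approximated by extracting dollar amounts from the report text and checking each against the retrieved chunks within the 5% tolerance. A sketch; the src/eval/ implementation may differ:

```python
import re

DOLLAR = re.compile(r"\$\s*([\d,]+(?:\.\d+)?)\s*(billion|million)?", re.IGNORECASE)
SCALE = {"billion": 1e9, "million": 1e6, None: 1.0}

def amounts(text: str) -> list[float]:
    """Extract dollar figures, normalizing 'billion'/'million' suffixes."""
    return [float(m.group(1).replace(",", "")) * SCALE[m.group(2) and m.group(2).lower()]
            for m in DOLLAR.finditer(text)]

def faithfulness(report_text: str, chunks: list[str], tol: float = 0.05) -> float:
    """Fraction of monetary claims grounded (within 5%) in some retrieved chunk."""
    claims = amounts(report_text)
    grounded = [c for c in claims
                if any(abs(c - v) <= tol * c for chunk in chunks for v in amounts(chunk))]
    return len(grounded) / len(claims) if claims else 1.0
```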
Note on Metric Measurement
The metrics above represent design targets defined in the evaluation framework. Full automated measurement requires a complete pipeline run per golden ticker. Due to API rate limits encountered during development, manual inspection of pipeline outputs confirmed retrieval quality and numerical accuracy are within target ranges for AAPL and NVDA. The evaluation runner (python -m src.eval.runner --tickers AAPL NVDA MSFT) is included and functional for automated measurement with sufficient API quota.
| Challenge | Root Cause | Solution Implemented |
|---|---|---|
| LangGraph node name conflicts with state keys | Newer LangGraph versions raise a ValueError if a node name matches a TypedDict key. The node named "verification" conflicted with the state field verification. | Renamed the node to "verify". Updated all references in graph.py, app.py, and components.py. |
| DCF unavailable for AAPL — insufficient data | The DCF model requires multi-year historical revenue growth rates from Alpha Vantage. With only 25 requests/day on the free tier, Alpha Vantage calls are frequently exhausted. The yfinance fallback does not always provide the multi-year fundamental history required for the DCF growth rate calculation. | DCF gracefully returns None with an explanatory message in the error log. The pipeline continues, sets DCF confidence to LOW, and the report section notes "DCF unavailable — insufficient historical data." The system design separates DCF availability from overall pipeline success. |
| JSON parse failure in Verification Agent | When the verification prompt is long (many risk flags + filing excerpts), Gemini's response may be truncated at the max_output_tokens limit, producing a syntactically invalid JSON string cut off mid-value. The raw snippet: `"divergences": [ { "claim": "Total net sales for 2025 were $416.16 billion` — the string is never closed. | Implemented a multi-stage parser: (1) strip markdown code fences, (2) find the last valid closing bracket and truncate, (3) attempt json.loads(), (4) on failure, use re.search to extract a partial divergences array, (5) fall back to safe defaults. Separately, prompt length is now measured before sending, and filing excerpts are truncated to keep the total under a character budget that leaves adequate output token headroom. |
| Gemini free-tier quota exhaustion across multiple keys | All gemini-2.0-flash free-tier quota (per Google Cloud project) was exhausted during development. New API keys from the same Google Workspace (Northeastern) account returned limit: 0, indicating the quota was not provisioned for that account type. | Switched to gemini-2.5-flash, which has its own independent quota pool. Changed the GEMINI_MODEL config constant. Also added a get_script_run_ctx() check in _get_secret() to prevent Streamlit secrets access at module import time, which was causing set_page_config() ordering errors. |
| Parallel state merging race condition | When rag_citation and quant_analysis run in parallel and both attempt to update metadata["agent_latencies"] and error_log, one branch's updates would overwrite the other's. | Used LangGraph's Annotated[dict, _merge_metadata] and Annotated[list, operator.add] on conflicting fields. The custom _merge_metadata reducer performs a dict merge rather than a replace, preserving both branches' contributions. |
| Plotly gauge steps reject 8-digit hex colors | Plotly's gauge step color property does not accept the #RRGGBBAA format used by our color system for opacity. | Converted all gauge step colors to rgba(r,g,b,a) format inline. |
| Nested f-string triple-quote syntax error | `f"""...{''.join(f"""...""" for ...)}..."""` is invalid Python syntax — triple-quoted f-strings cannot be nested. | Pre-computed the inner HTML string into a separate variable before the outer f-string. |
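For concreteness, the multi-stage parser from the JSON-failure row above might look like the following sketch (stage boundaries are illustrative):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Defensively parse possibly-truncated LLM output into a dict."""
    # (1) Strip markdown code fences such as ```json ... ```
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # (2) Truncate at the last closing bracket in case the response was cut off.
    last = max(text.rfind("}"), text.rfind("]"))
    if last != -1:
        text = text[: last + 1]
    # (3) Attempt a normal parse.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # (4) Salvage a partial divergences array if one is present.
    m = re.search(r'"divergences"\s*:\s*(\[.*?\])', text, re.DOTALL)
    if m:
        try:
            return {"divergences": json.loads(m.group(1))}
        except json.JSONDecodeError:
            pass
    # (5) Safe defaults so the pipeline never crashes on bad output.
    return {"divergences": [], "confidence_scores": {}}
```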
AlphaLens is explicitly not financial advice and is designed for educational purposes only. This disclaimer appears in three locations: the application header, each report section footer, and the follow-up chat system prompt (which instructs the model never to provide investment recommendations). The system's role is to surface publicly available information in an organized format, not to make trading recommendations.
All data used by AlphaLens comes from public domain or publicly accessible sources: SEC EDGAR filings, Alpha Vantage and yfinance market data, and the Federal Reserve's FRED macroeconomic series.
The system covers any publicly traded U.S. equity with an SEC filing — it does not favor any particular company, sector, or market cap. However, several inherent biases should be acknowledged.
The system is designed to communicate uncertainty explicitly rather than project false confidence. Every section of the report carries a confidence badge (HIGH/MEDIUM/LOW). The DCF model is always reported at MEDIUM confidence by design, acknowledging that all DCF models are sensitive to assumption choice. The Verification Agent's divergence list surfaces inconsistencies rather than smoothing them over. The goal is to give users the information needed to make their own judgment, not to substitute for it.
AlphaLens does not collect, store, or transmit any user data. Ticker searches are not logged. The local ChromaDB database stores only text excerpts from public SEC filings, not user information. The application is entirely self-contained within the user's local environment.
Known Limitations
- Free-tier rate limits (Alpha Vantage's 25 requests/day, Gemini's 15 RPM) constrain throughput and can leave the DCF model without the multi-year fundamental history it requires.
- Citation page numbers are estimated from character offsets rather than exact.
- The evaluation metrics are design targets; fully automated measurement requires more API quota than the free tier provides.
Project Links
Web Page: https://rishisehgal.github.io/AlphaLens/
Live App: https://rishisehgal-alphalens.streamlit.app/
Video Demo: https://www.loom.com/share/36213fda55af4fd2829f19d726ef1957
GitHub: https://github.com/RishiSehgal/AlphaLens
AlphaLens demonstrates that the combination of Prompt Engineering and Retrieval-Augmented Generation, coordinated through a multi-agent state machine, can compress a complex professional workflow into an automated, reliable, and explainable system. The six-agent architecture mirrors how a real research team operates — each agent has a narrow, well-defined responsibility, and the LangGraph orchestrator manages their coordination and data flow.
The project's most technically meaningful contribution is the Verification Agent: using an LLM to cross-reference quantitative financial data against management's narrative claims — and flagging divergences with specific evidence — is a use of generative AI that goes beyond summarization. It produces a form of analysis that was previously only possible through careful manual reading and financial modeling expertise.
The system was built entirely on zero-cost free-tier APIs, demonstrating that meaningful generative AI applications do not require expensive infrastructure. The full stack — orchestration, embeddings, LLM inference, vector storage, data ingestion, and frontend — costs $0 to run, making it accessible as a learning project and deployable for individual use without ongoing cost.
The challenges encountered during development — LLM output reliability, API quota management, parallel state merging, and Streamlit rendering constraints — are representative of real engineering problems in production AI systems, and the solutions implemented (defensive parsing, rate limiters, annotated reducers, lazy secret loading) reflect industry-standard patterns for building robust LLM applications.