Capitol Quant
Static Dataset • v1.0 Release (Feb 2026)

Bridge the Gap Between Capitol Hill & Wall Street

Stop digging through PDF filings. Access 13,107 validated, high-value transactions (over $15k) from 2015 to 2026. This dataset transforms messy GovTrades disclosures into machine-readable financial intelligence.

The Market Inefficiency

Members of Congress have frequently been reported to outperform the S&P 500. Yet their trading data is buried in thousands of hard-to-parse PDFs, making it nearly impossible for retail investors to track in real time. We fixed that. This archive democratizes 11 years of congressional trading history.

Unlock 3 Levels of Analysis

Insider Trading Patterns

Identify suspicious timing. Which politicians bought defense stocks right before a war vote? Who sold tech stocks before antitrust hearings?
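A timing question like this can be framed as a window query: given an event date (say, a committee vote), list purchases of a chosen set of tickers made in the preceding N days. The sketch below assumes each record carries `politician`, `ticker`, `type`, and `trade_date` fields; those field names, the helper name, the 30-day default window, and the toy records are all hypothetical, not taken from the dataset.

```python
from datetime import date, timedelta

# Hypothetical record layout: field names are assumptions, not the documented schema.
def trades_before_event(records, tickers, event_date, window_days=30, tx_type="Purchase"):
    """Return `tx_type` trades in `tickers` made within `window_days` before `event_date`."""
    start = event_date - timedelta(days=window_days)
    return [
        r for r in records
        if r["ticker"] in tickers
        and r["type"] == tx_type
        and start <= r["trade_date"] < event_date  # half-open window
    ]

# Toy records for illustration only.
trades = [
    {"politician": "Member A", "ticker": "LMT", "type": "Purchase", "trade_date": date(2024, 3, 1)},
    {"politician": "Member B", "ticker": "LMT", "type": "Sale", "trade_date": date(2024, 3, 5)},
    {"politician": "Member C", "ticker": "RTX", "type": "Purchase", "trade_date": date(2023, 12, 1)},
]
hits = trades_before_event(trades, {"LMT", "RTX"}, event_date=date(2024, 3, 15))
for r in hits:
    print(r["politician"], r["ticker"], r["trade_date"])  # Member A LMT 2024-03-01
```

The half-open window (`start <= trade_date < event_date`) excludes trades made on the event day itself; widen or shift it to match the question being asked.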

Quantitative Backtesting

Don't guess. Test. Simulate trading strategies based on specific politicians' trading histories: "What if I copied Pelosi's trades with a 30-day lag?"
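The "copy with a 30-day lag" question reduces to a simple event study: enter each disclosed purchase `lag_days` after a reference date, hold for a fixed period, and record the return. This is a minimal sketch assuming a hypothetical `prices` mapping of ticker to `{date: close}`, which the dataset itself does not include, plus hypothetical field names; it ignores costs, dividends, and position sizing.

```python
from datetime import date, timedelta

def price_on_or_after(series, day):
    """Return the first (date, close) in `series` on or after `day`, or None."""
    for d in sorted(series):
        if d >= day:
            return d, series[d]
    return None

def lagged_copy_returns(trades, prices, lag_days=30, hold_days=90, date_field="trade_date"):
    """Mirror each purchase `lag_days` after `date_field`, hold `hold_days`, return simple returns.

    `prices` is a hypothetical {ticker: {date: close}} mapping supplied by the user.
    """
    returns = []
    for t in trades:
        if t["type"] != "Purchase":
            continue  # this sketch only mirrors purchases
        series = prices.get(t["ticker"], {})
        entry = price_on_or_after(series, t[date_field] + timedelta(days=lag_days))
        if entry is None:
            continue  # no price data after the entry date
        exit_ = price_on_or_after(series, entry[0] + timedelta(days=hold_days))
        if exit_ is None:
            continue  # holding period runs past the available prices
        returns.append(exit_[1] / entry[1] - 1)
    return returns

# Toy inputs, not real market data.
prices = {"MSFT": {date(2024, 2, 1): 400.0, date(2024, 5, 1): 420.0}}
trades = [{"ticker": "MSFT", "type": "Purchase", "trade_date": date(2024, 1, 1)}]
print(lagged_copy_returns(trades, prices))  # one return of about 0.05 (5%)
```

Passing `date_field="publication_date"` anchors the lag to the disclosure date instead, which is the earliest point at which an outside investor could actually have copied the trade.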

Source Verification

Every data point links back to the original PDF filing, so you can verify it yourself. Ticker recovery maps each disclosed asset to its stock ticker for accurate matching.

• 13,107 Total Records
• 11 Years of History (2015-2026)
• 3 File Formats
• 28 MB Optimized Size
FIGURE 1: DATASET INTEGRITY MANIFEST (Official Data Manifest)

Structured Intelligence

FIGURE 2: JSON DATA STRUCTURE (JSON Example)
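Because the JSON example survives here only as a figure caption, the exact schema is not reproduced. The sketch below shows one way to load records into a typed structure; every field name is hypothetical, chosen to mirror attributes the page mentions (owner, asset type, transaction type, value range, trade and publication dates, source PDF link).

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Transaction:
    # All field names are assumptions, not the dataset's documented schema.
    politician_id: str
    ticker: Optional[str]   # None if ticker recovery failed
    asset_type: str         # e.g. "ST", "OT", "GS"
    tx_type: str            # e.g. "Purchase", "Sale"
    owner: str              # e.g. "Self", "Joint", "Spouse"
    amount_range: str       # disclosure bracket, e.g. "$15,001 - $50,000"
    trade_date: date
    publication_date: date
    source_pdf: str         # link to the original filing

def load_transactions(text: str) -> list:
    """Parse a JSON array of records, converting ISO date strings to `date`."""
    records = []
    for row in json.loads(text):
        row = dict(row)
        for key in ("trade_date", "publication_date"):
            row[key] = date.fromisoformat(row[key])
        records.append(Transaction(**row))
    return records

# Hypothetical sample record.
sample = """[{"politician_id": "P001", "ticker": "MSFT", "asset_type": "ST",
  "tx_type": "Purchase", "owner": "Spouse", "amount_range": "$15,001 - $50,000",
  "trade_date": "2024-01-10", "publication_date": "2024-02-20",
  "source_pdf": "https://example.com/filing.pdf"}]"""
txs = load_transactions(sample)
print(txs[0].ticker, (txs[0].publication_date - txs[0].trade_date).days)  # MSFT 41
```

Freezing the dataclass keeps parsed records immutable, and an unexpected key in a row raises a `TypeError` instead of being silently dropped.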

ETL Pipeline Logic

1. Ingestion & OCR: Ingests raw filings from the House and Senate and uses OCR to parse handwritten PDF forms.
2. Normalization: Maps messy asset descriptions to tickers (e.g., MSFT) and standardizes asset types (Stock, Option).
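Ticker recovery can be approximated with two passes: take an explicit ticker in parentheses when the description includes one, otherwise fall back to a name-alias table. The alias table, regular expression, and helper names below are illustrative assumptions; the page does not document how its own normalization step works.

```python
import re
from typing import Optional

# Illustrative alias table; a production mapping would be far larger.
NAME_ALIASES = {
    "microsoft corporation": "MSFT",
    "microsoft corp": "MSFT",
}
# Asset-type codes as they appear in the integrity log; other codes pass through unchanged.
ASSET_TYPES = {"ST": "Stock", "OT": "Other", "GS": "Gov"}

# Matches an explicit ticker such as "(MSFT)" or "(BRK.B)".
TICKER_IN_PARENS = re.compile(r"\(([A-Z]{1,5}(?:\.[A-Z])?)\)")

def recover_ticker(description: str) -> Optional[str]:
    """Return a ticker for a free-text asset description, or None if unresolved."""
    match = TICKER_IN_PARENS.search(description)
    if match:
        return match.group(1)
    # Lowercase, replace punctuation with spaces, and collapse whitespace before lookup.
    key = " ".join(re.sub(r"[^a-z ]", " ", description.lower()).split())
    for alias, ticker in NAME_ALIASES.items():
        if key.startswith(alias):
            return ticker
    return None

def normalize_asset_type(code: str) -> str:
    """Map a raw asset-type code to a readable label."""
    return ASSET_TYPES.get(code.strip().upper(), code.strip())

print(recover_ticker("Microsoft Corporation - Common Stock (MSFT)"))  # MSFT
print(recover_ticker("MICROSOFT CORP. common shares"))                # MSFT
print(recover_ticker("US Treasury Bill"))                             # None
print(normalize_asset_type(" st "))                                   # Stock
```

Unresolved descriptions come back as `None` rather than a guessed ticker, so they can be routed to manual review.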
DATA INTEGRITY LOG: VALIDATED
[TX TYPE]
• Purchase 6493 (49.54%)
• Sale 6457 (49.26%)
[OWNERSHIP]
• Self 6441 (49.14%)
• Joint 4192 (31.98%)
• Spouse 2456 (18.74%)
[ASSETS]
• Stock (ST) 7639
• Other (OT) 1012
• Gov (GS) 998
[RANGE]
• $15k-50k 59.40%
• $50k-100k 17.56%
• > $500k 4.18%
// END OF REPORT //
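Each block of the log is a frequency table: a category's count divided by the total record count. For example, 6,493 purchases out of 13,107 records gives 49.54%. The listed rows need not sum to 100%, since the log appears to show only the leading categories (purchases and sales together cover 98.80%). A minimal sketch of the calculation, with a hypothetical helper name:

```python
from collections import Counter

def distribution(values):
    """Map each category to (count, percentage of all values), most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 2)) for k, n in counts.most_common()}

# Toy input; on the full dataset, pass every record's transaction type.
print(distribution(["Purchase", "Purchase", "Purchase", "Sale"]))
# {'Purchase': (3, 75.0), 'Sale': (1, 25.0)}

# Reproducing one line of the log from its published counts:
print(round(100 * 6493 / 13107, 2))  # 49.54
```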

Relational Architecture

FIGURE 3: ENTITY RELATIONSHIP (Database Schema Architecture)

Why the 'politicians' folder?

Instead of repeating "Nancy Pelosi" 1,000 times in the transaction log, we use a normalized structure:

  • 1. The "Phonebook" (politicians.csv): Contains a unique ID, total trade volume, and transaction count for every legislator.
  • 2. Reporting Lag Calculation: We calculate the exact delay (in days) between the trade date and the publication date. This metric identifies members who consistently delay disclosures vs. those who report promptly.
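The lag metric and the normalized "phonebook" fit together naturally: compute each transaction's delay, group by politician ID, then look up names once from politicians.csv. The sketch assumes hypothetical column names (`politician_id`, `name`, `trade_date`, `publication_date`) and summarizes each member by median lag. The 45-day `late_after` default reflects the STOCK Act's outer deadline for periodic transaction reports, though the dataset's publication date may not match the legal filing date exactly.

```python
import csv
import io
import statistics
from datetime import date

def load_phonebook(f):
    """Index politicians.csv rows by ID (column names are assumptions)."""
    return {row["politician_id"]: row for row in csv.DictReader(f)}

def lag_report(transactions, phonebook, late_after=45):
    """Median reporting lag in days per politician, slowest reporters first."""
    lags = {}
    for t in transactions:
        days = (t["publication_date"] - t["trade_date"]).days
        lags.setdefault(t["politician_id"], []).append(days)
    report = []
    for pid, days in lags.items():
        median = statistics.median(days)
        report.append({
            "name": phonebook[pid]["name"],  # join on the ID, not a repeated name
            "median_lag_days": median,
            "late": median > late_after,
        })
    return sorted(report, key=lambda r: r["median_lag_days"], reverse=True)

# Toy inputs for illustration only.
phonebook = load_phonebook(io.StringIO("politician_id,name\nP001,Member A\nP002,Member B\n"))
transactions = [
    {"politician_id": "P001", "trade_date": date(2024, 1, 1), "publication_date": date(2024, 1, 11)},
    {"politician_id": "P001", "trade_date": date(2024, 2, 1), "publication_date": date(2024, 2, 21)},
    {"politician_id": "P002", "trade_date": date(2024, 1, 1), "publication_date": date(2024, 3, 1)},
    {"politician_id": "P002", "trade_date": date(2024, 1, 15), "publication_date": date(2024, 4, 1)},
]
for row in lag_report(transactions, phonebook):
    print(row)
# {'name': 'Member B', 'median_lag_days': 68.5, 'late': True}
# {'name': 'Member A', 'median_lag_days': 15.0, 'late': False}
```

Storing each name once in the phonebook means a correction touches one row rather than every transaction, which is the point of the normalized layout; the median also keeps a single very late filing from dominating a member's profile.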