FX Market Edge Prediction - Machine Learning Research

Project Overview

This research project investigates whether USD-denominated pip movements in foreign exchange markets can be predicted with modern machine learning techniques. The study uses a hybrid architecture that combines a Temporal Convolutional Autoencoder (TCNAE) with LightGBM gradient boosting to analyze 3 years of hourly market data across 24 currency pairs.

Key Finding: Despite a sophisticated methodology and rigorous experimental design, no statistically significant market edge was found. This negative result is consistent with the efficient market hypothesis for hourly FX movements.

Python, PyTorch, LightGBM, OANDA v20 API, Docker, Pandas/NumPy, Temporal CNNs, Financial ML

Technical Architecture

Three-stage ML pipeline: TCNAE → LightGBM models → USD-denominated predictions

Core Components

TCNAE (Temporal Convolutional Autoencoder): 537K-parameter model that compresses 4-hour input sequences into 120-dimensional latent representations
LightGBM Models: 48 gradient boosting models, two per instrument: a regressor for pip magnitude and a classifier for direction
Cross-Instrument Context: 24×5 feature tensor that lets each model draw on information from all currency pairs
USD Conversion Engine: Converts predicted pip movements into USD values so that performance is measured in actual trading terms
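The USD conversion step can be illustrated with a minimal sketch. It assumes OANDA-style instrument names (BASE_QUOTE, as in USD_CAD), a pip size of 0.01 for JPY-quoted pairs and 0.0001 otherwise, and P&L measured on a given number of base-currency units. The function names are hypothetical and are not taken from the repository.

```python
from typing import Optional

# Hypothetical sketch of pip-to-USD conversion; not the project's actual code.


def pip_size(instrument: str) -> float:
    """Return the assumed pip size: 0.01 for JPY-quoted pairs, 0.0001 otherwise."""
    quote = instrument.split("_")[1]
    return 0.01 if quote == "JPY" else 0.0001


def pips_to_usd(
    instrument: str,
    pips: float,
    units: float,
    price: float,
    quote_usd_rate: Optional[float] = None,
) -> float:
    """Convert a move of `pips` on `units` of the base currency into USD.

    `price` is the instrument's quote-per-base price. `quote_usd_rate` is the
    USD value of one unit of the quote currency and is needed only for crosses.
    """
    base, quote = instrument.split("_")
    pnl_quote = pips * pip_size(instrument) * units  # P&L in the quote currency
    if quote == "USD":  # e.g. EUR_USD: P&L is already in USD
        return pnl_quote
    if base == "USD":  # e.g. USD_JPY: price is quote per USD, so divide
        return pnl_quote / price
    if quote_usd_rate is None:  # e.g. EUR_GBP: a GBP->USD rate is required
        raise ValueError(f"{instrument} is a cross; quote_usd_rate is required")
    return pnl_quote * quote_usd_rate


if __name__ == "__main__":
    # 10 pips on 10,000 units of USD_JPY at 150.00 -> 1,000 JPY -> about $6.67
    print(round(pips_to_usd("USD_JPY", 10, 10_000, price=150.0), 2))
```

The three branches cover USD-quoted, USD-based, and cross pairs. Cross pairs need an external conversion rate because neither leg of the pair is USD.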

Dataset & Methodology

Total Samples: 224,955
Currency Pairs: 24
Historical Data: 3 years
Test Predictions: 32,880

Data Sources & Quality

OANDA v20 API: Data from OANDA's live trading environment, ensuring market realism
Hourly Frequency: Chosen to balance signal against noise for technical analysis
Rigorous Cleaning: About 50% of samples retained after aggressive quality filtering
Temporal Validation: Strict chronological splits that prevent lookahead bias
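A strict chronological split can be sketched as follows. The 70/15/15 fractions and the optional gap (embargo) are assumptions, not the project's settings. Split boundaries are drawn on unique timestamps so that bars from different pairs at the same hour never end up on both sides of a boundary. `chronological_split` is a hypothetical name.

```python
from datetime import datetime, timedelta


def chronological_split(samples, train_frac=0.7, val_frac=0.15, gap_hours=0):
    """Split (timestamp, payload) samples into train/val/test purely by time.

    Nothing is shuffled: every validation timestamp is later than every
    training timestamp, and every test timestamp is later than every
    validation one. Samples within `gap_hours` after each boundary are
    dropped (an embargo), so input windows and targets near a boundary
    do not overlap across splits.
    """
    samples = sorted(samples, key=lambda s: s[0])
    times = sorted({t for t, _ in samples})  # unique timestamps across all pairs
    n = len(times)
    train_end = times[round(n * train_frac) - 1]
    val_end = times[round(n * (train_frac + val_frac)) - 1]
    gap = timedelta(hours=gap_hours)

    train = [s for s in samples if s[0] <= train_end]
    val = [s for s in samples if train_end + gap < s[0] <= val_end]
    test = [s for s in samples if s[0] > val_end + gap]
    return train, val, test


if __name__ == "__main__":
    start = datetime(2021, 1, 1)
    data = [(start + timedelta(hours=h), inst)
            for h in range(100) for inst in ("EUR_USD", "USD_JPY")]
    # A 4-hour gap matches the 4-hour input sequences described above.
    train, val, test = chronological_split(data, gap_hours=4)
    print(len(train), len(val), len(test))
```

Drawing boundaries on unique timestamps rather than on row indices matters when many instruments share the same hourly grid. An index-based cut could otherwise place EUR_USD at 14:00 in training and USD_JPY at 14:00 in validation.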

Results & Performance

Performance summary: no statistically significant edge was found

Average Correlation: 0.02%
Direction Accuracy: 50.1%
Average RMSE: $101.80
Best Correlation (USD_CAD): 5.83%
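A two-sided binomial test against 50% shows whether a direction accuracy of 50.1% differs from a coin flip. The sketch below uses the normal approximation. For illustration, it assumes that all 32,880 test predictions are pooled into a single sample; the project may have tested each instrument separately instead. The function names are hypothetical.

```python
import math


def accuracy_z_test(hits: int, n: int, p0: float = 0.5) -> tuple:
    """Two-sided z-test of a hit rate against a baseline p0.

    Uses the normal approximation to the binomial, which is accurate for
    large n. Returns (z, p_value).
    """
    p_hat = hits / n
    se = math.sqrt(p0 * (1.0 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def min_detectable_accuracy(n: int, p0: float = 0.5, z_crit: float = 1.96) -> float:
    """Smallest hit rate that would reject H0 at the two-sided 5% level."""
    return p0 + z_crit * math.sqrt(p0 * (1.0 - p0) / n)


if __name__ == "__main__":
    n = 32_880               # test predictions reported above
    hits = round(0.501 * n)  # pooled 50.1% accuracy (pooling is an assumption)
    z, p = accuracy_z_test(hits, n)
    print(f"z = {z:.2f}, p = {p:.2f}")
    print(f"accuracy needed to reject 50%: {min_detectable_accuracy(n):.4f}")
```

Under the pooling assumption, 50.1% accuracy gives z ≈ 0.36 and p ≈ 0.72. Pooled accuracy would have to exceed roughly 50.5% before the 50% baseline could be rejected at the 5% level.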

Key Findings

Direction accuracy clustered around 50% (the random baseline) across all instruments
Correlation coefficients remained below 10% (r < 0.10) for all currency pairs
Both training targets, log returns and direct USD pips, led to the same conclusion
Model uncertainty appropriately reflected market unpredictability
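Correlations like these can be tested for significance pair by pair with the Fisher z-transform, with a correction because 24 pairs are tested at once. The source does not state which test or correction the project used. Fisher's z with a Bonferroni correction is a standard choice substituted here, and the function names are hypothetical.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under H0
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_survivors(results: dict, alpha: float = 0.05) -> list:
    """Return instruments whose correlation stays significant after Bonferroni.

    `results` maps an instrument name to (r, n). The per-test threshold
    is alpha divided by the number of instruments tested.
    """
    threshold = alpha / len(results)
    return [
        inst
        for inst, (r, n) in results.items()
        if correlation_p_value(r, n) < threshold
    ]


if __name__ == "__main__":
    # Hypothetical n: the 32,880 test predictions split evenly over 24 pairs.
    n_per_pair = 32_880 // 24
    p = correlation_p_value(0.0583, n_per_pair)
    print(f"best pair alone: p = {p:.3f}; Bonferroni threshold = {0.05 / 24:.4f}")
```

Suppose the test set were split evenly, giving 1,370 predictions per pair (an assumption). The best correlation of 5.83% would then give p ≈ 0.03 on its own, which is still above the Bonferroni threshold of 0.05 / 24 ≈ 0.0021. This is why a correction for multiple comparisons matters when scanning many pairs for an edge.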

Technical Innovation

Novel Contributions

Dual Model Architecture: Separate regression and classification models for pip magnitude and directional prediction
Latent Caching System: Caches autoencoder latent representations instead of recomputing them, reducing computational overhead by 70%
USD-Centric Design: Direct financial value calculation enabling economic interpretation
Production-Grade Implementation: Docker containerization, comprehensive error handling, and OANDA live API integration
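The source does not specify how the regression and classification outputs are combined. One plausible combination treats the classifier output as P(up) and the regressor output as the absolute move, which gives an expected signed move of (2p − 1)·m. The sketch below assumes the move size is independent of direction. The round-trip cost threshold is a hypothetical parameter, as are the function names.

```python
def expected_signed_pips(p_up: float, magnitude: float) -> float:
    """Combine a direction classifier and a magnitude regressor.

    With P(up) = p and a predicted absolute move m, the expected signed
    move is p*m - (1 - p)*m = (2p - 1)*m. This assumes the size of the
    move does not depend on its direction.
    """
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be a probability in [0, 1]")
    if magnitude < 0:
        raise ValueError("magnitude is an absolute move and must be >= 0")
    return (2.0 * p_up - 1.0) * magnitude


def trade_signal(p_up: float, magnitude: float, cost_pips: float) -> int:
    """Return +1 (long), -1 (short), or 0 (stay flat).

    Trades only when the expected move exceeds an assumed round-trip
    cost in pips, so a near-50% classifier produces no trades.
    """
    edge = expected_signed_pips(p_up, magnitude)
    if abs(edge) <= cost_pips:
        return 0
    return 1 if edge > 0 else -1


if __name__ == "__main__":
    # A 50.1% classifier on a predicted 10-pip move expects only ~0.02 pips.
    edge = expected_signed_pips(0.501, 10.0)
    print(f"{edge:.3f}", trade_signal(0.501, 10.0, cost_pips=1.0))
```

Treating 50.1% accuracy as a calibrated P(up) is a simplification. Under that assumption, the expected move is only 0.2% of the predicted magnitude, which any realistic spread would swamp.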

Methodological Rigor

Temporal validation preventing data leakage
Cross-instrument feature engineering with causal constraints
Statistical significance testing across multiple timeframes
Comprehensive ablation studies validating architectural choices
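Causal cross-instrument features require that, at prediction time t, each instrument contributes only bars that have already closed. The minimal sketch below assumes that bars are keyed by their close time. N_FEATURES = 5 mirrors the 24×5 context tensor described earlier, but the feature values are placeholders, and `context_matrix` is a hypothetical name.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

N_FEATURES = 5  # mirrors the 24x5 shape; the actual features are not specified


def context_matrix(bars: dict, instruments: list, t: datetime) -> list:
    """Build one feature row per instrument using only bars closed by time t.

    `bars` maps instrument -> list of (close_time, features) sorted by
    close_time. Each row is the latest bar with close_time <= t; instruments
    with no such bar get a row of NaNs. Bars closing after t are never read,
    which is the causal constraint.
    """
    rows = []
    for inst in instruments:
        series = bars.get(inst, [])
        close_times = [ct for ct, _ in series]
        i = bisect_right(close_times, t)  # number of bars closed at or before t
        if i == 0:
            rows.append([float("nan")] * N_FEATURES)
        else:
            rows.append(list(series[i - 1][1]))
    return rows


if __name__ == "__main__":
    t0 = datetime(2021, 1, 4, 10)
    bars = {
        "EUR_USD": [(t0 + timedelta(hours=h), [float(h)] * N_FEATURES) for h in range(3)],
        "USD_JPY": [(t0 + timedelta(hours=2), [9.0] * N_FEATURES)],
    }
    # At 11:30 the EUR_USD row comes from the 11:00 bar; USD_JPY has no closed bar yet.
    print(context_matrix(bars, ["EUR_USD", "USD_JPY"], t0 + timedelta(minutes=90)))
```

Using NaN for missing rows keeps the matrix shape fixed at one row per instrument. LightGBM can handle missing values natively, so this is one convenient convention rather than a requirement.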

Scientific Value

Research Contribution

This study provides negative evidence obtained with modern ML techniques and can serve as a methodological template for rigorous financial prediction research. Honestly reporting an unsuccessful search for an edge adds to the field's knowledge: in this setting, sophisticated technical approaches did not overcome market efficiency.

Industry Implications

Market Efficiency Validation: Evidence consistent with the efficient market hypothesis at hourly timeframes
Methodological Framework: Reusable architecture for systematic market prediction research
Technical Benchmarking: Establishes baseline performance expectations for FX prediction models
Risk Management: Demonstrates the importance of a skeptical approach to technical trading strategies

Technical Skills Demonstrated

Deep Learning Architecture, Financial Data Engineering, Production ML Pipelines, Statistical Validation, Time Series Analysis, API Integration, Docker Containerization, Scientific Computing, Research Methodology, Code Quality & Testing

Repository Access

The complete codebase, experimental results, and documentation are available on GitHub. The repository includes trained models, comprehensive analysis reports, and visualization tools for full reproducibility of the research findings.


Repository Features: Production-ready code, comprehensive documentation, experimental results, trained models, visualization suite, and Docker deployment configuration.