Overview
Oracle is a custom-trained machine learning system that predicts sale outcomes for incoming leads. Rather than calling a hosted API such as OpenAI's or using an off-the-shelf ML service, Oracle is trained entirely on Midtown's proprietary historical data, which tunes it to our specific market, customer base, and sales patterns.
This isn't "we added AI to our product." It is a custom ML model, built from scratch, that directly affects revenue through intelligent lead routing.
What Oracle Does
Sale Prediction
Given a new lead, Oracle predicts the probability of a successful sale:
- Analyzes lead characteristics against historical patterns
- Returns a probability score (0-100%)
- Factors in timing, location, product interest, and other lead attributes
Rep Matching
Oracle ranks which sales rep has the best chance of closing each lead:
- Considers each rep's historical performance in similar scenarios
- Factors in territory familiarity, product expertise, and current workload
- Enables intelligent lead routing beyond simple round-robin
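As a rough illustration of the ranking step, the sketch below scores each rep for a lead and sorts best-first. Everything in it is an assumption made for the example: the `Rep` fields, the `toy_score` formula, and the workload penalty stand in for the trained model and are not Oracle's actual scoring logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rep:
    rep_id: str
    territory_fit: float      # 0..1, hypothetical feature
    product_expertise: float  # 0..1, hypothetical feature
    open_leads: int           # current workload


def toy_score(lead_base_prob: float, rep: Rep) -> float:
    """Stand-in for the trained model; the formula is invented for illustration."""
    skill = 0.5 * rep.territory_fit + 0.5 * rep.product_expertise
    workload_penalty = 0.02 * rep.open_leads
    raw = lead_base_prob * (0.5 + skill) - workload_penalty
    return max(0.0, min(1.0, raw))  # clamp to a valid probability


def rank_reps(
    lead_base_prob: float,
    reps: List[Rep],
    score: Callable[[float, Rep], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Return (rep_id, close probability) pairs, best match first."""
    scored = [(rep.rep_id, score(lead_base_prob, rep)) for rep in reps]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


reps = [
    Rep("cara", territory_fit=0.4, product_expertise=0.3, open_leads=0),
    Rep("alice", territory_fit=0.9, product_expertise=0.8, open_leads=2),
    Rep("bob", territory_fit=0.6, product_expertise=0.7, open_leads=1),
]
ranking = rank_reps(0.4, reps)
for rep_id, probability in ranking:
    print(f"{rep_id}: {probability:.2f}")
```

Passing the scoring function as a parameter keeps the ranking logic independent of the model, so a real predictor could replace `toy_score` without changing `rank_reps`.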
Lift Calculation
Measures the improvement Oracle provides over baseline:
- Compares close rates for Oracle-routed leads against randomly assigned leads
- Quantifies the revenue impact of intelligent routing
- Provides confidence metrics for predictions
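One common way to express this comparison is relative lift in close rate, paired with a confidence interval on the difference between the two groups. The sketch below assumes that definition and a normal-approximation interval; the function names and the counts are made up for illustration and are not Oracle's reported results.

```python
import math
from typing import Tuple


def close_rate(sales: int, leads: int) -> float:
    if leads <= 0:
        raise ValueError("leads must be positive")
    return sales / leads


def relative_lift(routed_rate: float, baseline_rate: float) -> float:
    """Relative lift: 0.25 means 25% more closes per lead than baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return routed_rate / baseline_rate - 1.0


def diff_confidence_interval(
    routed_sales: int, routed_leads: int,
    base_sales: int, base_leads: int,
    z: float = 1.96,  # roughly a 95% interval under the normal approximation
) -> Tuple[float, float]:
    """Interval for (routed close rate - baseline close rate)."""
    p1 = close_rate(routed_sales, routed_leads)
    p0 = close_rate(base_sales, base_leads)
    se = math.sqrt(p1 * (1 - p1) / routed_leads + p0 * (1 - p0) / base_leads)
    diff = p1 - p0
    return diff - z * se, diff + z * se


# Invented counts: 60 of 200 routed leads closed vs. 48 of 200 baseline leads.
lift = relative_lift(close_rate(60, 200), close_rate(48, 200))
low, high = diff_confidence_interval(60, 200, 48, 200)
print(f"lift={lift:.2f}, diff interval=({low:.3f}, {high:.3f})")
```

With these invented counts the interval spans zero, which is a reminder that a positive lift on a small sample is not yet evidence of a real improvement.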
Training Data Sources
Oracle's power comes from the richness of its training data:
Historical Appointment Data
- Thousands of historical appointments with known outcomes
- Sale price, close rate, time-to-close patterns
- Rep performance across different scenarios
Geolocation Data
- Property location and neighborhood context
- Proximity to branches and service areas
- Local market characteristics
Soft Credit Data
- Financial signals indicating purchase capacity
- Risk scoring for financing considerations
- Correlated with historical close rates
Market Data
- Local trends and seasonality
- Competitive landscape factors
- Economic indicators
Technical Architecture
Feature Engineering Pipeline
Raw data is transformed into ML-ready features, including:
- Base lead score
- Neighborhood context
- Credit band normalization
- Days since inquiry
- Product demand indices
- Rep-territory fit scores
- Rep product expertise
- Seasonality factors
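To make a few of the features named above concrete, here is a minimal sketch of three of them: days since inquiry, credit band normalization, and a seasonality encoding. The `RawLead` fields, the band names in `CREDIT_BANDS`, and the sine/cosine month encoding are assumptions chosen for the example rather than Oracle's actual pipeline.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Dict

# Hypothetical ordering of soft-credit bands, lowest to highest.
CREDIT_BANDS = ["poor", "fair", "good", "excellent"]


@dataclass
class RawLead:
    inquiry_date: date
    credit_band: str


def build_features(lead: RawLead, today: date) -> Dict[str, float]:
    """Turn a raw lead record into numeric model inputs."""
    band_index = CREDIT_BANDS.index(lead.credit_band)  # ValueError if unknown
    # Encode the month on a circle so December and January end up adjacent.
    month_angle = 2 * math.pi * (lead.inquiry_date.month - 1) / 12
    return {
        "days_since_inquiry": float((today - lead.inquiry_date).days),
        "credit_band_norm": band_index / (len(CREDIT_BANDS) - 1),
        "season_sin": math.sin(month_angle),
        "season_cos": math.cos(month_angle),
    }


features = build_features(
    RawLead(inquiry_date=date(2024, 1, 10), credit_band="good"),
    today=date(2024, 1, 24),
)
print(features)
```

Tree-based models do not strictly need the cyclical encoding, but it keeps the feature usable if a different model family is tried later.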
Model Training
- Algorithm: XGBoost gradient boosting
- Validation: Time-based cross-validation (training folds always precede validation folds, which guards against temporal data leakage)
- Metrics: AUC-ROC, precision/recall at various thresholds
- Retraining: Periodic retraining as new data accumulates
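The time-based validation described above can be sketched as an expanding-window split, where each fold trains on older rows and validates on the rows that follow. This is a standard-library illustration with a hypothetical helper name, `expanding_window_splits`; in practice the model would be fit on each training fold and scored on the matching validation fold, and scikit-learn's `TimeSeriesSplit` offers similar behavior.

```python
from datetime import date
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(
    dates: Sequence[date], n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx); no training row is dated later
    than any validation row in the same fold."""
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    fold_size = len(order) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = order[: k * fold_size]
        # The last fold absorbs any leftover rows.
        end = (k + 1) * fold_size if k < n_folds else len(order)
        valid = order[k * fold_size : end]
        yield train, valid


# Appointment dates in arbitrary order; the splitter sorts them first.
appointment_dates = [date(2024, m, 1) for m in (3, 1, 2, 5, 4, 8, 6, 7)]
splits = list(expanding_window_splits(appointment_dates, n_folds=3))
for train_idx, valid_idx in splits:
    print(len(train_idx), "train rows ->", len(valid_idx), "validation rows")
```

A random shuffle would let the model see appointments from the future while predicting the past, which is what inflates performance estimates.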
Production Integration
Oracle exposes a REST API consumed by Evergreen. The CRM sends a lead ID and extracted features, and receives back a probability score (0-100%), a ranked list of recommended reps, and a lift calculation showing the improvement over baseline.
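The exchange described above might look like the following on the client side. The field names (`sale_probability_pct`, `recommended_reps`, `lift_over_baseline`) and the JSON layout are hypothetical, since the actual API contract is not documented here; the sketch only shows how the request and response could be modeled and validated.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class PredictionRequest:
    lead_id: str
    features: Dict[str, float]


@dataclass
class RepRecommendation:
    rep_id: str
    close_probability_pct: float


@dataclass
class PredictionResponse:
    lead_id: str
    sale_probability_pct: float
    recommended_reps: List[RepRecommendation]
    lift_over_baseline: float


def encode_request(request: PredictionRequest) -> str:
    """Serialize the request body the CRM would send."""
    return json.dumps(asdict(request))


def parse_response(body: str) -> PredictionResponse:
    """Parse and sanity-check a response body (assumed JSON layout)."""
    data = json.loads(body)
    probability = float(data["sale_probability_pct"])
    if not 0.0 <= probability <= 100.0:
        raise ValueError(f"probability out of range: {probability}")
    reps = [RepRecommendation(**item) for item in data["recommended_reps"]]
    # Keep the list ranked best-first even if the server order changes.
    reps.sort(key=lambda rep: rep.close_probability_pct, reverse=True)
    return PredictionResponse(
        lead_id=str(data["lead_id"]),
        sale_probability_pct=probability,
        recommended_reps=reps,
        lift_over_baseline=float(data["lift_over_baseline"]),
    )


request_body = encode_request(
    PredictionRequest("lead-123", {"days_since_inquiry": 14.0})
)
# Invented response payload for illustration.
sample_body = json.dumps({
    "lead_id": "lead-123",
    "sale_probability_pct": 62.5,
    "recommended_reps": [
        {"rep_id": "bob", "close_probability_pct": 44.0},
        {"rep_id": "alice", "close_probability_pct": 50.0},
    ],
    "lift_over_baseline": 0.25,
})
response = parse_response(sample_body)
print(response.recommended_reps[0].rep_id, response.sale_probability_pct)
```

Validating the probability range at the boundary means a malformed response fails loudly in the CRM instead of silently misrouting a lead.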
Portfolio Significance
Oracle demonstrates end-to-end ML engineering:
- Data Engineering: Building ETL pipelines for diverse data sources
- Feature Engineering: Transforming raw data into predictive signals
- Model Training: Selecting algorithms, tuning hyperparameters
- Validation: Proper time-based splits, avoiding data leakage
- Deployment: Production API serving real-time predictions
- Integration: Seamless connection with the CRM workflow
- Monitoring: Tracking model performance over time
This isn't a weekend project or a tutorial exercise. It's a production ML system that influences real business decisions and measurably affects revenue.
Results
Oracle's predictions directly improve sales outcomes:
- Intelligent Routing: Leads matched to reps most likely to close
- Priority Optimization: High-probability leads get faster attention
- Rep Development: Insights into what makes successful matches
- Continuous Learning: Model improves as more data accumulates
Lessons Learned
Building Oracle taught valuable lessons about production ML:
- Data quality matters more than algorithm choice: Clean, representative data beats fancy models
- Feature engineering is the real work: Most of the time went into understanding and transforming data
- Time-based validation is critical: Prevents overly optimistic performance estimates
- Integration is half the battle: The best model is useless if it's not embedded in workflows

