Hi,
The SHA TIN model is almost finished: 10 features, going live next week. I'm starting with one training set, one test set, and one blind validation set to measure performance on unseen data. The results at this stage are very encouraging ahead of the move into production.
AustinDillon75
# **XGBoost Binary Image Classification Model – Final Report**
After **5-fold cross-validation with early stopping**, the model achieves:
- **Test AUC: 0.971**
- **Virgin (unseen) Data AUC: 0.968**
- **F1-score on virgin data: 0.827**
- **Cohen’s Kappa: 0.770**
Minimal overfitting and excellent generalization; production-ready.
## **2. Model Configuration**
| Component | Details |
|---------|--------|
| **Algorithm** | XGBoost Classifier |
| **Features** | 10 (including `SPDNW`, `400NW`, `SPDNW1`, `400NW1`) |
| **Target** | Binary (0/1) |
| **Training** | 5-fold CV + **Early Stopping** (patience=50, eval_metric='auc') |
| **Best CV Score** | **0.8998** (mean AUC across folds) |
| **Best Iteration** | 61,920 (boosting rounds) |
## **3. Performance Metrics**
| Dataset | Accuracy | AUC | F1 | Recall | Precision | Kappa | FPR |
|--------|----------|-----|-----|--------|-----------|-------|-----|
| **Train** | 0.968 | **0.998** | 0.936 | 0.997 | 0.883 | 0.915 | 0.041 |
| **Test** | 0.899 | **0.971** | 0.801 | 0.863 | 0.747 | 0.734 | 0.090 |
| **Virgin Set** (unseen) | **0.914** | **0.968** | **0.827** | **0.887** | **0.775** | **0.770** | **0.078** |
> **ΔAUC (Train → Virgin) = 0.030** → **Minimal overfitting**
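The ΔAUC figure is a simple train-vs-unseen gap check, computed from the table's own values:

```python
# Overfitting gauge: gap between training AUC and AUC on fully unseen data.
# Values taken from the performance table above.
train_auc, virgin_auc = 0.998, 0.968
delta_auc = round(train_auc - virgin_auc, 3)
print(delta_auc)  # 0.03
```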
## **4. Confusion Matrix – Virgin Set**
| | Predicted 0 | Predicted 1 |
|---|-------------|-------------|
| **Actual 0** | 189 (TN) | 16 (FP) |
| **Actual 1** | 7 (FN) | **55 (TP)** |
- **Only 7 critical misses** (false negatives)
- **Balanced precision (0.775) and recall (0.887)**
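The virgin-set metrics in the table can be recomputed directly from this confusion matrix, which confirms the reported figures are internally consistent:

```python
# Consistency check: derive the virgin-set metrics from the confusion
# matrix above (TN=189, FP=16, FN=7, TP=55).
tn, fp, fn, tp = 189, 16, 7, 55

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
fpr       = fp / (fp + tn)

print(round(accuracy, 3), round(precision, 3), round(recall, 3),
      round(f1, 3), round(fpr, 3))
# 0.914 0.775 0.887 0.827 0.078  — matches the table's virgin-set row
```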