
Hong Kong Speed Figures

Hi,
Using the same approach, here is the selection of bets for tomorrow at Happy Valley. The idea is to select the correct line corresponding to the horse — preferably the most recent one, unless the horse was impeded — on the same racecourse and at the same distance. Here is the betting selection for tomorrow.
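The line-selection rule above can be sketched in pandas. This is only an illustration of the rule (most recent run per horse at the same course and distance, skipping impeded runs); the column names are placeholders, not the actual dataset schema.

```python
import pandas as pd

# Hypothetical past-performance table; column names are illustrative.
runs = pd.DataFrame({
    "horse":    ["A", "A", "A", "B", "B"],
    "date":     pd.to_datetime(["2025-09-10", "2025-10-01", "2025-10-22",
                                "2025-10-01", "2025-10-22"]),
    "course":   ["Happy Valley"] * 5,
    "distance": [1200, 1200, 1650, 1200, 1200],
    "impeded":  [False, False, False, True, False],
})

def reference_line(df, course, distance):
    """Most recent non-impeded run per horse at the given course/distance."""
    eligible = df[(df["course"] == course)
                  & (df["distance"] == distance)
                  & ~df["impeded"]]
    return (eligible.sort_values("date")
                    .groupby("horse", as_index=False)
                    .last())

lines = reference_line(runs, "Happy Valley", 1200)
```

Here horse A's 1650m run and horse B's impeded run are excluded before taking the latest remaining line per horse.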
 

Attachments

  • HV PRED 05112025.xlsx (9.8 KB)
Wednesday HV
Pretty good start there, La Cressonnière, plenty of them placed, and a few winners 👍
 
I have little idea whether or not TPD is even close to being accurate, but unless something superior is on offer I have no alternative but to take it as gospel.
My interest in sectionals comes from reading form and recognising the anomalies that crop up and distort the logic.
Official ratings and final times ought to be something we can rely on, but now that we can see the sectionals it has become clear they aren't. As we've discussed many times, Outlander, it comes down to the early pace, whether that be too fast or, more likely, too slow.
If we take the two races mentioned from the Wolves meeting and repeat the points already made, those two races undermine the "form"; if putting numbers to the final times is to mean anything, we need to find some logic behind them.



A cl5 race and a cl6 with ratings to match, but the final times don't work out as we might expect. This is purely down to the early pace of the cl5 race being too slow, as the numbers demonstrate.

At the end of the day, as markfinn says, it needs to go somewhere. While it is fairly easy to take the numbers into account from a form point of view, that won't help if you want the numbers to tell you where to go. Hopefully something can emerge, but maybe we're on the wrong thread.
Sorry to trouble this thread again, but I just wanted to show some very different numbers from MARIS ANGEL yesterday.

 
Hi,
Here is what I’m trying to do.
XGBOOST MODEL
Based on the results from HK Happy Valley – September 2024.
A total of 3,766 performances were used.
Race distances included: 1000m, 1200m, 1650m.

The objective of the model is to predict which horse can finish in the top three, using the features from the Outlander dataset as well as additional variables, which we will discuss later.

The model’s metrics at Happy Valley are strong.
I carried out an initial test (TEST SET) on the races held at Happy Valley since September 10th, 2025 — a total of 21 races.
The model suggested 31 bets to place (prediction output > 0.45).
26 were successful, meaning the horses finished in the top three → ROI: 199%.
A file containing these results and the corresponding payouts is attached.
Good luck with that mate, sounds really promising. The one thing I have found with stuff like this is that, if you aren't careful, a model can over-fit or be prone to data leakage. By the latter I mean the model inadvertently "peeking" at the day's actual results, which can inflate the apparent performance to completely unrealistic levels. Not saying that's happened with those results, it's just something to watch for. Are you applying a train, test and validate approach, if I might ask?
 
 
Hi,
The SHA TIN model is almost finished: 10 features, active next week. I'm starting with one training set, one test set, and one blind hold-out set for validation on unseen performances. The results at this stage are very encouraging before moving into production.
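The three-way split described above can be sketched like this. One point worth making explicit for racing data: splitting chronologically (rather than at random) stops later races from leaking into training. The arrays here are placeholders for the real Sha Tin dataset, and the 70/15/15 proportions are illustrative.

```python
import numpy as np

# Placeholder data, already sorted by race date.
n = 1000
X = np.random.default_rng(2).normal(size=(n, 10))
y = np.random.default_rng(3).integers(0, 2, size=n)

train_end, test_end = int(n * 0.70), int(n * 0.85)
X_train, y_train = X[:train_end], y[:train_end]                  # fit the model
X_test,  y_test  = X[train_end:test_end], y[train_end:test_end]  # tune / pick threshold
X_valid, y_valid = X[test_end:], y[test_end:]                    # blind set: touch once, at the end
```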
AustinDillon75
XGBoost Binary Classification Model – Final Report
After **5-fold cross-validation with early stopping**, the model achieves:
- **Test AUC: 0.971**
- **Virgin (unseen) Data AUC: 0.968**
- **F1-score on virgin data: 0.827**
- **Cohen’s Kappa: 0.770**
No overfitting. Excellent generalization. Production-ready.
## **2. Model Configuration**
| Component | Details |
|---------|--------|
| **Algorithm** | XGBoost Classifier |
| **Features** | 10 (including `SPDNW`, `400NW`, `SPDNW1`, `400NW1`) |
| **Target** | Binary (0/1) |
| **Training** | 5-fold CV + **Early Stopping** (patience=50, eval_metric='auc') |
| **Best CV Score** | **0.8998** (mean AUC across folds) |
| **Best Iteration** | 61,920 (boosting rounds) |
## **3. Performance Metrics**
| Dataset | Accuracy | AUC | F1 | Recall | Precision | Kappa | FPR |
|--------|----------|-----|-----|--------|-----------|-------|-----|
| **Train** | 0.968 | **0.998** | 0.936 | 0.997 | 0.883 | 0.915 | 0.041 |
| **Test** | 0.899 | **0.971** | 0.801 | 0.863 | 0.747 | 0.734 | 0.090 |
| **Virgin Set** (unseen) | **0.914** | **0.968** | **0.827** | **0.887** | **0.775** | **0.770** | **0.078** |

> **ΔAUC (Train → Virgin) = 0.030** → **Minimal overfitting**
## **4. Confusion Matrix – Virgin Set**
| | Predicted 0 | Predicted 1 |
|---|-------------|-------------|
| **Actual 0** | 189 (TN) | 16 (FP) |
| **Actual 1** | 7 (FN) | **55 (TP)** |
- **Only 7 critical misses** (FN)
- **High precision-recall balance**
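As a consistency check, every headline metric for the virgin set can be recomputed directly from the confusion matrix above:

```python
# Counts taken from the virgin-set confusion matrix.
TN, FP, FN, TP = 189, 16, 7, 55
n = TN + FP + FN + TP

accuracy  = (TP + TN) / n                                  # 0.914
precision = TP / (TP + FP)                                 # 0.775
recall    = TP / (TP + FN)                                 # 0.887
f1        = 2 * precision * recall / (precision + recall)  # 0.827
fpr       = FP / (FP + TN)                                 # 0.078

# Cohen's kappa: observed agreement vs. chance agreement.
p_o = accuracy
p_e = ((TN + FP) * (TN + FN) + (FN + TP) * (FP + TP)) / n**2
kappa = (p_o - p_e) / (1 - p_e)                            # 0.770
```

All six values round to exactly the figures reported in the metrics table, so the table and the confusion matrix are internally consistent.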
 
HV 12/11/2025
I removed the SP variable from the learner; I find it's too closely tied to the market price in the forecast. My next idea is to include Benter's calibrated probability. If anyone has used it before, I'd be interested in their feedback. Selections below are PROB > 0.65.
R1 R2 R3 R4 R5 R6 R7 R9
DRACOTALENTS SUPREMOCALIFORNIA BANNERSETANTAROBOT LUCKY STARLOOKS OUTSTANDINGFIND MY LOVEFALLON
APOLAR FIGHTERLESLIETAKE ACTIONSTAR FIGUREISLAND BUDDYFUN ELITESOLEIL FIGHTER
PERFECTO MOMENTSAEROINVINCIBLEQUANTUM PATCHKING GLORIOSOKING MILESHUGE WAVE
REWARDING TWINKLE
BULL ATTITUDE
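On the Benter idea mentioned above: one common reading of his approach is to combine the fundamental model's probability with the market's implied probability in log space, then renormalise within the race. The sketch below illustrates that combination; the weights `alpha` and `beta` are purely illustrative (Benter fitted them by maximum likelihood on past races), and the probabilities are made-up numbers for one race.

```python
import numpy as np

def combine(p_model, p_market, alpha=1.0, beta=0.8):
    """Benter-style log-space blend of model and market probabilities."""
    log_strength = alpha * np.log(p_model) + beta * np.log(p_market)
    w = np.exp(log_strength)
    return w / w.sum()   # renormalise so the race's probabilities sum to 1

p_model  = np.array([0.40, 0.35, 0.15, 0.10])   # model output for one race
p_market = np.array([0.30, 0.40, 0.20, 0.10])   # odds-implied probabilities

p_final = combine(p_model, p_market)
```

The appeal of the log-space blend is that either source can pull a horse's final probability up or down, with the fitted weights deciding how much to trust the model versus the market.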
 
Excellent, La Cressonnière.

I think you are well ahead of me in knowledge. I know adding the SP, or current price, was what turned Benter's model into profit. But I don't know his calibrated probability. Do you mean the weighting he gave to each input? Don't know if that is out there anywhere. Be interesting if it is.

Excellent work.
 