Scraper/Method to obtain Top Speed and RPR from Racing Post Website into Excel

Evening kdw,

Hope you've had a good week - roll on the weekend.

You can get RPR and TS in the results, and the racecards, from rpscrape on GitHub (4A47/rpscrape: Scrape horse racing results data to CSV).

The results are CSVs; the racecards are in JSON. In Excel, look at Data > Get Data > From File > From JSON.

Or convert the JSON to CSV (I'd recommend Python), or look for a JSON flattener on Google.
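
For anyone going the Python route, here is a minimal sketch of the JSON to CSV idea. The sample data and its layout (a list of races, each holding a "runners" list with nested ratings) are assumptions made up for illustration; the real racecard JSON may well be shaped differently, so the field names would need adjusting.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def runners_to_rows(racecards):
    """Yield one flat row per runner, copying the race-level fields onto it."""
    for race in racecards:
        race_fields = {k: v for k, v in race.items() if k != "runners"}
        for runner in race.get("runners", []):
            yield flatten({**race_fields, **runner})


def write_csv(rows, out):
    """Write the rows to an open text file (or any file-like object)."""
    rows = list(rows)
    # Union of keys across all rows, in first-seen order, so no column is lost.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample in an ASSUMED layout; not the real racecard format.
SAMPLE = json.loads("""
[
  {"course": "Goodwood", "time": "01:10",
   "runners": [
     {"name": "Horse A", "trainer": "Trainer A", "jockey": "Jockey A",
      "ratings": {"ts": 71, "rpr": 103}},
     {"name": "Horse B", "trainer": "Trainer B", "jockey": "Jockey B",
      "ratings": {"ts": 79, "rpr": 102}}
   ]}
]
""")

buffer = io.StringIO()
write_csv(runners_to_rows(SAMPLE), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of printing, open it with `open("racecards.csv", "w", newline="")` and pass that to `write_csv` in place of the buffer.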
Thanks for this. I have started a basic Python programming course and have GitHub installed. This is awesome. I just have to try and work through this and solve it. Thanks
 
The example shown is for results. I'm looking for the racecards, so ideally I'm looking for date, time, meeting, horse, trainer, jockey, TS and RPR to be produced into a CSV file so that I can then analyse the data. KDW
 
Thanks

If I have the data like this, where do I put the formula? I tried putting it in the cell after RPR but nothing happened.

Course    Time   Name               OR  TS  RPR
Goodwood  01:10  Declared Interest  83  71  103
Goodwood  01:10  Separate           95  79  102
Goodwood  01:10  Secret Return      79  77  101
Goodwood  01:10  Im Available       77  81  101
Goodwood  01:10  Wasaayef           93  91  100
Goodwood  01:10  Angel Power        90  94  100
Goodwood  01:10  Tomorrows Dream    84  71  100
Goodwood  01:10  Agent Of Fortune   76  59  100
Goodwood  01:10  Queen Of Silca     77  93  100
Goodwood  01:10  Hateya             95  78   98
daveat91, do you have a program that generates a file like the above, i.e. runners with TS and RPR? Thanks
 
Your luck is in, Laugro1968, because the problem is simply the same one I solved last night when I found the cards would no longer download. RP have made a tiny change to their page code and I had to change the programs I use to suit. So after a quick look (I found a copy of the original code) I realised this was a problem I had already fixed.

Delete the existing files from the first download, then follow the install process with this replacement version and you should be okay. I'm attaching a copy of the download I got a few minutes ago using this program for reference.


Dave
davejb, do you have a program that produces a CSV file of runners with TS and RPR? Thanks
 
kdw - The thread you took that post of mine from is 2 years old, mate. If I still have the code somewhere then it hasn't been updated in a long while, and clearly the files I uploaded to Dropbox are no longer there. Your best bet is if one of the people I did this for back then reads this thread and still has a copy they can forward to you. I hope you'll understand that I don't 'maintain' scrapers for others; my programs are for my own use, and while I've been able to alter some of them to suit others over the years, once they are gone they're gone.

Having had a good root round let's see if this works to save further hassle -



Hopefully this version that I produced for another guy on here will work for you. It did when I tried it, although there are one or two bits of data that clearly aren't okay: 2 or 3 of the runners tomorrow have a Topspeed rating of 6 inches or some such, but the vast majority look okay. If this works for you, fine, but it is old code so no guarantees.

Dave
 
Better if you address that to gbettle to ensure (s)he sees it - but from past experience of this sort of thing the answer to your question is probably 'not a thing'.
(Part of why I'm reluctant to get too involved when people come on here and ask for scrapers and the like.)
Dave
 
Thanks Dave. I ran it and it produced a CSV file with headings but no data. No worries, I'll carry on by hand. I still have to do the hard work when I have the data.
 
Okay, it worked the day I compiled it, but it looks like the abandoned Southwell meeting tomorrow caused problems - the program tries to download cards for an abandoned meeting basically, and then falls over when it can't find the race information.

I've had a go at it (even though I said I wouldn't) and got it to skip that problem, so there's a new link to the same program with the edit. Overwrite the old version with this one and you ought to be okay. I'll attach the resultant file for tomorrow that I get when running it so you can check you get the same.
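
For anyone writing their own version, the kind of guard described here might look roughly like the sketch below. It is not the code from the program itself: `parse_meeting`, `collect_cards`, the 'time|race name' line format and the meeting names are all hypothetical, made up purely to show the "skip it if there is no race information" idea.

```python
def parse_meeting(page_text):
    """Return a list of race dicts, or None if no race information is found.

    Hypothetical stand-in: a saved card is treated as lines of
    'time|race name'. A real parser would read the site's actual markup.
    """
    races = []
    for line in page_text.splitlines():
        if "|" in line:
            time, name = line.split("|", 1)
            races.append({"time": time.strip(), "name": name.strip()})
    return races or None


def collect_cards(pages):
    """Parse each meeting's page, skipping any that yield no races."""
    cards, skipped = {}, []
    for meeting, text in pages.items():
        races = parse_meeting(text)
        if races is None:
            # Nothing to parse (e.g. an abandoned meeting): note it and move on
            # instead of letting the whole run fall over.
            skipped.append(meeting)
            continue
        cards[meeting] = races
    return cards, skipped


# Made-up input: one normal meeting and one with no race information.
pages = {
    "Meeting A": "12:30|Novice Hurdle\n13:00|Handicap Chase",
    "Meeting B": "Meeting abandoned",
}
cards, skipped = collect_cards(pages)
print(sorted(cards), skipped)
```

Keeping a list of what was skipped makes it easy to see afterwards whether a meeting was genuinely abandoned or the page format has changed.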

I won't be doing fixes and the like going forward though, as I don't use this program myself. Be warned that when the Racing Post does something different with its web pages, as it does sometimes, or when the cards have a load of silly stuff added (pony races, gazelle sprints, whatever), it may not cope, so it'd be a good idea to learn Python!

Good luck,
Dave
 

Attachments

  • RP_racecards_2021-11-12.csv
    16.6 KB · Views: 54
Wow. Awesome, thank you.
 
Hi davejb, I have just stumbled across your posts - you have done some amazing work scraping the Racing Post! I was wondering if you could give me some help please? I have a list of horses (700+) that I need to pull some information for. I need to know the trainer, current handicap mark, highest ever handicap mark, and number of furlongs it runs over (average from previous runs). It's all available on the card on RP. I have done something similar before in C# for football but I'm struggling with the RP. Any help appreciated.
 
Sorry,
you might not have noticed but I stopped the programming/speed figure work a year ago and have no intention of restarting.
I would, generally, suggest that anyone wanting to scrape data either find somebody who is currently doing so and willing to share their work - I think I've seen a few using Excel to scrape data elsewhere on the forum over the past year or two - or go the whole hog and learn a programming language, such as Python, which is the one I used and is available free online, as ultimately if you write it yourself then you'll have some idea of how to fix it when it stops working!

All I did was to identify one or more website pages that contained the information I wanted, make note of the page address(es) I wanted, then download the whole of each page to a text file/collection of text files (a file per web page). It was then a matter of writing simple text search algorithms that would identify sections of text containing the information I wanted, so for example a line that contained the name of a horse's jockey would have some formatting info followed by the words '< horse-jockey == > Fred Bloggs' and my program would search for the horse-jockey phrase and copy whatever followed it. Copy all the bits of text into one place for final editing, then just run through it. You simply (!) learn how a page is formatted, identify phrases that identify data items, strip them out then do what you want with the result. It just looked simple because my program did a lot of text shuffling that went on in the background.
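
A minimal Python sketch of that search-and-copy approach, for illustration only. The page text and the `horse-name` / `horse-jockey` markers are made up along the lines of the example above, and `extract_after` is a hypothetical helper; the real pages will use different markup, so the marker phrases have to be worked out by looking at a saved page.

```python
def extract_after(text, marker, terminator="<"):
    """Return every piece of text that follows `marker`, up to `terminator`."""
    if not marker:
        raise ValueError("marker must not be empty")
    found = []
    start = 0
    while True:
        pos = text.find(marker, start)
        if pos == -1:
            break
        begin = pos + len(marker)
        end = text.find(terminator, begin)
        if end == -1:
            end = len(text)
        found.append(text[begin:end].strip())
        start = end
    return found


# Made-up stand-in for one downloaded page saved as text.
PAGE = """
<div class="x"><span horse-name="">Example Horse</span>
<span horse-jockey="">Fred Bloggs</span></div>
<div class="x"><span horse-name="">Another Horse</span>
<span horse-jockey="">Joe Soap</span></div>
"""

names = extract_after(PAGE, 'horse-name="">')
jockeys = extract_after(PAGE, 'horse-jockey="">')
rows = list(zip(names, jockeys))
print(rows)
```

In practice `PAGE` would be read from one of the saved text files, and the same function would be called once per data item (trainer, rating and so on) before the pieces are lined up into rows.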

I don't even have the code on my PC any more, it's all off on some hard drive somewhere just on the offchance I ever load Python back on and give it another go.... estimate for that being 'hell freezes over plus one month'.

I hope you manage to figure it all out. The racecard info from RP will give you the data you want quite happily, it's just a matter of downloading the pages and parsing the text, plus a bit of aggro when things don't quite go as planned.
Dave
 
I have a list of horses (700+) that I need to pull some information for. I need to know the trainer, current handicap mark, highest ever handicap mark, and number of furlongs it runs over (average from previous runs). It's all available on the card on RP. I have done something similar before in C# for football but I'm struggling with the RP. Any help appreciated.
You can get the information you require from my speed figures lists, the current lists go back to 2020.


If you need to know what handicap marks they ran to in a race add 17 lbs to the speed figure for the flat and 28 lbs for the NH.

Base rate flat = 9-0 (100)
Base rate NH = 11-0 (130)

All my speed figures are rail adjusted.
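
As a worked example of the conversion above, a small Python sketch. The function name and the sample figure of 70 are made up for illustration; only the 17 lb and 28 lb offsets come from this post.

```python
FLAT_OFFSET_LB = 17  # from the post: add 17 lbs on the flat
NH_OFFSET_LB = 28    # from the post: add 28 lbs for NH


def handicap_mark(speed_figure, code):
    """Convert a speed figure to the handicap mark it equates to."""
    offsets = {"flat": FLAT_OFFSET_LB, "nh": NH_OFFSET_LB}
    try:
        return speed_figure + offsets[code.lower()]
    except KeyError:
        raise ValueError(f"unknown code: {code!r}") from None


# 70 is an arbitrary sample figure, not taken from any list.
print(handicap_mark(70, "flat"))  # 70 + 17 = 87
print(handicap_mark(70, "NH"))    # 70 + 28 = 98
```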

Mike.
 