
Scraper/Method to obtain Top Speed and RPR from Racing Post Website into Excel

davevart, the best way to interact with RI is to link the RI database tables to Access and then write your own queries to get to what you want.
I have attached a copy of the Class Ratings sheets I post every evening to show the outputs I am currently producing.
 

Attachment: Chesham Class Ratings Thursday 30 January 2020.xlsx (822.1 KB)
I can get the data I need comparatively easily using RI query, but I am trying to save myself the subscription by screen scraping (given that I use none of RI's other functionality).
 
Results data could be scraped from the RP or Timeform sites if you wanted to go that route. Personally I use HRB, as 50p a day gets you the data without the hassle. You still have to check it fits your needs, of course: for example, HRB helpfully adds the rail moves to the race distance to give an 'actual distance run', which I then have to change back to ensure the correct standard time is used... such is life!

If you have the ability and time to learn Python then you could indeed write it all yourself and save the RI sub. I would not personally consider paying anyone to do it for me: you are looking at a fairly big job time-wise and I doubt it would be cheap. You would also have no clue why it went awry if it hit a bad bit of data or just presented you with an error message when running, whereas if you've coded it yourself you'd have a very good chance of being able to sort it out.

Python isn't hard to learn, particularly if you have experience of other languages, and of course you are likely to find it comes in handy for other ideas that pop up - I started to write one program to compile ratings and I now have about 30 for all sorts of different things! It is, however, quite an investment in time.

Dave
 
Thanks for that... I've contacted a local Python tutor and given him my spec with a view to him helping me. Does HRB have RPR ratings?
 
I love horse racing and tinkering about with data, but I will not part with the kind of money Raceform, Racing Post, Timeform, Proform or Dataform charge; in my opinion it's all way too expensive. I recently wanted to get the data together for jumps racing for a speed rating project. I use HRB; £10 a month is within my limit to pay. I know you said you need the RPR. I don't know if you want RPR in results or today's cards, but if it's just results, this scraper put up by a poster on here is pretty good. I had trouble getting it to work at first, but that is because I'm a bit dense with this type of thing. I scraped years of results and they contained all the RPR and Topspeed ratings.
I deleted them because the distance data wasn't accurate enough for my needs, but if it's just general results and ratings you need it may be ideal for you, and it was freely posted by 4a47.

 
Hi Laurent,
sorry but I don't actually maintain programs that I don't use myself, and I've had a PC change and a total software reinstall since producing most of the various scrapers people asked me for. I have two files currently stored (still) on Dropbox that provide scraping code, they are linked below. giusep.zip is of course the one I provided to Giuseppe_esq, what it did I, quite frankly, don't remember! The other grabs RP results data - feel free to use either if they work okay for you. Download and unzip them into a convenient folder, double click the exe files inside them once decompressed to run - they'll pop up a cmd window to run in and close it on exit. Output files go to the folder the programs are run in.

Please note that I do not maintain these, if they run, great, happy to oblige, but if they no longer work then I don't start recoding etc to make them work - I just have far too much on my own plate!


Dave
Hi Dave
Good to see these still work. I have some programming experience, although not much Python. Could you point me to the place where I could alter these to include more columns? Or is it not that easy?
Many thanks
Nick
 
Hi,
sorry, but you'd have to have the source code and then delve into the coding of the website pages. Whilst I don't mind making my exe code available for those who have a use for it, I don't want to get into discussions of how the programs work, and from past experience if I share source code I end up doing rather more consultancy than I wish. If you want to get into web scraping with Python there are lots of web resources on the subject. The two things I found difficult were getting the RP site to accept a request to download a page, which took me a day or two to figure out, and after that working through the coding of those pages to find the repeating strings that identified the data items I wanted. Some Python libraries like 'Beautiful Soup' can help with that sort of thing; being of the generation that sat looking at hex code, 'peeking and poking' to find out how things were put together, my approach is rather more old fashioned.
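
For anyone wanting to try those two steps, here is a minimal sketch using only the Python standard library. It is not the code behind the exe files. It assumes that sending a browser-like User-Agent header is what gets a request accepted (the thread does not say what the actual fix was), and the marker strings in the usage line are made up for illustration; the real ones have to be found by reading the page source.

```python
import urllib.request


def fetch(url, timeout=30):
    """Download a page as text.

    ASSUMPTION: a browser-like User-Agent is enough for the site to accept
    the request. The thread does not confirm what was actually needed.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def between(text, start, end):
    """Return every substring that sits between a repeating start marker
    and the next end marker, in page order."""
    items = []
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            break
        i += len(start)
        j = text.find(end, i)
        if j == -1:
            break
        items.append(text[i:j].strip())
        pos = j + len(end)
    return items


# Hypothetical markers, for illustration only.
sample = '<td class="rpr">112</td><td class="rpr"> 110 </td>'
ratings = between(sample, 'class="rpr">', "</td>")
```

Plain string searching like this breaks whenever the site changes its markup, which is the trade-off against a parser such as Beautiful Soup.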

Dave
 
No worries, I know what you mean. I started writing simple SQL for someone on a racing database and have ended up spending many hours writing complex stored procedures to create huge databases for back testing all sorts of scenarios.
I wasn't sure if yours was just a case of replicating a few lines to import extra columns; obviously not, and I can link what I have from the giuseppe program to my existing feeds. Thanks for the original program and for responding.
Nick
 
Hi Dave I have come into this thread fairly late but could you advise what HRB is (Horse Racing Board maybe?) and what you get for 50p per day?

Many thanks
 
As ArkRoyal said, it's Horseracebase.com - take a look at the site. It has results, cards, all sorts of ratings and ways to filter form, and sections for you to input system parameters; far too much for me to cover here, frankly. The chap running it charges £10 a month for a basic membership (that is sufficient for most, I would think), or for £15 a month the number of systems you can run on it (for which it emails qualifiers to you) goes from 'more than you need' to 'very much more than you need'!

Extremely good value, and advert free.
Dave
 
Many thanks Dave, I will take a look.
 
davejb, hello, sorry to jump into this thread like this... Do you know how to make a script to scrape RacingPost, Sportinglife, Betfair/Coral and Timeform, or even where I could find information on how to do it?

I have noticed I could scrape from Excel, but all the sites use a new URL each day and I'm not sure whether it could be made to update automatically every day.

Is it possible?
 
Theloantrader, I suspect you need an Excel guru - which isn't my forte.

Scraping RP, Timeform etc depends largely on what you want to scrape from them. Stuff they hide (like Timeform's ratings that are reserved for paying customers) can't be accessed unless you're into hacking, but the easily accessed stuff like racecards and results is generally available in exchange for time and effort in coding.

Somebody posted an Excel based results scraper on here the other day, check the other 'scraper' threads - I think it was on the 'which language is best for...' thread. (I'll see if I can find it for you after posting this).

To be absolutely honest other than haunting the scraper threads the only real option is to learn how to do it yourself, at least then you'd be able to fix things when sites get updated.

For your bit regarding how they change the URLs daily, the 'trick' (if you can call it that) is that a site will have a home page that links to the information you want, and the address of that page follows a constant pattern. For example, when scraping RP cards my program simply asks for the date (and I have coded in a 'press return if you just want tomorrow', as after all that is the day I am usually after), and I know the URL for the 'master' cards page will be
https://www.racingpost.com/racecards/ followed by the date I just entered, so for tomorrow that's
XXXhttps://www.racingpost.com/racecards/2020-07-20 - note the XXX is dummy stuff to stop the forum software turning the address into a link!

My program goes to that one page, reads through the text of the page to find the addresses of the individual race cards, then goes to each in turn and downloads them, stripping the information from each page and storing it into a spreadsheet. How others do the job I haven't a clue, but that's how I do the job in Python.
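
A rough sketch of that flow in Python, for anyone who wants a starting point. This is not Dave's program: the function names are hypothetical, and the link pattern (hrefs that start with /racecards/) is an assumption that would need checking against the real page source. Downloading and parsing each card is left out.

```python
import re
from datetime import date, timedelta
from urllib.parse import urljoin

BASE = "https://www.racingpost.com"


def cards_url(day=None):
    """URL of the 'master' cards page for a date; defaults to tomorrow."""
    if day is None:
        day = date.today() + timedelta(days=1)
    return f"{BASE}/racecards/{day.isoformat()}"


# ASSUMPTION: individual card links appear as href="/racecards/...".
CARD_LINK = re.compile(r'href="(/racecards/[^"#?]+)"')


def extract_card_links(html, master_url):
    """Find the individual race card addresses in the master page text,
    skipping duplicates and the master page itself."""
    links = []
    for path in CARD_LINK.findall(html):
        url = urljoin(master_url, path)
        if url != master_url and url not in links:
            links.append(url)
    return links


# Usage with a made-up page fragment (no network access needed).
master = cards_url(date(2020, 7, 20))
page = (
    '<a href="/racecards/2020-07-20">All</a>'
    '<a href="/racecards/1/ayr/2020-07-20/123">Ayr 2:00</a>'
    '<a href="/racecards/1/ayr/2020-07-20/123">Ayr 2:00</a>'
    '<a href="/results/">Results</a>'
)
cards = extract_card_links(page, master)
```

Each address in `cards` would then be downloaded in turn and its data written to a spreadsheet, as described above.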

Each site, each job within a site, takes a bit of analysis to figure out formats and where/how to identify information, and then a fair bit of coding to make it all work. If you want to code the sort of way I do, then first you need to learn to program. For an Excel based sheet, you need somebody who uses Excel to do what I use Python for.

Dave

ps - I was right, it's post 8 in the 'preferred language for scraping' thread that has a link to an RP results scraper.
 
Thanks Dave

Now I have this data I would like to sort each of the ratings columns in order

I have been doing this using =IF(AND(B2=B1,C2=C1),G1+1,1). That's fine, it does number them by track and time. However, it goes 1-2-3-4-5-6-7,

and often there is more than one horse with the same rating. What I would really like is for horses with the same rating to get the same number, with the next rating then numbered by its position in the count, as below, instead of the first 112 being ranked 1 and the second 112 being ranked 2, which is what my current formula does. I've tried to google it but can't find anything. If anyone already does this with some ratings or has an idea how I can make it work, any help would be very much appreciated, and would make my life easier than doing it manually :):):)

112 1
112 1
110 3
109 4
109 4
109 4
108 7
107 8
90 9
90 9
87 11
0 0
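
The numbering in that example is a rank with ties: a horse's rank is one plus the number of horses in the same race with a strictly higher rating, and a rating of 0 gets rank 0. A minimal Python sketch of that rule follows; the function names and the (track, time, rating) row layout are assumptions for illustration.

```python
from collections import defaultdict


def rank_with_ties(ratings):
    """Rank one race: equal ratings share a rank, the next rank skips
    ahead by the number of ties, and unrated horses (0) get rank 0."""
    return [
        0 if r <= 0 else 1 + sum(1 for other in ratings if other > r)
        for r in ratings
    ]


def rank_by_race(rows):
    """rows: (track, time, rating) tuples, in any order.
    Returns one rank per row, ranked within its own track and time."""
    by_race = defaultdict(list)
    for track, time, rating in rows:
        by_race[(track, time)].append(rating)
    return [
        0 if rating <= 0
        else 1 + sum(1 for r in by_race[(track, time)] if r > rating)
        for track, time, rating in rows
    ]


example = [112, 112, 110, 109, 109, 109, 108, 107, 90, 90, 87, 0, 0]
ranks = rank_with_ties(example)
# ranks == [1, 1, 3, 4, 4, 4, 7, 8, 9, 9, 11, 0, 0]
```

The same rule can be written directly in Excel with COUNTIFS. Assuming track in column B, time in column C and the rating in column D (the rating column is a guess, the post does not say), something like =IF(D2=0,0,COUNTIFS(B:B,B2,C:C,C2,D:D,">"&D2)+1) should give the same numbers without needing the rows sorted first.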
0 0
 
Hi Dave, I get my data from bet turtle and I first sort the data by cse, time and saddle. That gives me my 1 to 12, say, for a 12 runner race. Then I copy and paste that column into a new sheet, and insert new columns beside the data I want to rank. In my case I rank the form, speed, trn cse + type, trn jky, jky cse, jky type, and the sire cse and type. Then I can get the top ranked for each once I have sorted them by category and inserted the copied saddle numbers. Hope this helps, Jock
 