Scraping Utility for Those New to Scraping

rpjd99

For those not sure about scraping, the attached workbook uses two VBA macros to help you get started. I used these at the beginning to find out exactly what I needed from a web page and then, having identified what I wanted, to test my scraper against the page to make sure I had worked everything out correctly.

It uses the standard(?), simple method of scraping, with no bells and whistles such as Regular Expressions, and you'll then need to parse what you scrape.

1. The first ("Import_All_Elements") takes in all elements of a web document, so you can identify the exact elements you need and the conditions for taking them in.
2. Once you have identified what you want and the conditions for importing it, the second macro ("Import_Selected_Elements") gives an example of how to apply the conditions and exclusions required. Bear in mind that it is only an example.
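The two-step idea above can be sketched outside Excel as well. The following is a minimal Python sketch, not the VBA from the attached workbook: it only borrows the two macro names for its functions, and the sample page, the class names ("horse", "rating", "advert") and the exact conditions are all hypothetical. The first function lists every element so you can see what is there; the second applies a simple condition (tag and class must match) and an exclusion (empty text is skipped).

```python
from html.parser import HTMLParser

# Hypothetical sample page, standing in for the page you would actually scrape.
SAMPLE_HTML = """
<html><body>
  <h1>Results</h1>
  <table>
    <tr><td class="horse">Red Rum</td><td class="rating">98</td></tr>
    <tr><td class="horse">Arkle</td><td class="rating">102</td></tr>
  </table>
  <p class="advert">Buy now</p>
</body></html>
"""


class ElementLister(HTMLParser):
    """Collects [tag, class, own text] for every element, in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one entry per start tag
        self._open = []     # stack of indexes into self.elements

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        self.elements.append([tag, css_class, ""])
        self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag, plus anything
        # left unclosed inside it (for example a bare <br>).
        open_tags = [self.elements[i][0] for i in self._open]
        if tag in open_tags:
            while self.elements[self._open.pop()][0] != tag:
                pass

    def handle_data(self, data):
        # Text is credited to the innermost open element only.
        text = data.strip()
        if text and self._open:
            entry = self.elements[self._open[-1]]
            entry[2] = (entry[2] + " " + text).strip()


def import_all_elements(html):
    """Step 1: list every element so you can see what the page contains."""
    parser = ElementLister()
    parser.feed(html)
    parser.close()
    return [tuple(entry) for entry in parser.elements]


def import_selected_elements(html, wanted_tag, wanted_class):
    """Step 2: keep only elements that meet the conditions; skip empty text."""
    return [
        text
        for tag, css_class, text in import_all_elements(html)
        if tag == wanted_tag and css_class == wanted_class and text
    ]


if __name__ == "__main__":
    for row in import_all_elements(SAMPLE_HTML):
        print(row)
    print(import_selected_elements(SAMPLE_HTML, "td", "horse"))
```

Unlike a browser's innerText, this sketch credits text only to the element that directly contains it, which keeps the full listing short enough to read through when deciding on your conditions.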

Depending on how much the web page contains, it may take some time for the first macro to run. You'll also have to write into the macro, where indicated, the URL of the page you want to scrape.

Good luck!

Ray
 

Attachments

  • Scraper_Utility.xlsm
    21.7 KB · Views: 363
Excellent! I will endeavour to master this superior technology once I manage to work out the instructions for my manual version. :doh:


[Image: Draper 100mm window scraper]
 
Firefox never ran correctly for me on my setup: a Mac with Windows running as a virtual machine in Parallels. Whenever I clicked on a Firefox link, hundreds of browser windows would open and I had to kill each instance separately. So I haven't bothered using it for a couple of years now. But I'm a bit fed up with Apple for several reasons, so I'm going back to a Windows PC for my next purchase. I'll still use the Mac, as there are apps I use that are Mac-only, but my main box will be the PC and the Mac will be just part of my network.

So no, I wasn't aware of Firebug, but it looks interesting - quite like Outwit, which I have. Outwit is a standalone or add-in for Firefox, but it's pay-for, whereas Firebug is free. However, my main purpose of putting up the macro was to get those interested to examine for themselves the make-up of a web page and to make it their own. I use another method for scraping that's more efficient, which I'll put up here as soon as I fully understand it.

Ray
 
Hello

I'd be interested in scraping the Patternform website.

There are two different areas though: one is a standard web page with date, and one is a page with drop-downs and selections/filters.

I'm guessing I've no chance with the latter!

Any help with this website would be appreciated.

RW
 
It is very difficult to get the data off Patternform.

I have a spreadsheet that will pull the information off the site giving ratings for 6 months, 3 months and 1 month.

It is some time since I have used it but if you want a copy, just pm me your email addy and I will send one over.

If I put it online it is liable to lose the macros that operate the sheet.
 
I have posted spreadsheets to the forum with macros that have been fine.
There is an upload attachment facility at the bottom of the reply window.
Might save you having to send it numerous times :D
 
Hopefully that has sorted it. :)
 

Attachments

  • Copy of Sumuwin_PatternFormProgram (3).xls
    80.5 KB · Views: 199
...and how to work it

When you open up the spreadsheet it will take about 10 seconds to find the Patternform site.


A
A window will appear in front of you showing the Patternform home page. Click on the day from which you want the ratings. Then click on the meeting you want.
To the right of the sheet are four buttons:

GET DATA
GO BACK
GO FORWARD
HOME

Click GET DATA and the spreadsheet will get the data for that meeting and place it on sheet 2, which is named SPEED DATA.

YOU MAY GET A PRINT MESSAGE, JUST CLICK CANCEL.

A message box will appear when the meeting is complete.

Click the HOME button and start again with the next meeting, repeating from step A.

Once you have collected all the data you want, it will all be on the SPEED DATA sheet.
 
Funnily enough I invited an old 'mate' to join the forum last week who used to post as sumuwin :D and he is online now!
 
Thanks Roy

It may be just me, but no web page appears for me; it just says 'Cannot Display' etc.

My brain is going to hurt by the end of today, I know it lol
 
Hi Roger,

I replied to your PM and to your post about where to put the URL, but the latter seems to have disappeared into the ether. Anyway with the URL, replace "http://www.SITE.com/FOLDER/PAGE" with the URL you want, making sure to leave the quote marks.

To get the web page to appear, change ".Visible = False" to ".Visible = True".

Off the top of my head, I cannot figure out why you should be getting a "Cannot Display" error, unless the URL was not correctly inserted. I was able to load the page with no problem, which I had to do to answer your PM.
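One quick way to rule out a badly inserted URL is to check the string before pasting it into the macro. The following is a minimal Python sketch, not part of either workbook: the function name `url_problems` and the particular checks are assumptions chosen for illustration, and passing them does not guarantee the page will load.

```python
from urllib.parse import urlsplit

# The placeholder quoted in the post above; it must be replaced with a real URL.
PLACEHOLDER = "http://www.SITE.com/FOLDER/PAGE"


def url_problems(url):
    """Return a list of likely problems with a URL string (empty if none found)."""
    problems = []
    cleaned = url.strip()
    if cleaned != url:
        problems.append("leading or trailing whitespace")
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unexpected scheme (expected http:// or https://)")
    if not parts.netloc:
        problems.append("missing host name")
    if cleaned == PLACEHOLDER:
        problems.append("placeholder URL was not replaced")
    return problems


if __name__ == "__main__":
    for candidate in ("http://example.com/page", "www.example.com/page", PLACEHOLDER):
        print(candidate, "->", url_problems(candidate) or "looks OK")
```

A URL pasted without "http://" is the most common case this catches: the whole string is then read as a path, so both the scheme and the host name are reported as missing.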

Ray
 
I was getting Ray and Rob mixed up!

I'm trying to figure out all the info, but as for the Patternform one, I'm getting no page appearing as yet.

Let me know if anyone else had issues or it'll be just me
 
All sorted, it just started to work!

So this sorts the filter into various months etc.

Is it easy to modify it to use filters of your choice, or will it take a host of VB programming etc?
 
arkroyal said:
Funnily enough I invited an old 'mate' to join the forum last week who used to post as sumuwin :D and he is online now!


I think that's who must have sent it to me all those years ago. I wonder if he is the same person :cool:
 