Scraping RP Naps Table with python (Idiots Guide)

A mate of mine wants to start using Python for scraping various things from websites, so I said read a few examples, watch a few videos on YouTube, then give it a go with the RP naps table, which is easily obtainable.
The reason I suggested the naps table is that RP don't like you scraping it and have put in little things that make it a bit difficult for a newbie to scrape, so it keeps denying access.
Fair play, he gave it several goes before he asked for an idiot's guide to see where he was going wrong.
To cut a long story short, I've had a bit of free time today, so I knocked up a guide for him and thought I'd put it on here as well, as there don't seem to be any basic guides on how to do it with Python on here. If it helps one person get into scraping with Python, then job done.

[Screenshot: RP.JPG, the scraping script and its output]

The Out[29] is what line 35 returns, where the output is proofed (you would delete that line after proofing).
Lines 6 and 8 are the important ones, as RP will deny access if you just try to get in via the plain .get(url) route.
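The screenshot isn't reproduced in text, so exactly what lines 6 and 8 contain can't be checked here. One common reason for a 403 is that the server blocks Python's default User-Agent, so the sketch below assumes those lines send browser-like request headers. It uses only the standard library's `urllib.request`; the header values and the `build_request`/`fetch_html` names are illustrative assumptions, not RP's actual requirements. With the requests library, the equivalent is `requests.get(url, headers=HEADERS)`.

```python
import urllib.request

# Assumed browser-like headers: placeholders only, since the original
# lines 6 and 8 are not visible and RP's exact checks are unknown.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-GB,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: attach HEADERS instead of Python's default User-Agent."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page and decode it as text.

    Raises urllib.error.HTTPError (e.g. with code 403) if the server
    still refuses access.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be along the lines of `html = fetch_html(NAPS_URL)`, where `NAPS_URL` is a hypothetical name for the naps page address, which the post doesn't give.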

Here's a screenshot of the CSV file output as it looks in Excel.

[Screenshot: screen.JPG, the CSV output opened in Excel]
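The code that writes the file isn't shown in text, but saving scraped rows as a CSV that Excel opens cleanly can be sketched with the standard `csv` module. This is an assumption about how it might be done, not the script from the screenshot; `write_csv` is a hypothetical helper and the column names are up to you.

```python
import csv
from typing import Iterable, Sequence


def write_csv(path: str, header: Sequence[str], rows: Iterable[Sequence[str]]) -> int:
    """Write a header row plus scraped rows to `path`; return the number of data rows.

    newline="" stops the csv module producing blank lines between rows on
    Windows, and the utf-8-sig encoding adds a BOM so Excel detects UTF-8.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```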
 
Last edited:

Horseplayer

I'll have a look later. It should be a bit simpler than that, with no need for bs (BeautifulSoup); pandas should tidy it all up with read_html. In fact, it's sacrilege going to Excel from a DataFrame.
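A rough sketch of the read_html route: `pandas.read_html` parses every `<table>` in an HTML document into a list of DataFrames. It needs a parser installed (lxml, or BeautifulSoup with html5lib), and it still depends on getting the page HTML with acceptable headers first; pointing it at the bare URL would likely hit the same 403. The table markup below is invented purely to show the call, since the real naps page layout isn't given in the thread, and `tables_from_html` is a hypothetical helper name.

```python
from io import StringIO

import pandas as pd


def tables_from_html(html: str) -> list[pd.DataFrame]:
    """Parse every <table> in the HTML into a DataFrame.

    Wrapping the string in StringIO avoids the FutureWarning that newer
    pandas versions raise when literal HTML is passed to read_html.
    """
    return pd.read_html(StringIO(html))


# Made-up markup standing in for the fetched page (not the real RP layout).
SAMPLE_HTML = """
<table>
  <thead><tr><th>Tipster</th><th>Selection</th><th>Odds</th></tr></thead>
  <tbody>
    <tr><td>Tipster A</td><td>Horse One</td><td>5/2</td></tr>
    <tr><td>Tipster B</td><td>Horse Two</td><td>7/1</td></tr>
  </tbody>
</table>
"""

naps = tables_from_html(SAMPLE_HTML)[0]
```

From there you can filter and sort in pandas directly, and `naps.to_csv("naps.csv", index=False)` writes a CSV if you still want one.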
 
I'll have a look after, it should be a bit simpler than that no need for bs, Pandas should tidy it all up with read_html. In fact it's sacrilege going to excel from a DataFrame
The point wasn't whether you need bs or read_html in pandas; it was getting onto the RP naps page in the first place without a 403 response or "access denied".
 
Top