Scraping RP Naps Table with Python (Idiot's Guide)

#1
A mate of mine wants to start using Python for scraping various things from websites, so I said read a few examples and watch a few vids on YouTube, then give it a go with the RP naps table, which is easily obtainable.
The reason I picked the naps table is that RP don't like you scraping it and have put in little things that make it a bit difficult for a newbie to scrape, so it keeps denying access.
Fair play, he gave it several goes before he asked for an idiot's guide to see where he was going wrong.
To cut a long story short, I've had a bit of free time today so I knocked a guide up for him, and thought I'd put it on here as well, as there don't seem to be any basic Python how-to guides on here. If it helps one person get into scraping with Python then it's job done.

RP.JPG

Out[29] in the screenshot is the return from line 35, where the output is checked (you would delete that line once you're happy with it).
Lines 6 & 8 are the important lines, as RP will deny access if you just try to fetch the page with a plain .get(url).
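For anyone who can't read the screenshot, here is a minimal sketch of the general idea: send browser-like headers with the request instead of a bare fetch. This is not a copy of the code in RP.JPG. The URL is a placeholder, the header values are assumptions, and it uses the standard library's urllib.request so it runs without extra installs. If you use the requests library, the same dictionary can be passed as requests.get(url, headers=...). Whether these particular headers are enough for RP is not something this sketch can promise.

```python
import urllib.request

# Placeholder address, NOT the real naps table URL (assumption for illustration).
URL = "https://example.com/naps-table"

# Browser-like headers. The exact values are assumptions; the point is that
# the request no longer announces itself with a default Python user agent.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-GB,en;q=0.9",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a GET request object carrying the browser-like headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url):
    """Fetch the page and return its HTML as text (makes a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Building the request does not touch the network; call fetch_html(URL)
# only when you actually want to download the page.
req = build_request(URL)
```

If the site still answers with a 403, urlopen raises urllib.error.HTTPError, so wrap fetch_html in a try/except while you experiment.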

Here's a screenshot of the CSV file output it gives in Excel.

screen.JPG
 
#2
I'll have a look after. It should be a bit simpler than that: no need for bs, pandas should tidy it all up with read_html. In fact it's sacrilege going to Excel from a DataFrame.
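A rough sketch of the read_html route, for anyone following along. The HTML below is a made-up two-row table, not real RP data, and the column names are invented for illustration. It assumes pandas plus an HTML parser such as lxml are installed. The string is wrapped in StringIO because recent pandas versions deprecate passing literal HTML directly.

```python
from io import StringIO

import pandas as pd

# Made-up sample table standing in for the downloaded page (assumption).
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Tipster</th><th>Nap</th></tr>
  </thead>
  <tbody>
    <tr><td>Tipster A</td><td>Horse One</td></tr>
    <tr><td>Tipster B</td><td>Horse Two</td></tr>
  </tbody>
</table>
"""


def tables_from_html(html):
    """Return every <table> found in the HTML text as a list of DataFrames."""
    return pd.read_html(StringIO(html))


# read_html always returns a list, one DataFrame per table on the page.
df = tables_from_html(SAMPLE_HTML)[0]
```

On a real page you would pass in the HTML text you already downloaded, then pick the right table out of the list by index.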
 
#3
The object was not whether you need bs or read_html in pandas;
it was to get onto the RP naps page in the first place without getting a 403 response or access denied.