Python Forum

Full Version: Webscraping BeautifulSoup - Insecam
Hey,

I'm having trouble getting something to work on the insecam.org site. I can get the BeautifulSoup examples working, but when I try them with insecam.org I get 403 errors. I then tried something with user-agent headers (I understand the concept, but not how to use them), but still nothing works.

Does anyone have tips on how to do this? I want to extract all the src="..." values inside all the <a> tags, from all the camera views in Japan.

Any help would be appreciated, because at the moment I'm stuck...
>>> import requests
>>> url_get = requests.get('http://insecam.org/')
>>> url_get.status_code
403
>>> 
>>> user_agent = {'User-agent': 'Mozilla/5.0'}
>>> url_get = requests.get('http://insecam.org/', headers=user_agent)
>>> url_get.status_code
200
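A minimal sketch, assuming you will make several requests to the same site: a requests.Session can carry the User-Agent header, so you don't have to pass headers= on every call. The helper names make_session and fetch_html are made up for illustration, and "Mozilla/5.0" is just the string that worked above, not a requirement.

```python
import requests

# Browser-like User-Agent; this is the value that avoided the 403 above.
# Any common browser string is assumed to work the same way.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def make_session():
    """Hypothetical helper: a Session that sends HEADERS with every request."""
    session = requests.Session()
    # Replaces the default "python-requests/x.y" User-Agent for this session.
    session.headers.update(HEADERS)
    return session


def fetch_html(session, url):
    """Hypothetical helper: download url and return the decoded HTML text."""
    response = session.get(url, timeout=10)
    # Raise requests.HTTPError on 4xx/5xx (such as 403) instead of
    # silently continuing with an error page.
    response.raise_for_status()
    return response.text
```

Usage would be something like html = fetch_html(make_session(), "http://insecam.org/"). The session also reuses the connection between requests, which helps if you later walk through several pages.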
Ok no more 403 errors! Thanks.

Next, I'm trying to get the element with id=image0, which is the only object returned.
I'm using this code, based on an example from Kochi Coders:

from bs4 import BeautifulSoup
import requests

user_agent = {'User-agent': 'Mozilla/5.0'}
url_get = requests.get('http://insecam.org/', headers=user_agent)

soup = BeautifulSoup(page.read())
nofollow = soup.find_all('a',id_='image0')
for all image0 in nofollow:
    print(nofollow['src']+","+nofollow.string)
I get errors when using the class "img-responsive img-rounded detailimage":

Error:
  File "<stdin>", line 1
    for all img-responsive img-rounded detailimage in nofollow:
            ^
SyntaxError: invalid syntax
When using the id, as in the example above, I get the following error:

Error:
  File "<stdin>", line 1
    for all image0 in nofollow:
            ^
SyntaxError: invalid syntax
It's hard to find comprehensive beginner examples for this, and the documentation is a little overwhelming at the moment.
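Besides the loop, the search in the code above is worth checking. A minimal sketch on made-up markup (SAMPLE_HTML is an assumption modelled on the tag described in the question, not the real insecam.org page): BeautifulSoup maps class_ to the class attribute because class is a Python keyword, but id needs no underscore, so id_='image0' looks for an attribute literally named "id_". Also, the src attribute sits on the <img> tag, not on the <a> around it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup based on the description above; the real page
# may be structured differently.
SAMPLE_HTML = """
<a href="/view/1/">
  <img id="image0" class="img-responsive img-rounded detailimage"
       src="http://example.com/cam1.jpg">
</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Ordinary keyword arguments filter on the attribute with that exact name:
# id="image0" matches, id_="image0" searches for an "id_" attribute.
by_id = soup.find_all("img", id="image0")
wrong_id = soup.find_all("img", id_="image0")

# class_ with one class name matches any tag whose class list contains it.
by_class = soup.find_all("img", class_="detailimage")
# class_ with the full string matches the attribute's exact value.
by_full_class = soup.find_all("img", class_="img-responsive img-rounded detailimage")

print(len(by_id), len(wrong_id), len(by_class), len(by_full_class))
print(by_id[0]["src"])
```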
Look at the BBCode help; I fixed it for you now.
Please post code with correct indentation.

You are making a basic error in the loop: the loop variable cannot be two names (all image0) separated only by a space.
E.g.:
nofollow = ['pic1', 'pic2']
for image in nofollow:
    print(image) 
Output:
pic1
pic2
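It can be two values, but then they must be separated by a comma and each item must hold two values, for example the (number, item) pairs that enumerate() produces: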
nofollow = ['pic1', 'pic2']
for number,image in enumerate(nofollow, 1):
    print('{} --> {}'.format(number, image))
Output:
1 --> pic1
2 --> pic2
Not:
for all image0 in ...
But:
for image0 in ...
(i.e., no all)
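Applying the corrected loop to the original goal (every src inside the <a> tags), here is a minimal sketch. The markup is hypothetical sample HTML, not the real page; the point is the single loop variable, and reading src from each individual tag rather than from the whole list that find_all returns (which is what nofollow['src'] does in the original code).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two camera links and one plain link;
# the ids and URLs are made up for illustration.
SAMPLE_HTML = """
<a href="/view/1/"><img id="image0" src="http://example.com/cam1.jpg"></a>
<a href="/view/2/"><img id="image1" src="http://example.com/cam2.jpg"></a>
<a href="/about/">About</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

sources = []
for link in soup.find_all("a"):  # one loop variable, no "all"
    image = link.find("img")  # first <img> inside this <a>, or None
    # Read the attribute from the single tag, not from the ResultSet:
    # a ResultSet is a list, so indexing it with "src" fails.
    if image is not None and image.get("src"):
        sources.append(image["src"])

for number, src in enumerate(sources, 1):
    print("{} --> {}".format(number, src))
```

The None check matters because not every <a> on a page necessarily wraps an image.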
Hello! There is no page object in your code. Even if you used the real Response object from your code, url_get, it has no read() method. read() belongs to the file-like response returned by urllib (urllib.request.urlopen); with requests you get the page from the Response object through its text or content attribute: url_get.text (decoded str) or url_get.content (raw bytes).
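A minimal sketch of the corrected fetch-and-parse step, assuming you keep using requests: take the HTML from url_get.text or url_get.content and hand it to BeautifulSoup. Passing "html.parser" explicitly selects Python's built-in parser and avoids the "no parser was explicitly specified" warning that BeautifulSoup(markup) without a parser argument emits. The function names parse_page and fetch_soup are hypothetical.

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}  # the header that avoided the 403


def parse_page(markup):
    """Hypothetical helper: parse HTML given as str (.text) or bytes (.content)."""
    # html.parser ships with Python, so no extra parser package is needed.
    # With bytes input, BeautifulSoup detects the encoding itself.
    return BeautifulSoup(markup, "html.parser")


def fetch_soup(url):
    """Hypothetical helper: download url with HEADERS and return the parsed page."""
    url_get = requests.get(url, headers=HEADERS, timeout=10)
    url_get.raise_for_status()  # turn a 403 into an exception
    # Instead of page.read(): requests exposes the body as attributes,
    # .text (decoded str) or .content (raw bytes).
    return parse_page(url_get.text)
```

Usage would be soup = fetch_soup("http://insecam.org/") followed by a search such as soup.find_all("img", id="image0"), assuming the page really marks the camera image that way.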