Python Forum
Web Crawler Not Working
#11
I ran a copy of the code and got this:

Output:
******* page 1 ******** ******* page 2 ******** ******* page 3 ********
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#12
(Feb-03-2017, 08:43 AM)wavic Wrote: I ran a copy of the code and got this:
The error is on your side.
To make sure, I tested in Python 3.6 with a virtual environment (new install of BeautifulSoup and Requests).
It does work, so make sure your libraries are updated, or run it in a virtual environment (with a new install).

Break it down to see where the problem is.
First, for example, check whether you get the source code.
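A quick way to see which library versions an environment really has is sketched below. It is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`, which the 3.6 install mentioned above does not have), and the helper names `installed_versions` and `in_virtual_environment` are made up for illustration.

```python
import sys
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, base_prefix at the real install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Virtual environment:", in_virtual_environment())
    # Distribution names as published on PyPI (assumed to be the ones in use here).
    for name, version in installed_versions(["requests", "beautifulsoup4"]).items():
        print(name, version or "not installed")
```

Running it both inside and outside the virtual environment shows whether the two setups really differ.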
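import requests

# Fetch the first listing page to confirm that HTML actually comes back.
url = 'http://theiconic.com.au/mens-clothing-tshirts-singlets/?page=1'
source_code = requests.get(url)
plain_text = source_code.text
# Print only the first 90 characters as a quick sanity check.
print(plain_text[:90])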
Output:
<!DOCTYPE html> <!--[if IE 7]>  <html xmlns:ng="http://angularjs.org" class="ie7" lang="en
#13
I get the page code, so that is not the issue. This is not my first time working with bs4; I know how it works. And I don't think a virtual environment will make a difference, but I can try it later. No time now.

Attached Files

.txt   soup.txt (Size: 4.47 KB / Downloads: 509)
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#14
@snippsat nothing to it! Wonder if Original Poster gave up.

(Feb-03-2017, 09:48 AM)wavic Wrote: I get the page code, so that is not the issue. This is not my first time working with bs4; I know how it works. And I don't think a virtual environment will make a difference, but I can try it later. No time now.

Hey dude! Whenever you're troubleshooting, you should copy and paste the code you're talking about, as well as your output, error or not.

Also, the only way I can think of that a virtual env would make a difference is if you have downloaded and installed all sorts of modules that directly affect the modules being used. If that sounds like you, the chances are pretty high...
Snaps has got the right idea... seems like he's troubleshot once or twice =) Yeah, break it down!!

My input... Also check what type of response you're getting from the page. Going by Snaps' code and your output, you're not getting any response...
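Checking the response type could look roughly like this with Requests, which the snippet earlier in the thread already uses. Treat it as a sketch: `summarize_response` and `fetch_and_summarize` are made-up helper names, and the 10-second timeout is an arbitrary choice.

```python
def summarize_response(response):
    """Collect the basics worth checking before blaming the parser.

    `response` is expected to look like a requests.Response object.
    """
    return {
        "status_code": response.status_code,
        "content_type": response.headers.get("Content-Type"),
        "length": len(response.text),
        "starts_with": response.text[:90],
    }


def fetch_and_summarize(url):
    """Fetch `url` with Requests and summarize the reply (needs network access)."""
    # Third-party import kept local so summarize_response can be used on its own.
    import requests

    return summarize_response(requests.get(url, timeout=10))
```

A status code other than 200, an unexpected content type, or a very short body would each point at the request rather than at the bs4 parsing.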

Since Scrapy is my go-to tool, using scrapy shell, I always do a view(response) to see what's going on between my request and the site. 8 out of 10 times when I'm not getting my items back it's a user_agent issue, which MIGHT be a thing... depending on your service provider.
Does the URL change when you put it in a browser? (Multi-language pages aren't commonly two sets of code, but I have run into a couple.)
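Both checks can be tried outside Scrapy too. The sketch below uses Requests instead of scrapy shell, which is a swap on my part; the User-Agent string is just a sample browser-like value, and the helper names are made up.

```python
# Sample browser-like value (an assumption); any current browser's string would do.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def redirect_chain(response):
    """Return every URL visited for this request, ending with the final one."""
    # requests stores intermediate redirect responses in response.history.
    return [step.url for step in response.history] + [response.url]


def fetch_like_a_browser(url, user_agent=BROWSER_USER_AGENT):
    """Fetch `url` while sending a browser-like User-Agent (needs network access)."""
    # Third-party import kept local so redirect_chain can be used on its own.
    import requests

    return requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
```

If `redirect_chain(fetch_like_a_browser(url))` has more than one entry, the site sent the request somewhere else, and the final URL shows where. Comparing the page with and without the custom header shows whether the user agent matters.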