Web Crawler Not Working

wavic · Feb-03-2017, 08:43 AM

I ran a copy of the code and got this:

Output:
******* page 1 ********
******* page 2 ********
******* page 3 ********

snippsat · Feb-03-2017, 09:21 AM

(Feb-03-2017, 08:43 AM) wavic Wrote: I ran a copy of the code and got this:

The error is on your side.
To make sure, I tested in Python 3.6 with a virtual environment (a fresh install of BeautifulSoup and Requests).
It does work. Make sure your libraries are up to date, or run it in a virtual environment with a fresh install.
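A quick way to see which versions of the two libraries you actually have installed is to ask the packaging metadata. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical, and `beautifulsoup4` / `requests` are the distribution names as published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the interpreter that is running this script
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["beautifulsoup4", "requests"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the same interpreter you use for the crawler, otherwise you are checking a different set of packages.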

Break it down to see where the problem is.
First, for example: do you get the source code?

import requests

url = 'http://theiconic.com.au/mens-clothing-tshirts-singlets/?page=1'
source_code = requests.get(url)  # fetch the first listing page
plain_text = source_code.text    # decoded HTML as a string
print(plain_text[:90])           # show only the first 90 characters

Output:
<!DOCTYPE html>
<!--[if IE 7]>  <html xmlns:ng="http://angularjs.org" class="ie7" lang="en
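The next step in breaking it down is to check whether the tags the crawler looks for exist in the HTML you fetched at all. The crawler's code and selectors are not shown in this thread, so the tag and class name used here are hypothetical. This sketch uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra installs; with bs4 the rough equivalent is `len(soup.find_all(tag, class_=css_class))`.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count opening tags with a given name whose class attribute includes a given class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value) pairs
        if tag != self.tag:
            return
        for name, value in attrs:
            if name == "class" and value and self.css_class in value.split():
                self.count += 1
                break


def count_matches(html, tag, css_class):
    """Return how many <tag class="... css_class ..."> elements appear in html."""
    parser = ClassCounter(tag.lower(), css_class)
    parser.feed(html)
    parser.close()
    return parser.count
```

For example, `count_matches(plain_text, 'div', 'product')` on the text fetched above (with your crawler's real tag and class substituted). A result of 0 would be consistent with the crawler printing only the page headers and no items.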

wavic · Feb-03-2017, 09:48 AM

I get the page source, so that is not the issue. This is not my first time working with bs4; I know how it works. And I don't think a virtual environment will make a difference, but I may try it later. No time now.

scriptso · (last modified Feb-06-2017, 11:01 PM)

@snippsat Nothing to it! I wonder if the original poster gave up.

(Feb-03-2017, 09:48 AM) wavic Wrote: I get the page source, so that is not the issue. This is not my first time working with bs4; I know how it works. And I don't think a virtual environment will make a difference, but I may try it later. No time now.

Hey dude! Whenever you're troubleshooting, you should copy and paste the code you're talking about, as well as your output, whether there is an error or not.

Also, the only way I can think of that a virtual env would make a difference is if you have downloaded and installed all sorts of modules that directly affect the ones being used here. If that sounds like you, the chances are pretty high...
Snaps has the right idea... seems like he's troubleshot once or twice =) Yeah, break it down!!
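If conflicting packages are the suspect, a clean environment rules them out. The usual route is `python -m venv env` on the command line; the sketch below does the same thing from Python with the standard-library `venv` module. The helper name `create_clean_env` is hypothetical, and the executable path assumes the standard `bin/` (POSIX) or `Scripts\` (Windows) layout.

```python
import os
import sys
import venv


def create_clean_env(env_dir, with_pip=True):
    """Create an isolated virtual environment and return the path to its Python executable."""
    # clear=True wipes env_dir first if it already holds an environment
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True)
    builder.create(env_dir)
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")
```

After creating it, install only what the crawler needs into that interpreter, e.g. `subprocess.run([python, "-m", "pip", "install", "beautifulsoup4", "requests"], check=True)`, and run the crawler with it.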

My input: also check what kind of response you're getting from the page. Going by snaps' code and your output, you're not getting any response...
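To see what kind of response came back, look at the status code and content type before parsing anything. This minimal sketch (the helper `describe_response` is hypothetical) works on any object with `status_code`, `headers` and `text` attributes, which is what a `requests.Response` provides.

```python
def describe_response(resp, preview_chars=90):
    """Summarize a requests-style response: status, content type, and a short body preview."""
    body = resp.text or ""
    return {
        "status": resp.status_code,
        # 2xx codes mean the server reports success
        "ok": 200 <= resp.status_code < 300,
        "content_type": resp.headers.get("Content-Type", "unknown"),
        "length": len(body),
        "preview": body[:preview_chars],
    }
```

Usage: `print(describe_response(requests.get(url)))`. A 403 or a very short body usually means the server is refusing or redirecting the request rather than serving the product listing.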

Since Scrapy is my go-to tool, I always run view(response) in the Scrapy shell to see what's going on between my request and the site. 8 out of 10 times when I'm not getting my items back, it's a user_agent issue, which MIGHT be the case here... depending on your service provider.
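If the User-Agent is the culprit, requests lets you send your own through the `headers=` argument of `requests.get` (in Scrapy the equivalent is the `USER_AGENT` setting). A minimal sketch follows; the browser-like UA string and the helper names are illustrative assumptions, not something this site is known to require.

```python
# Browser-like User-Agent string; the exact value is an illustrative assumption.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_headers(user_agent=BROWSER_UA, extra=None):
    """Return request headers with a User-Agent set, merged with any extra headers."""
    headers = {"User-Agent": user_agent}
    if extra:
        headers.update(extra)
    return headers


def fetch_with_user_agent(url, user_agent=BROWSER_UA, timeout=10):
    """Fetch url with requests, sending the given User-Agent header."""
    import requests  # third-party; imported here so build_headers works without it

    return requests.get(url, headers=build_headers(user_agent), timeout=timeout)
```

Compare the output of `fetch_with_user_agent(url)` with a plain `requests.get(url)`: if only the former returns the product markup, the site is filtering on User-Agent.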
Does the URL change when you put it in a browser? (Multi-language sites don't commonly serve two sets of code, but I have run into a couple.)
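The same question can be asked of the script itself: requests follows redirects for GET by default, records each hop in `resp.history`, and puts the final address in `resp.url`. This sketch (the helper `redirect_report` is hypothetical) compares the requested URL with where the response actually ended up.

```python
from urllib.parse import urlsplit


def redirect_report(requested_url, resp):
    """Describe whether a requests-style response ended up at a different URL.

    resp.url is the final URL and resp.history lists the redirect responses
    that led there, as on a requests.Response.
    """
    start = urlsplit(requested_url)
    final = urlsplit(resp.url)
    return {
        "hops": [(r.status_code, r.url) for r in resp.history],
        "final_url": resp.url,
        "host_changed": start.netloc.lower() != final.netloc.lower(),
        "path_changed": start.path != final.path,
        "query_changed": start.query != final.query,
    }
```

Usage: `redirect_report(url, requests.get(url))`. If the path or query changed (for example, a `?page=` parameter dropped), the crawler may be parsing a different page than the one you opened in the browser.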
