web crawler that retrieves data not stored in source code

edithegodfather · (This post was last modified: Jan-11-2017, 01:28 PM by edithegodfather.)

Hey, thanks for the reply. Sorry I couldn't get back to you sooner; I started work again and don't have much time on my hands anymore.
Also, sorry about the code tag thing. I kept noticing the code was being formatted, but I thought that happened on its own; I'll be sure to use the tags in further replies. :)

I'm also getting some errors with this code right now, and I'm trying to figure out how to make it work.

[edit]

OK, I figured it out: I made a new variable (href2) and used the replace function to strip the doubled site prefix from the link.

import requests
from bs4 import BeautifulSoup


def crawler(max_pages):
    # Walk the paginated listing, one results page at a time.
    page = 1
    while page <= max_pages:
        url = 'http://www.publi24.ro/anunturi/imobiliare/bucuresti/?pag=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        # Each ad title is an <a> tag carrying itemprop="name".
        for link in soup.find_all('a', {'itemprop': 'name'}):
            href = 'http://www.publi24.ro' + link.get('href')
            # If the href was already absolute, the prefix is now doubled; collapse it.
            href2 = href.replace('http://www.publi24.rohttp://www.publi24.ro', 'http://www.publi24.ro')
            # ad_title(href)
            # views(href)
            print(href2)
        page += 1
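
The replace call above does the job, but it only covers the one doubled-prefix case. As a rough alternative sketch (not what I actually ran), the standard library's urljoin should handle both relative and already-absolute hrefs in one step. The helper name to_absolute and the sample paths are made up for illustration; only the base URL comes from the crawler above.

```python
from urllib.parse import urljoin

# Base URL taken from the crawler above.
BASE_URL = 'http://www.publi24.ro'


def to_absolute(href, base=BASE_URL):
    """Return an absolute URL for href (hypothetical helper, not from the original post).

    urljoin leaves an already-absolute href untouched and resolves a
    relative one against base, so the prefix never gets doubled.
    """
    return urljoin(base, href)


if __name__ == '__main__':
    # Made-up sample hrefs: one relative, one already absolute.
    for sample in ('/anunturi/imobiliare/example.html',
                   'http://www.publi24.ro/anunturi/imobiliare/example.html'):
        print(to_absolute(sample))
```

Inside the loop this would replace both the string concatenation and the replace call, e.g. href2 = to_absolute(link.get('href')).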
