Python Forum
web crawler that retrieves data not stored in source code
#14
Hey, thanks for the reply, and sorry I couldn't get back to you sooner; I started work again and don't have much free time anymore.
Also, sorry about the code tag thing. I kept noticing the code was being formatted, but I thought it was doing that by itself; I'll be sure to use the tags in future replies. :)

I'm also getting some errors with this code right now, and I'm trying to figure out how to make it work.

[edit]

OK, I figured it out: I made a new variable, href2, where I used the replace() method to remove the duplicated domain.

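import requests
from bs4 import BeautifulSoup

def crawler(max_pages):
    page = 1
    while page <= max_pages:
        # Listing pages are paginated through the ?pag= query parameter
        url = 'http://www.publi24.ro/anunturi/imobiliare/bucuresti/?pag=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        # Ad titles are <a> tags carrying itemprop="name"
        for link in soup.find_all('a', {'itemprop': 'name'}):
            href = link.get('href')
            if href is None:
                continue  # skip anchors that have no href attribute
            href = 'http://www.publi24.ro' + href
            # Some hrefs are already absolute, so prefixing the domain doubles it; undo that here
            href2 = href.replace('http://www.publi24.rohttp://www.publi24.ro', 'http://www.publi24.ro')
            # ad_title(href)
            # views(href)
            print(href2)
        page += 1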
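A possibly cleaner alternative to the replace() workaround, sketched under the assumption that the listing page mixes relative hrefs (like /anunturi/...) with already-absolute ones: the standard library's urllib.parse.urljoin handles both cases, so the doubled domain is never built in the first place. The names absolute_ad_url and BASE_URL and the sample paths below are hypothetical, made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical constant: the site root used to resolve relative links
BASE_URL = 'http://www.publi24.ro'


def absolute_ad_url(href, base_url=BASE_URL):
    """Resolve an href from a listing page to an absolute URL (hypothetical helper).

    urljoin returns an absolute href unchanged and resolves a relative one
    (e.g. '/anunturi/...') against base_url, so there is no doubled domain
    to clean up afterwards.
    """
    if not href:
        return None  # the anchor had no usable href attribute
    return urljoin(base_url, href)


# Hypothetical sample hrefs: one relative, one already absolute
samples = [
    '/anunturi/imobiliare/exemplu.html',
    'http://www.publi24.ro/anunturi/imobiliare/exemplu.html',
]
for h in samples:
    print(absolute_ad_url(h))  # both print the same absolute URL
```

Inside the crawler loop, the two href lines could then become href2 = absolute_ad_url(link.get('href')), skipping the link when it returns None.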


Messages In This Thread
RE: web crawler that retrieves data not stored in source code - by edithegodfather - Jan-11-2017, 01:28 PM