Python Forum
web crawler that retrieves data not stored in source code
#11
yeah, i checked the page source and it works just fine; i don't think there's going to be much variation in there, unless they change the whole layout of the website, but in that case it won't be just 1 tag that doesn't match. :D

anyway, now i got the link part, the title part and the ad id part and all i need to do is convert the adid into views.

i'm using the code you guys gave me:

import time

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# adid is 18238521
# views is 4
href = 'http://www.publi24.ro/anunturi/imobiliare/de-vanzare/apartamente/garsoniera/anunt/Garsoniera-Sector-1/7b006674706c6156.html'

def get_adid(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    for link in soup.findAll('span', {'class':'fa fa-eye'}):
        adid = link.get('ng-init')
        num = adid.split('=')[1]
        print(num)

def views(item_url):
    browser = webdriver.PhantomJS(r'D:\phantomjs-2.1.1-windows\bin\phantomjs.exe')
    browser.get(item_url)
    time.sleep(0)
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    tag = soup.find('span', {'add-view':get_adid(href)})
    print(tag.text)
    browser.quit()

but i'm not sure how to pass the adid from the get_adid() function into the dictionary used for the 'tag' variable in the views() function. i tried calling get_adid(href) in there, but it just prints the adid instead of the view count.
i thought about zipping 2 lists together, but that doesn't print out a dictionary and both lists have to be the same length, whereas here i'm just trying to pair 'add-view' with whatever id i can get in there.
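
One possible way to get the adid into that dictionary, as a minimal sketch: have the lookup function return the value instead of printing it, then use the returned value as the dictionary value. The helper names parse_adid and view_attrs are hypothetical, and the 'adid=18238521' string is only an assumed shape for the ng-init attribute; the real value on the page may look different.

```python
def parse_adid(ng_init):
    """Return the text after the first '=' in an ng-init value.

    Assumes the attribute looks like 'adid=18238521' (hypothetical shape).
    """
    return ng_init.split('=')[1].strip()


def view_attrs(adid):
    """Build the attrs dictionary that would be passed to soup.find('span', ...)."""
    return {'add-view': adid}


# Stand-in for link.get('ng-init') inside get_adid(); no network access needed here.
adid = parse_adid('adid=18238521')
attrs = view_attrs(adid)
print(attrs)
```

In the posted code this would mean ending get_adid() with a return of num rather than print(num), so that {'add-view': get_adid(href)} receives the id itself. A function that only prints returns None, which is probably why the lookup in views() does not find the intended span.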


Messages In This Thread
RE: web crawler that retrieves data not stored in source code - by edithegodfather - Jan-07-2017, 08:11 PM
