Python Forum
web crawler that retrieves data not stored in source code
#13
Check whether link.get('href') is already a full http link.
If it is, leave it as it is; otherwise prepend the site's base URL.
E.g.:
from bs4 import BeautifulSoup

html = '''\
<a href="http://www.publi24.ro/anunturi/"></a>
<a href="/anunturi/imobiliare/de-vanzare/"></a>'''

soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('a'):
    href = link.get('href')
    if href is None:
        continue  # skip <a> tags that have no href
    if href.startswith('http'):
        # Already an absolute URL, use it as-is
        print(href)
    else:
        # Root-relative link, so prepend the site's base URL
        print('http://www.publi24.ro{}'.format(href))
Output:
http://www.publi24.ro/anunturi/
http://www.publi24.ro/anunturi/imobiliare/de-vanzare/
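A more robust alternative to checking for http by hand is urllib.parse.urljoin from the standard library. It resolves each link against the URL of the page it was found on, so it also handles relative links without a leading slash, which plain string concatenation gets wrong. This is only a sketch: the page URL, the imobiliare/ href and the absolutize helper are hypothetical names made up for illustration.

```python
from urllib.parse import urljoin


def absolutize(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on.

    Absolute URLs come back unchanged; relative ones (with or without a
    leading slash) are resolved using the standard relative-reference rules.
    Empty or missing hrefs are skipped.
    """
    return [urljoin(page_url, href) for href in hrefs if href]


# Hypothetical page URL and hrefs; in the loop above the hrefs would
# come from link.get('href').
page_url = 'http://www.publi24.ro/anunturi/'
hrefs = [
    'http://www.publi24.ro/anunturi/',
    '/anunturi/imobiliare/de-vanzare/',
    'imobiliare/',  # hypothetical: relative to the current directory
    None,           # an <a> tag without an href
]

for url in absolutize(page_url, hrefs):
    print(url)
```

Here 'imobiliare/' resolves to http://www.publi24.ro/anunturi/imobiliare/, whereas simply gluing it onto the domain would produce a broken URL.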
Use code tags in your post; I have added them for you this time.


Messages In This Thread
RE: web crawler that retrieves data not stored in source code - by snippsat - Jan-10-2017, 06:52 AM
