web crawler that retrieves data not stored in source code


snippsat


#13

Jan-10-2017, 06:52 AM (This post was last modified: Jan-10-2017, 06:52 AM by snippsat.)

Check whether link.get('href') already starts with http.
If it does, leave it as is; otherwise prepend the site's base URL.
E.g.:

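from bs4 import BeautifulSoup

# Two sample links: one absolute, one relative to the site root
html = '''\
<a href="http://www.publi24.ro/anunturi/"></a>
<a href="/anunturi/imobiliare/de-vanzare/"></a>'''

soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('a'):
    href = link.get('href')
    if href is None:
        continue  # skip <a> tags that have no href attribute
    if href.startswith('http'):
        # Already an absolute URL, print it unchanged
        print(href)
    else:
        # Relative path, prepend the site's base URL
        print('http://www.publi24.ro{}'.format(href))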

Output:
http://www.publi24.ro/anunturi/
http://www.publi24.ro/anunturi/imobiliare/de-vanzare/
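
As an alternative to checking for http by hand, the standard library's urllib.parse.urljoin resolves a link against a base URL and returns absolute URLs unchanged. This is only a sketch: BASE_URL and to_absolute are names made up for illustration, and the base is assumed to be the page the links were scraped from.

```python
from urllib.parse import urljoin

# Assumed base: the URL of the page the links were scraped from
BASE_URL = 'http://www.publi24.ro/'


def to_absolute(href, base_url=BASE_URL):
    """Resolve href against base_url; absolute URLs come back unchanged."""
    return urljoin(base_url, href)


# The same two href values as in the example above
hrefs = [
    'http://www.publi24.ro/anunturi/',
    '/anunturi/imobiliare/de-vanzare/',
]
for href in hrefs:
    print(to_absolute(href))
```

Inside the BeautifulSoup loop you could call to_absolute(link.get('href')) in place of the if/else. It also copes with hrefs that have no leading slash, which plain string formatting would glue straight onto the domain.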

Use code tags in your post; I have added them for you now.

