Python Forum
Web Crawler help
#9
Thanks a million!

I have added a new function that goes through the page of each individual listing:

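import requests
from bs4 import BeautifulSoup

def get_single_item_data(item_url):
    # Download the listing page and parse it
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    # Print the first <a> tag found in every breadcrumb item
    for item in soup.find_all('li', {'class': 'breadcrumb-listitem'}):
        area = item.find('a')
        print(area)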
I am trying to get one specific piece of information: the neighborhood. The output does contain what I am looking for, but along with a lot of markup that I don't want. I only want the "title" value (in the first example, "Lombardijen").

The output for each listing is as follows (I only copied the results for the first 3 listings):

Output:
<a href="/koop/" title="Home">Home</a>
<a href="/koop/rotterdam/" title="Rotterdam">Rotterdam</a>
<a href="/koop/rotterdam/lombardijen/" title="Lombardijen">Lombardijen</a>
None
<a href="/koop/" title="Home">Home</a>
<a href="/koop/rotterdam/" title="Rotterdam">Rotterdam</a>
<a href="/koop/rotterdam/s-gravenland/" title="'s-Gravenland">'s-Gravenland</a>
None
<a href="/koop/" title="Home">Home</a>
<a href="/koop/rotterdam/" title="Rotterdam">Rotterdam</a>
<a href="/koop/rotterdam/pendrecht/" title="Pendrecht">Pendrecht</a>
None

The HTML for each listing looks as follows. The part I am trying to extract is the title of the last link in the breadcrumb ("Lombardijen" in this listing).

<div class="breadcrumb">
        <ol class="container breadcrumb-list">
                <li class="breadcrumb-listitem">
                        <a href="/koop/" title="Home">Home</a>

                        <span class="icon-arrow-right-grey"></span>
                </li>
                <li class="breadcrumb-listitem">
                        <a href="/koop/rotterdam/" title="Rotterdam">Rotterdam</a>

                        <span class="icon-arrow-right-grey"></span>
                </li>
                <li class="breadcrumb-listitem">
                        <a href="/koop/rotterdam/lombardijen/" title="Lombardijen">Lombardijen</a>

                        <span class="icon-arrow-right-grey"></span>
                </li>
                <li class="breadcrumb-listitem">
                        <span title="Scottstraat 3">Scottstraat 3</span>

                </li>
        </ol>

I hope this part of my puzzle can also be solved. Thanks again for the help!
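
For reference, here is a minimal sketch of one way to get only the neighborhood out of the breadcrumb shown above. It is not a confirmed solution from this thread. The function name `extract_neighbourhood` and the `SAMPLE_HTML` constant are made up for illustration, and the sketch assumes that the neighborhood is always the last `<a>` link in the breadcrumb, which holds for the three listings in the output but may not hold for every page. The `None` in the output comes from the last breadcrumb item, which contains a `<span>` and no `<a>`, so `find('a')` has nothing to return; selecting only the links avoids it.

```python
from bs4 import BeautifulSoup

# Breadcrumb markup condensed from the listing HTML shown in the post.
SAMPLE_HTML = """
<div class="breadcrumb">
  <ol class="container breadcrumb-list">
    <li class="breadcrumb-listitem"><a href="/koop/" title="Home">Home</a></li>
    <li class="breadcrumb-listitem"><a href="/koop/rotterdam/" title="Rotterdam">Rotterdam</a></li>
    <li class="breadcrumb-listitem"><a href="/koop/rotterdam/lombardijen/" title="Lombardijen">Lombardijen</a></li>
    <li class="breadcrumb-listitem"><span title="Scottstraat 3">Scottstraat 3</span></li>
  </ol>
</div>
"""


def extract_neighbourhood(html):
    """Return the title of the last breadcrumb link, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    # Select only <a> tags inside breadcrumb items. The final item holds a
    # <span> (the street address), so it never shows up in this list.
    links = soup.select('li.breadcrumb-listitem a')
    if not links:
        return None
    # Assumption: the neighbourhood is the last link in the breadcrumb.
    return links[-1].get('title')


print(extract_neighbourhood(SAMPLE_HTML))
```

Inside `get_single_item_data`, the same helper could be called on `plain_text` in place of the loop, so that each listing yields a single string rather than four printed tags.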