Python Forum
Beautiful soup and tags
#11
(Jul-08-2019, 12:16 PM)starter_student Wrote: and now there is no error but the output file is empty just with headers
That's because your parsing (or something else) is wrong.
Test in small steps: put in print() calls and try things out in the REPL.
store_details = {} should be outside of the loop, as in the sketch below.
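For example, a minimal sketch of why the placement matters (the list here is just a stand-in for table.find_all('li')):

rows = ['Coffee', 'Tea', 'Milk']  # stand-in for table.find_all('li')

store_details = {}  # created once, before the loop
for row in rows:
    store_details[row] = f'<{row}> parsed for site'
print(store_details)  # all three entries are kept

# If store_details = {} sat inside the loop, each pass would reset the
# dict and only the last entry would reach the csv writer.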

The HTML code you posted is just a mess.
Here's how you can test HTML code outside of a website:
from bs4 import BeautifulSoup
import csv
import requests

html = '''\
<div id="storelist">
  <ul>
    <li>Coffee</li>
    <li>Tea</li>
    <li>Milk</li>
  </ul>
</div>'''

#URL = "http://www.abc.com"
#r = requests.get(URL)
soup = BeautifulSoup(html, 'lxml')
table = soup.find('div', id="storelist")
print(table) # Test print
store_details = {}
for row in table.find_all('li'):
    store_details[row.text] = f'<{row.text}> parsed for site'

filename = 'store_details_tab.csv'
with open(filename, 'w', newline='') as f:  # newline='' avoids blank rows on Windows
    w = csv.DictWriter(f, ['Coffee', 'Tea', 'Milk'])
    w.writeheader()
    w.writerow(store_details)
In the csv file:
Output:
Coffee,Tea,Milk
<Coffee> parsed for site,<Tea> parsed for site,<Milk> parsed for site
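csv.DictWriter maps each dict key to the matching name in the fieldnames list, so the keys ('Coffee', 'Tea', 'Milk') line up with the header row and the values fill the single data row.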
#12
(Jul-08-2019, 02:15 PM)snippsat Wrote: ...

Thanks for this approach ... it helped me to understand some things. The HTML code I posted was just a sample ... here is the actual structure, with a nested div:

[html]
<div id="storelist" class>
  <ul>
    <li id="00021455" class>
      <div class="wr-store-details">
        <p> name </p>
        <span class="address dc">Street 2</span>
        <span class="city">LA</span>
      </div>
    </li>
    <li>
    </li>
    ...
  </ul>
</div>
[/html]
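A minimal sketch of how the same approach could handle this nested structure; it assumes the class names shown above, skips the empty <li> entries, and the CSV column names (id, name, address, city) are my own illustrative choice:

from bs4 import BeautifulSoup
import csv

# Stand-in for the real page, following the structure posted above.
html = '''\
<div id="storelist">
  <ul>
    <li id="00021455">
      <div class="wr-store-details">
        <p>name</p>
        <span class="address dc">Street 2</span>
        <span class="city">LA</span>
      </div>
    </li>
    <li>
    </li>
  </ul>
</div>'''

soup = BeautifulSoup(html, 'lxml')
stores = []
for li in soup.find('div', id='storelist').find_all('li'):
    details = li.find('div', class_='wr-store-details')
    if details is None:  # the empty <li></li> entries have no details div
        continue
    stores.append({
        'id': li.get('id'),
        'name': details.p.get_text(strip=True),
        # class_='address' also matches the multi-valued class "address dc"
        'address': details.find('span', class_='address').get_text(strip=True),
        'city': details.find('span', class_='city').get_text(strip=True),
    })

with open('store_details_tab.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, ['id', 'name', 'address', 'city'])
    w.writeheader()
    w.writerows(stores)

For the real page you would fetch it with requests.get(URL) and pass r.text to BeautifulSoup instead of the html string.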

