Python Forum
Beautiful soup and tags
#11
(Jul-08-2019, 12:16 PM)starter_student Wrote: and now there is no error, but the output file is empty, just with headers
That's because your parsing (or something else) is wrong.
Test in small steps: add print() calls and experiment in the REPL.
store_details = {} should be outside of the loop.

The HTML code you posted is just a mess.
Here is how you can test HTML code outside of a live website:
from bs4 import BeautifulSoup  # don't alias BeautifulSoup as soup; that name is reused below
import csv
import requests

html = '''\
<div id="storelist">
  <ul>
    <li>Coffee</li>
    <li>Tea</li>
    <li>Milk</li>
  </ul>
</div>'''

# When scraping a live site, fetch the page with requests instead:
#URL = "http://www.abc.com"
#r = requests.get(URL)
#html = r.text
soup = BeautifulSoup(html, 'lxml')
table = soup.find('div', id="storelist")
print(table) # Test print
store_details = {}
for row in table.find_all('li'):
    store_details[row.text] = f'<{row.text}> parsed for site'

filename = 'store_details_tab.csv'
with open(filename, 'w', newline='') as f:  # newline='' avoids blank rows in the csv on Windows
    w = csv.DictWriter(f, ['Coffee', 'Tea', 'Milk'])
    w.writeheader()
    w.writerow(store_details)
In the csv file:
Output:
Coffee,Tea,Milk
<Coffee> parsed for site,<Tea> parsed for site,<Milk> parsed for site
#12
(Jul-08-2019, 02:15 PM)snippsat Wrote: [...]

Thanks for this approach ... it helped me understand a few things. The HTML code I posted was just a sample ... here is the actual structure, with a nested div:

[html]
<div id="storelist" class>
  <ul>
    <li id="00021455" class>
      <div class="wr-store-details">
        <p> name </p>
        <span class="address dc">Street 2</span>
        <span class="city">LA</span>
      </div>
    </li>
    <li>
    </li>
    .
    .
    .
  </ul>
</div>
[/html]
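For a structure like that, one CSV row per store is more natural than the one-column-per-item layout above. A minimal sketch, assuming each store <li> holds a wr-store-details div with the name in a <p> and the address/city in the spans shown; the output filename stores.csv and the field names are illustrative:

from bs4 import BeautifulSoup
import csv

# Sample markup mirroring the structure posted above
html = '''\
<div id="storelist" class>
  <ul>
    <li id="00021455" class>
      <div class="wr-store-details">
        <p> name </p>
        <span class="address dc">Street 2</span>
        <span class="city">LA</span>
      </div>
    </li>
    <li>
    </li>
  </ul>
</div>'''

soup = BeautifulSoup(html, 'lxml')
storelist = soup.find('div', id="storelist")
rows = []
for li in storelist.find_all('li'):
    details = li.find('div', class_="wr-store-details")
    if details is None:  # skip empty <li></li> entries
        continue
    rows.append({
        'id': li.get('id'),
        'name': details.p.text.strip(),
        'address': details.select_one('span.address').text,
        'city': details.select_one('span.city').text,
    })

# One csv row per store, with one column per field
with open('stores.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, ['id', 'name', 'address', 'city'])
    w.writeheader()
    w.writerows(rows)

stores.csv then gets one line per store:
Output:
id,name,address,city
00021455,name,Street 2,LA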