Python Forum
Extracting the Address tag from multiple HTML files using BeautifulSoup
#4
I didn't look at your code in depth at first. Now I see it's a bit weird: you iterate over the bunch of files once to read the titles into a separate list, then again to read the address(es).

There is no need to use find_all for the title; a page is expected to have only one title tag, right? Just use soup.find().
Then, is it one address or multiple in each file?
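To make the find/find_all distinction concrete, here is a small sketch on a hypothetical inline page (one title, two addresses; only the class name comes from the thread). It uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one <title>, two <address> tags
html = """
<html><head><title>Shop list</title></head>
<body>
  <address class="styles_address__zrPvy">1 Main St</address>
  <address class="styles_address__zrPvy">2 High St</address>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches
title_tag = soup.find("title")
title = title_tag.string if title_tag else ""

# find_all() always returns a list (possibly empty) with every match
addresses = [a.get_text(strip=True)
             for a in soup.find_all("address", class_="styles_address__zrPvy")]

# find() on <address> would give you only the first one
first = soup.find("address", class_="styles_address__zrPvy")

print(title)      # Shop list
print(addresses)  # ['1 Main St', '2 High St']
```

So if every file has exactly one address, find() is enough; if there can be several, keep find_all() and loop over the result.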
You write to the file only after you have exited the second loop. But because the list comprehension rebuilds the list on every iteration, you end up with only the data from the last file, not from all files (as you would if you appended to a single list). This is something I overlooked.
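A minimal sketch of that pitfall, with a hypothetical dict standing in for the addresses each HTML file would yield (no real parsing involved):

```python
# Hypothetical per-file results standing in for parsed HTML files
files = {
    "a.html": ["1 Main St", "2 High St"],
    "b.html": ["3 Low Rd"],
}

# Pitfall: the list is rebuilt on every iteration,
# so after the loop it only holds the last file's addresses
for name, found in files.items():
    last_only = [addr for addr in found]

# Fix: create the list once, before the loop, and extend it inside
all_addresses = []
for name, found in files.items():
    all_addresses.extend(found)

print(last_only)      # ['3 Low Rd']
print(all_addresses)  # ['1 Main St', '2 High St', '3 Low Rd']
```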
Finally, you write 2 lists, but I don't think that will give you what you expect anyway.
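Assuming "write 2 lists" means one writerow() call per list, this sketch (hypothetical data, written to an in-memory buffer instead of a file) shows why: each list becomes one row, instead of one row per title/address pair. Pairing with zip() only works if the two lists line up one-to-one, which is why writing a row per address inside the loop is simpler.

```python
import csv
import io

# Hypothetical data: one title and one address per file
titles = ["Shop A", "Shop B"]
addresses = ["1 Main St", "3 Low Rd"]

# Each list written as a single row: titles in row 1, addresses in row 2
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(titles)
writer.writerow(addresses)
two_lists = buf.getvalue()

# Paired with zip(): one (title, address) row per record
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(zip(titles, addresses))
paired = buf.getvalue()

print(two_lists)  # Shop A,Shop B / 1 Main St,3 Low Rd
print(paired)     # Shop A,1 Main St / Shop B,3 Low Rd
```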
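import csv
import glob
import os

from bs4 import BeautifulSoup

path = "C:\\Users\\mzoljan\\Downloads\\lksd\\"

# open the CSV once; 'a' appends, newline='' avoids blank rows on Windows
with open('output2.csv', 'a', newline='') as myfile:
    writer = csv.writer(myfile)
    for infile in glob.glob(os.path.join(path, "*.html")):
        with open(infile, "r", encoding="utf-8") as f:  # assumes UTF-8 HTML files
            soup = BeautifulSoup(f.read(), 'lxml')
        title_tag = soup.find("title")
        title = title_tag.string if title_tag else ''  # just in case there is no title tag
        address = soup.find_all("address", class_="styles_address__zrPvy")  # do you really need find_all?
        for item in address:
            # one row per address: this file's title plus the address text
            writer.writerow([title, item.get_text(strip=True)])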
Note: the code is not tested, as I don't have your HTML files.
Messages In This Thread
RE: Extracting the Address tag from multiple HTML files using BeautifulSoup - by buran - Jan-24-2021, 10:16 AM
