(Jan-24-2021, 10:16 AM)buran Wrote: I didn't look at your code in depth. Now I see it's a bit weird. You iterate over a bunch of files twice: first to read the titles into a separate list, then again to read the address(es).
There is no need to use find_all for title - it is expected that there is only one title tag, right? Just use soup.find().
Then, is it one address or multiple in each file?
You write to a file after you have exited the second loop, but using a list comprehension will give you only the data from the last file, not all files (unlike when you append to a single list) - this is something I overlooked.
Finally, you write 2 lists, but I don't think it will give you what you expect anyway.
import csv
import glob
import os

from bs4 import BeautifulSoup

path = "C:\\Users\\mzoljan\\Downloads\\lksd\\"
for infile in glob.glob(os.path.join(path, "*.html")):
    with open(infile, "r") as f, open('output2.csv', 'a') as myfile:
        writer = csv.writer(myfile)
        soup = BeautifulSoup(f.read(), 'lxml')
        title = soup.find("title")
        if title:
            title = soup.title.string
        else:
            title = ''  # just in case there is no title tag
        address = soup.find_all("address", class_="styles_address__zrPvy")  # do you really need find_all?
        for item in address:
            writer.writerow([title, item.string])
Note, the code is not tested as I don't have your html files.
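To illustrate the find vs find_all point, here is a minimal sketch of extracting the title and one address from a single HTML string. The sample HTML and address text are invented for illustration; only the class name (styles_address__zrPvy) is taken from the thread, and the stdlib html.parser is used so no lxml install is needed:

```python
from bs4 import BeautifulSoup

# Invented sample page; the real files would be read from disk instead.
html = """
<html>
  <head><title>ToBeMe Early Learning</title></head>
  <body>
    <address class="styles_address__zrPvy">1 Main St, Five Dock NSW</address>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# There is only ever one <title>, so find() (or soup.title) is enough.
title = soup.title.string if soup.title else ""

# If each file has exactly one address, find() is also enough here;
# get_text(strip=True) handles nested tags better than .string.
address_tag = soup.find("address", class_="styles_address__zrPvy")
address = address_tag.get_text(strip=True) if address_tag else ""

print(title, "|", address)
```

Pairing the title and address inside the same loop iteration, as above, is what keeps each CSV row referring to one file.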
Hi Buran, thanks for your help here. It is still not working as expected. To simplify: I have 4x locally downloaded HTML files from which I am trying to return the title and address in string format. **CORRECTION** The method is returning 6x of the titles and only 1x address. This is an example of one of the HTML files, of which I have multiple: https://toddle.com.au/centres/tobeme-ear...-five-dock. I am basically just trying to scrape the name (title) and address of the school from locally downloaded HTML files.
Again, your assistance is greatly appreciated.