(Jan-24-2021, 10:16 AM)buran Wrote: I didn't look at your code in depth. Now I see it's a bit weird. You iterate over a bunch of files twice: first to read the titles into a separate list, then again to read the address(es).
There is no need to use find_all for title - it is expected that there is only one title tag, right? Just use soup.find().
Then, is it one address or multiple in each file?
You write to a file after you have exited the second loop, but using a list comprehension will give you only the data from the last file, not all files (unlike when you append to a single list) - this is something I overlooked.
Finally, you write 2 lists, but I don't think it will give you what you expect anyway.
import csv
import glob
import os

from bs4 import BeautifulSoup

path = "C:\\Users\\mzoljan\\Downloads\\lksd\\"
for infile in glob.glob(os.path.join(path, "*.html")):
    with open(infile, "r") as f, open('output2.csv', 'a') as myfile:
        writer = csv.writer(myfile)
        soup = BeautifulSoup(f.read(), 'lxml')
        title = soup.find("title")
        if title:
            title = soup.title.string
        else:
            title = ''  # just in case there is no title tag
        address = soup.find_all("address", class_="styles_address__zrPvy")  # do you really need find_all?
        for item in address:
            writer.writerow([title, item.string])
Note, the code is not tested as I don't have your html files.
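To illustrate the find vs find_all point, here is a minimal sketch of extracting the title and one address from a single HTML string. The sample HTML and address text are invented for illustration; only the class name (styles_address__zrPvy) is taken from the thread, and the stdlib html.parser is used so no lxml install is needed:

```python
from bs4 import BeautifulSoup

# Invented sample page; the real files would be read from disk instead.
html = """
<html>
  <head><title>ToBeMe Early Learning</title></head>
  <body>
    <address class="styles_address__zrPvy">1 Main St, Five Dock NSW</address>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# There is only ever one <title>, so find() (or soup.title) is enough.
title = soup.title.string if soup.title else ""

# If each file has exactly one address, find() is also enough here;
# get_text(strip=True) handles nested tags better than .string.
address_tag = soup.find("address", class_="styles_address__zrPvy")
address = address_tag.get_text(strip=True) if address_tag else ""

print(title, "|", address)
```

Pairing the title and address inside the same loop iteration, as above, is what keeps each CSV row referring to one file.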
Hi Buran, thanks for your help here. It is still not working as expected. To simplify: I have 4x locally downloaded HTML files from which I am trying to return the title and address in string format. **CORRECTION** The method is returning 6x of the titles and only 1x address. This is an example of one of the HTML files, of which I have multiple: https://toddle.com.au/centres/tobeme-ear...-five-dock. I am basically just trying to scrape the name (title) and address of the school from locally downloaded HTML files.
Again, your assistance is greatly appreciated.