Beautiful Soup - Title + Paragraph into a text file
Python Forum (https://python-forum.io), Web Scraping & Web Development, thread /thread-11539.html

Beautiful Soup - Title + Paragraph into a text file - dj99 - Jul-14-2018

Hi all,

I am trying to extract a heading and a paragraph, but there is something not quite right about this:

    from bs4 import BeautifulSoup

    html = '''\
    <h2 class="Title">section1</h2>
    <p class ="mainparagraph">article1</p>
    <p>article2</p>
    <p>article3</p>
    <h2>section2</h2>
    <span class="1"> hello 1 </span>
    <p>article4</p>
    <p>article5</p>
    <h2 class="2"> hello </h2>
    <p>article6</p>
    <span class="2"> hello 2 </span>
    <h1> Lorem Ipsum</h1>
    <p> 1 Lorem ipsum dolor </p>
    <h2> Lorem Ipsum</h1>
    <p> 2 Lorem ipsum dolor </p>
    <h1> Lorem Ipsum</h1>
    <p> 3 Lorem ipsum dolor </p>",'lxml')
    '''
    soup = BeautifulSoup(html, 'lxml')
    #soup = BeautifulSoup(open("a.html"),'lxml')
    links = soup.findAll('h2', {'class': ['Title']},limit=1)
    with open('New.txt','w') as Output_File:
        for link in links:
            names1 = link.contents[0]
            links = soup.find('p', {'class': ['mainparagraph']})
            names2 = link.contents[0]
            names2.extract()
            Output_File.write(print,names1.extract()+ '\n', names2.extract())

I am not sure if I am meant to append the results?

Thank you

RE: Beautiful Soup - Title + Paragraph into a text file - snippsat - Jul-14-2018

First, how to find and select.

    >>> h2 = soup.find('h2', class_="2")
    >>> h2
    <h2 class="2"> hello </h2>
    >>> h2.text
    ' hello '
    >>> p = soup.find('p', class_="mainparagraph")
    >>> p
    <p class="mainparagraph">article1</p>
    >>> p.text
    'article1'

    >>> # Using CSS selector
    >>> h2 = soup.select('.2')
    >>> h2
    [<h2 class="2"> hello </h2>, <span class="2"> hello 2 </span>]
    >>> h2_2 = soup.select('h2.2')
    >>> h2_2
    [<h2 class="2"> hello </h2>]
    >>> h2_2[0].text
    ' hello '
    >>> p = soup.select('p.mainparagraph')
    >>> p
    [<p class="mainparagraph">article1</p>]
    >>> p[0].text
    'article1'

You are doing some strange stuff in write, like using print.

    with open('h2_headings.txt', 'w') as f_out:
        for tag in soup.select('h2'):
            f_out.write('{}\n'.format(tag.text.strip()))

For names like Output_File and findAll, look at PEP-8. These would be output_file and find_all (bs4 keeps the old findAll name for backward compatibility).
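
To make the point about write concrete: a file object's write() takes exactly one string, so the heading and the paragraph have to be joined into a single string first, or handed to print() through its file= argument. The following is only a minimal sketch; the variable names are hypothetical, and io.StringIO stands in for the real text file so nothing is written to disk.

```python
import io

# Hypothetical values standing in for the extracted heading and paragraph text.
heading = "section1"
paragraph = "article1"

# io.StringIO behaves like a file opened with open(..., "w").
output_file = io.StringIO()

# write() accepts a single string, so build the whole text first.
output_file.write("{}\n{}\n".format(heading, paragraph))

# Alternatively, print() can target a file through its file= argument.
print(heading, paragraph, sep="\n", file=output_file)

result = output_file.getvalue()
```

Both calls add the same two lines, so result ends up holding the heading and paragraph twice.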

RE: Beautiful Soup - Title + Paragraph into a text file - dj99 - Jul-14-2018

Hello S, thank you for those pointers.

I am trying to put these 2 lines into a text file:

    <h2 class="Title">section1</h2>
    <p class ="mainparagraph">article1</p>

I managed to get only the h2 extracted. Next I wanted to get the paragraph with class mainparagraph, but I don't know how to append the paragraph to my text file. I could create 2 loops, opening the file once and then again, but that seemed a bit redundant. I'm not sure how to put the final result together. Let me see..

RE: Beautiful Soup - Title + Paragraph into a text file - snippsat - Jul-14-2018

(Jul-14-2018, 12:42 PM)dj99 Wrote: I am trying to put these 2 lines into a text file

    >>> tags = soup.select('.Title, .mainparagraph')
    >>> tags
    [<h2 class="Title">section1</h2>, <p class="mainparagraph">article1</p>]
    >>> for tag in tags:
    ...     print(tag.text)
    ...
    section1
    article1

Then look at how I wrote to a text file in the post before.

RE: Beautiful Soup - Title + Paragraph into a text file - dj99 - Jul-14-2018

Thank you S, I think I overcomplicated it with my initial looping. This is a lot simpler.

Have a great weekend! :)
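
For readers landing on this thread, the two hints (one select call for both classes, then one write per tag) can be combined roughly as follows. This is a sketch rather than code posted in the thread: the helper name write_title_and_paragraph is made up for illustration, the HTML is a shortened version of the sample from the first post, and the parser is swapped to 'html.parser' (bundled with Python) where the thread used 'lxml'.

```python
import io

from bs4 import BeautifulSoup

# Shortened sample markup, based on the HTML in the first post.
html = '''\
<h2 class="Title">section1</h2>
<p class="mainparagraph">article1</p>
<p>article2</p>
<h2>section2</h2>
'''


def write_title_and_paragraph(markup, out):
    """Write the text of the .Title and .mainparagraph tags to out, one per line."""
    # Assumption: 'html.parser' is used here so no extra parser is needed;
    # the thread itself used 'lxml'.
    soup = BeautifulSoup(markup, 'html.parser')
    # One selector list matches both tags in a single pass.
    for tag in soup.select('.Title, .mainparagraph'):
        out.write('{}\n'.format(tag.text.strip()))


# io.StringIO stands in for the text file so the sketch leaves no file behind.
buffer = io.StringIO()
write_title_and_paragraph(html, buffer)
text = buffer.getvalue()
```

To write a real file, the same helper could be called inside `with open('New.txt', 'w') as output_file:` and given output_file in place of the buffer, so the file is opened only once and no second loop is needed.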