Beautiful Soup - Title + Paragraph into a text file

dj99 · Jul-14-2018, 10:52 AM

Hi all,

I am trying to extract a heading and a paragraph, but something is not quite right with this code:

from bs4 import BeautifulSoup
 
html = '''\
<h2 class="Title">section1</h2>
<p class ="mainparagraph">article1</p>
<p>article2</p>
<p>article3</p>
<h2>section2</h2>
<span class="1"> hello 1 </span>
<p>article4</p>
<p>article5</p>
<h2 class="2"> hello </h2>
<p>article6</p>
<span class="2"> hello 2 </span>
<h1> Lorem Ipsum</h1>
<p> 1 Lorem ipsum dolor </p>
<h2> Lorem Ipsum</h1>
<p> 2 Lorem ipsum dolor </p>
<h1> Lorem Ipsum</h1>
<p> 3 Lorem ipsum dolor </p>",'lxml')
'''

soup = BeautifulSoup(html, 'lxml') 

#soup = BeautifulSoup(open("a.html"),'lxml')


links = soup.findAll('h2', {'class': ['Title']},limit=1)       


with open('New.txt','w') as Output_File:
    for link in links:
        names1 = link.contents[0]
        links = soup.find('p', {'class': ['mainparagraph']})
        names2 = link.contents[0]
        names2.extract()

        
        Output_File.write(print,names1.extract()+ '\n', names2.extract())

I am not sure if I am meant to append the results?

thank you

***snippsat*** · (This post was last modified: Jul-14-2018, 11:52 AM by snippsat.)

First, how to find and select.

>>> h2 = soup.find('h2', class_="2")
>>> h2
<h2 class="2"> hello </h2>
>>> h2.text
' hello '

>>> p = soup.find('p', class_="mainparagraph")
>>> p
<p class="mainparagraph">article1</p>
>>> p.text
'article1'

>>> # Using Css selector
>>> h2 = soup.select('.2')
>>> h2
[<h2 class="2"> hello </h2>, <span class="2"> hello 2 </span>]

>>> h2_2 = soup.select('h2.2')
>>> h2_2
[<h2 class="2"> hello </h2>]
>>> h2_2[0].text
' hello '

>>> p = soup.select('p.mainparagraph')
>>> p
[<p class="mainparagraph">article1</p>]
>>> p[0].text
'article1'
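
A note on the `.2` selectors above: class names that start with a digit are not valid bare CSS identifiers, and newer Beautiful Soup releases that hand `select()` to the soupsieve package may reject `.2` with a selector syntax error. The sketch below shows two alternatives that should avoid the problem. It uses a trimmed copy of the thread's HTML and the built-in `html.parser` instead of `lxml`; both are assumptions made to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the first post
html = '''\
<h2 class="2"> hello </h2>
<p>article6</p>
<span class="2"> hello 2 </span>
'''

# html.parser ships with Python; the thread itself uses 'lxml'
soup = BeautifulSoup(html, 'html.parser')

# Attribute selector: matches "2" as one whitespace-separated class token
h2_tags = soup.select('h2[class~="2"]')
h2_text = h2_tags[0].text.strip()

# find_all with class_ involves no CSS parsing at all
any_tags = soup.find_all(class_='2')
names = [tag.name for tag in any_tags]

print(h2_text)
print(names)
```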

You are doing some strange stuff in write(), like passing print to it.

with open('h2_headings.txt', 'w') as f_out:
    for tag in soup.select('h2'):
        f_out.write('{}\n'.format(tag.text.strip()))

Output:
section1
section2
hello
Lorem Ipsum
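
On the write() point: a file object's write() method accepts exactly one string, so passing print or several values to it raises a TypeError. Here is a minimal sketch of two ways to emit a heading and a paragraph together. The two values and the in-memory `io.StringIO` buffer are stand-ins (assumptions) for real tag text and a real file opened with `open(..., 'w')`.

```python
import io

# Hypothetical values standing in for tag.text from the soup
heading = 'section1'
paragraph = 'article1'

# io.StringIO behaves like a text file opened for writing
buffer = io.StringIO()

# Option 1: build one string first, then pass it to write()
buffer.write('{}\n{}\n'.format(heading, paragraph))

# Option 2: let print() do the joining and send it to the file-like object
print(heading, paragraph, sep='\n', file=buffer)

result = buffer.getvalue()
print(result, end='')
```

Both options write the same two lines, so which one to use is mostly a matter of taste.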

For names like Output_File and findAll, look at PEP-8.
These would be output_file and find_all (bs4 keeps the old findAll spelling for backward compatibility).

dj99 · Jul-14-2018, 12:42 PM

Hello S,

thank you for those pointers.

I am trying to put these 2 lines into a text file

<h2 class="Title">section1</h2>
<p class ="mainparagraph">article1</p>

I only managed to get the h2 extracted.
Next I wanted to get the paragraph with class mainparagraph.
I don't know how to append the paragraph to my text file.

I could create 2 loops - opening the file once, then again - but that seemed a bit redundant.
I'm not sure how to put the final result together.
Let me see...

***snippsat*** · (This post was last modified: Jul-14-2018, 01:04 PM by snippsat.)

(Jul-14-2018, 12:42 PM)dj99 Wrote: I am trying to put these 2 lines into a text file

<h2 class="Title">section1</h2>
<p class ="mainparagraph">article1</p>

>>> tags = soup.select('.Title, .mainparagraph')
>>> tags
[<h2 class="Title">section1</h2>, <p class="mainparagraph">article1</p>]

>>> for tag in tags:
...     print(tag.text)
...     
section1
article1

Then look at how I wrote to a text file in my previous post.
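
Putting the two pieces together might look roughly like the sketch below: the grouped selector picks both tags in document order, and the same write loop as in the earlier post puts one line per tag into the file. The trimmed HTML, the `html.parser` choice, the temporary directory, and the file name `title_and_paragraph.txt` are all assumptions made so the example runs on its own.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the first post
html = '''\
<h2 class="Title">section1</h2>
<p class="mainparagraph">article1</p>
<p>article2</p>
'''

# html.parser ships with Python; the thread itself uses 'lxml'
soup = BeautifulSoup(html, 'html.parser')

# Hypothetical output path inside a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'title_and_paragraph.txt')

# One pass over both tags, one line per tag
with open(out_path, 'w') as f_out:
    for tag in soup.select('.Title, .mainparagraph'):
        f_out.write('{}\n'.format(tag.text.strip()))

# Read the file back to check what was written
with open(out_path) as f_in:
    written = f_in.read()

print(written, end='')
```

A single loop over the grouped selector avoids opening the file twice or running two separate loops.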

dj99 · Jul-14-2018, 01:37 PM

Thank you S,

I think I overcomplicated it with my initial looping.

This is a lot simpler.

Have a great weekend!

:)
