Need help with BeautifulSoup

mor2k · (This post was last modified: Apr-20-2017, 05:34 PM by snippsat.)

Hey!
My code runs fine, but the output is still not what I want.
Here's my code:

        
from bs4 import BeautifulSoup
from urllib import urlopen
import os

os.chdir('C:/Users/yuvi/Desktop')

html = urlopen('the url is in here.  i will post it in a comment because i cant post clickable links on my first post')
soup = BeautifulSoup(html)

with open('LOA.txt', 'w') as f:
    for section in soup.findAll('a', {'class':'s xst'}):
        f.write('{}'.format(section) + '\n')

This code should take all the post titles from the first page of the LOA forum and write them to a text file.
But this is what ends up in the file:

Output:
<a class="s xst" href="thread/314240-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[NEW]Preview of New Version on April 20th: New Amulet Is Added!</a>
<a class="s xst" href="thread/314233-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!</a>

How can I strip out all the HTML and keep only the title of each post?
Thanks!

html = urlopen('http://community.gtarcade.com/forum/2036-1.html')

from bs4 import BeautifulSoup
from urllib import urlopen
import os
 
os.chdir('C:/Users/yuvi/Desktop')
 
html = urlopen('http://community.gtarcade.com/forum/2036-1.html')
soup = BeautifulSoup(html)
 
with open('LOA.txt', 'w') as f:
   for section in soup.findAll('a', {'class':'s xst'}):
       f.write('{}'.format(section) + '\n')

***snippsat*** · (This post was last modified: Apr-20-2017, 05:52 PM by snippsat.)

(Apr-20-2017, 05:27 PM)mor2k Wrote: how i can take off all the 'html code' and stay only with the title of the post?

        
from bs4 import BeautifulSoup

html = '''\
<a class="s xst" href="thread/314240-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[NEW]Preview of New Version on April 20th: New Amulet Is Added!</a>
<a class="s xst" href="thread/314233-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!</a>'''

soup = BeautifulSoup(html, 'html.parser')
post = soup.find_all('a')
for title in post:
    print(title.text.strip())

Output:
Event[NEW]Preview of New Version on April 20th: New Amulet Is Added!
Event[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!
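
The "Event" label is still glued to each title because it sits in an <em> tag inside the link. If only the title itself is wanted, one option is to drop that tag before reading the text. A minimal sketch, assuming every label is an <em> child of the link as in the sample above; extract_titles is a hypothetical helper name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

html = '''\
<a class="s xst" href="thread/314240-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[NEW]Preview of New Version on April 20th: New Amulet Is Added!</a>
<a class="s xst" href="thread/314233-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!</a>'''


def extract_titles(markup):
    """Return the link titles without the <em> label (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    titles = []
    # CSS selector: <a> tags that carry both the "s" and "xst" classes
    for link in soup.select('a.s.xst'):
        # Remove the nested <em> label so its text is not included
        for label in link.find_all('em'):
            label.decompose()
        titles.append(link.get_text(strip=True))
    return titles


for title in extract_titles(html):
    print(title)
```

With the sample markup this prints the two titles starting at [NEW] and [HOT], without the leading "Event".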

Use Requests rather than urllib when reading a site.
E.g.:

import requests
from bs4 import BeautifulSoup
 
url = 'http://community.gtarcade.com/forum/2036-1.html'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
print(soup.find('title').text)

Output:
News and Announcements-League of Angels Forum
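
Putting the pieces together, the original script could be written roughly like this with Requests. This is only a sketch under a few assumptions: the page still marks thread links with class="s xst", the timeout value is arbitrary, and fetch_html, titles_from, save_titles and main are made-up helper names for illustration:

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download the page and return its raw bytes."""
    response = requests.get(url, timeout=10)  # timeout chosen arbitrarily
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.content


def titles_from(markup):
    """Return the text of every thread link, with all tags stripped."""
    soup = BeautifulSoup(markup, 'html.parser')
    links = soup.find_all('a', {'class': 's xst'})
    return [link.get_text(strip=True) for link in links]


def save_titles(titles, path):
    """Write one title per line to a UTF-8 text file."""
    with open(path, 'w', encoding='utf-8') as f:
        for title in titles:
            f.write(title + '\n')


def main():
    url = 'http://community.gtarcade.com/forum/2036-1.html'
    save_titles(titles_from(fetch_html(url)), 'LOA.txt')
```

Calling main() fetches the page and writes LOA.txt to the current working directory. Keeping the download separate from the parsing means titles_from can be tried on a saved HTML string without hitting the site.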
