hey!
My code runs without errors, but the output isn't what I want.
The code should take all the posts from the first page of the LOA forum and write them to a text file.
Here's my code:
    from bs4 import BeautifulSoup
    from urllib import urlopen
    import os

    os.chdir('C:/Users/yuvi/Desktop')

    html = urlopen('the url is in here. i will post it in a comment because i cant post clickable links on my first post')
    soup = BeautifulSoup(html)

    with open('LOA.txt', 'w') as f:
        for section in soup.findAll('a', {'class': 's xst'}):
            f.write('{}'.format(section) + '\n')
But when it writes the file, this is what I get:
Output:
    <a class="s xst" href="thread/314240-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
    <em class="youzu_none">Event</em>[NEW]Preview of New Version on April 20th: New Amulet Is Added!</a>
    <a class="s xst" href="thread/314233-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
    <em class="youzu_none">Event</em>[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!</a>
How can I strip out all the 'html code' and keep only the title of each post? Thanks!
And here is the line with the URL that I couldn't include in the post:
    html = urlopen('http://community.gtarcade.com/forum/2036-1.html')
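One possible way to get just the titles, as a rough sketch rather than a definitive answer: BeautifulSoup tags have a `get_text()` method that returns only the text inside a tag, without the markup. The sample below parses the two links from the output above instead of fetching the page, so it runs offline. The helper name `extract_titles` and its `keep_label` flag are made up for this illustration, and the sketch assumes Python 3 with the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in the post. In the real script this
# would be the page returned by urlopen().
html = '''<a class="s xst" href="thread/314240-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[NEW]Preview of New Version on April 20th: New Amulet Is Added!</a>
<a class="s xst" href="thread/314233-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!</a>'''


def extract_titles(markup, keep_label=True):
    """Return the text of every <a class="s xst"> link (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    titles = []
    for section in soup.find_all('a', {'class': 's xst'}):
        if not keep_label:
            # Drop the <em> category label (e.g. "Event") before reading text.
            for em in section.find_all('em'):
                em.decompose()
        # get_text() returns only the text nodes; strip=True trims whitespace
        # and ' ' is used to join the separate pieces.
        titles.append(section.get_text(' ', strip=True))
    return titles


for title in extract_titles(html, keep_label=False):
    print(title)
```

In the original loop, this would amount to writing `section.get_text(' ', strip=True)` to the file instead of `'{}'.format(section)`. Note that on Python 3 `urlopen` is imported from `urllib.request` rather than `urllib`.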