Hey!
My code works, but the output still isn't quite what I want.
Here's my code:
```python
from bs4 import BeautifulSoup
from urllib import urlopen
import os

os.chdir('C:/Users/yuvi/Desktop')
html = urlopen('http://community.gtarcade.com/forum/2036-1.html')
soup = BeautifulSoup(html)

with open('LOA.txt', 'w') as f:
    for section in soup.findAll('a', {'class': 's xst'}):
        f.write('{}'.format(section) + '\n')
```

This code should take all the posts from the first page of the LOA forum and write them to a text file.
But when it writes them out, this is what I get:
```html
<a class="s xst" href="thread/314240-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[NEW]Preview of New Version on April 20th: New Amulet Is Added!</a>
<a class="s xst" href="thread/314233-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!</a>
```
How can I strip out all the HTML markup and keep only the title of each post? Thanks!
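A minimal sketch of one common approach, assuming BeautifulSoup 4 with Python's built-in `html.parser`: instead of writing the whole tag, take its visible text with `get_text()`. It parses the sample markup from the output above so it runs offline. The names `extract_titles`, `SAMPLE_HTML`, and the `drop_labels` option are hypothetical, chosen for illustration; whether the `Event` label in the `<em class="youzu_none">` tag counts as part of the "title" is also an assumption, so removing it is optional here.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: markup copied from the output above so the sketch runs offline.
SAMPLE_HTML = '''
<a class="s xst" href="thread/314240-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[NEW]Preview of New Version on April 20th: New Amulet Is Added!</a>
<a class="s xst" href="thread/314233-1-1.html" onclick="atarget(this)" style="font-weight: bold;color: #EE1B2E;">
<em class="youzu_none">Event</em>[HOT]Cross-server Resource Tycoon: New Hero Celestial Blade Shows Up!</a>
'''


def extract_titles(html, drop_labels=True):
    """Return only the visible text of every <a class="s xst"> link (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    # A class string containing a space matches the exact class attribute value,
    # just like the {'class': 's xst'} filter in the original code.
    for link in soup.find_all('a', class_='s xst'):
        if drop_labels:
            # Assumption: the <em class="youzu_none"> tag is a category label, not part of the title.
            for label in link.find_all('em', class_='youzu_none'):
                label.decompose()
        # get_text(strip=True) drops all tags and trims whitespace around each text piece.
        titles.append(link.get_text(strip=True))
    return titles


if __name__ == '__main__':
    for title in extract_titles(SAMPLE_HTML):
        print(title)
```

In the original loop, the equivalent minimal change is to write `section.get_text(strip=True)` instead of `'{}'.format(section)`; with the label kept, each line would then start with `Event` followed by the title.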