Python Forum
How to get a new line
#1
After scraping the website, everything works as expected. Here is the code:
#!/usr/bin/python

import urllib2
from bs4 import BeautifulSoup

website = ("http://www.ahajokes.com/ym01.html")
page = urllib2.urlopen(website)

soup = BeautifulSoup(page,'html.parser')
yomama = soup.find(id="Joke_box")
name = yomama.text.strip()
print (name)
Output
Yo mama so fat God told her he had no room in heaven and the devil said there was no room in hell (Submitted by )Yo Mama so fat her BMI is measured in acres. (Submitted by )Yo Mama so fat when she went to the movies she sat next to everyone (Submitted by  )Yo mama so fat when her beeper goes off, people thought she was backing upYo mama so fat her nickname is "Lardo"
What I want is this: wherever the text "(Submitted by )" appears, it should be removed and replaced with a new line. How can I achieve this?
#2
You should not use Python 2 anymore; the urllib2 import shows that you are.
The HTML is a little messy, but this gets close. I always use Requests for this kind of task.
import requests
from bs4 import BeautifulSoup

website = "http://www.ahajokes.com/ym01.html"
page = requests.get(website)
soup = BeautifulSoup(page.content, 'html.parser')
# The jokes are inside the element with id="Joke_box".
yomama = soup.find(id="Joke_box")
temp = yomama.text.strip()
# The marker appears with one or two spaces before ")", so replace both variants.
temp = temp.replace('(Submitted by )', '\n')
name = temp.replace('(Submitted by  )', '\n')
print(name)
Output:
Yo mama so fat God told her he had no room in heaven and the devil said there was no room in hell
Yo Mama so fat her BMI is measured in acres.
Yo Mama so fat when she went to the movies she sat next to everyone
Yo mama so fat when her beeper goes off, people thought she was backing upYo mama so fat her nickname is "Lardo"
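If more spacing variants of the marker turn up, the two replace calls could be folded into a single regex. This is only a minimal sketch using the standard library: `sample` is a shortened copy of the output posted earlier in the thread, and `split_jokes` is a made-up helper name.

```python
import re

# Shortened sample of the scraped text from earlier in the thread.
sample = (
    "Yo Mama so fat her BMI is measured in acres. (Submitted by )"
    "Yo Mama so fat when she went to the movies she sat next to everyone (Submitted by  )"
    "Yo mama so fat her nickname is \"Lardo\""
)

def split_jokes(text):
    # \s* matches zero or more whitespace characters, so one pattern covers
    # both "(Submitted by )" and "(Submitted by  )". Whitespace around the
    # marker is consumed as well, so no stray spaces are left on the lines.
    return re.sub(r'\s*\(Submitted by\s*\)\s*', '\n', text).strip()

print(split_jokes(sample))
```

In the script above, that would be `name = split_jokes(yomama.text)` in place of the two replace lines.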
Fixing the long last line as well is more difficult. It may require running a regex on the raw HTML that yomama holds, rather than using .text.
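A rough sketch of that idea follows. The page's real markup is not shown in this thread, so the fragment below is hypothetical: it assumes the jokes without a "(Submitted by )" marker are separated by `<br>` tags. `raw_html` and `html_to_lines` are made-up names for illustration.

```python
import re

# Hypothetical fragment; the real page markup may differ.
raw_html = (
    '<div id="Joke_box">'
    'Yo mama so fat when her beeper goes off, people thought she was backing up'
    '<br><br>'
    'Yo mama so fat her nickname is "Lardo"'
    '</div>'
)

def html_to_lines(fragment):
    # Turn each run of <br> tags into a single newline.
    text = re.sub(r'(?:<br\s*/?>\s*)+', '\n', fragment, flags=re.IGNORECASE)
    # Remove every remaining tag, keeping only the text between them.
    text = re.sub(r'<[^>]+>', '', text)
    # Return the non-empty lines with surrounding whitespace stripped.
    return [line.strip() for line in text.splitlines() if line.strip()]

for joke in html_to_lines(raw_html):
    print(joke)
```

If the assumption about `<br>` holds, `html_to_lines(str(yomama))` would be the call in the script above, since `str()` on a BeautifulSoup tag gives its markup. Regex on HTML is fragile, so it is worth checking the page source first; BeautifulSoup's `get_text(separator="\n")` may be a simpler route to a similar result.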
#3
Thank you so much, this was really helpful.