Python Forum
Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup
#1
Hi guys,

How's it going?

I've spent weeks trying to remove the double quotes (") from words, because I want to count the words in a given text or web page.

I've gone back and forth converting the variable between a list and a string in order to use their methods.

Here's my code:

import requests
from bs4 import BeautifulSoup

url = 'https://burniva.com/sam-smith-weight-loss/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

articles = soup.find('div', class_="cmsmasters_text")
paragraphs = articles.find_all('p')

new_p = []
for p in paragraphs:
    new_p.append(p.get_text())

for p in new_p:
    print(str(p).lstrip('"'))
There would be words like:
“Stay with Me” and “I’m not the Only One”,
"Latch"
“The Thrill of It All”
“Oh! Carol”
etc. etc....

I cannot remove those double quotes. I have tried .replace and .strip, and I have tried splitting the text up or turning it into a list. I can remove other special characters such as punctuation, apostrophes, and periods, but never the double quotes.

Please try the code; the output is the whole article, including the titles mentioned above.

I hope someone can help, as I'd really like to know what I am missing: why I cannot remove them and get a clean list of just words.

Thanks in advance!
#2
The replace method is what you want. I expect the problem is that not all of the examples you showed use ASCII quotes; some of them use smart quotes. So you would need to replace three times: the ASCII quote, the opening smart quote, and the closing smart quote.

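I found these in some old code I used for scraping Facebook data. I think they are the single and double smart quotes (the first entry looks like an em dash):

# UTF-8 byte sequences from Python 2 code: em dash, right/left single quote, left/right double quote.
# On a Python 3 str, use '\u2014', '\u2019', '\u2018', '\u201c', '\u201d' instead.
REPLACEMENTS = (('\xe2\x80\x94', '--'), ('\xe2\x80\x99', "'"), ('\xe2\x80\x98', "'"), 
	('\xe2\x80\x9c', '\\"'), ('\xe2\x80\x9d', '\\"'))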
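A possible Python 3 version of the table above, as a rough sketch only. The escape sequences in the tuple are the UTF-8 bytes of the curly quotes, so on a Python 3 str the same characters can be written with \u escapes. The name apply_replacements, and the choice to delete double quotes outright instead of substituting \", are assumptions made here to fit the goal of counting words.

```python
# Hypothetical Python 3 rewrite of the replacement table, using Unicode escapes.
REPLACEMENTS = (
    ('\u2014', '--'),  # em dash
    ('\u2019', "'"),   # right single quote (also used as an apostrophe)
    ('\u2018', "'"),   # left single quote
    ('\u201c', ''),    # left double quote, removed entirely
    ('\u201d', ''),    # right double quote, removed entirely
    ('"', ''),         # ASCII double quote, removed entirely
)


def apply_replacements(text, replacements=REPLACEMENTS):
    """Apply each (old, new) pair in order and return the cleaned text."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


# The byte sequences in the old table are just the UTF-8 form of these characters.
print('\u201c'.encode('utf-8'))  # b'\xe2\x80\x9c'
print(apply_replacements('“Stay with Me” and "Latch"'))
# Stay with Me and Latch
```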
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
#3
(Oct-22-2019, 01:43 PM)soothsayerpg Wrote: I've spent weeks trying to remove the double quotes (") from words, because I want to count the words in a given text or web page.

There would be words like:
“Stay with Me” and “I’m not the Only One”,
"Latch"
[ ... ]

I cannot remove those double quotes. I have tried .replace and .strip, and I have tried splitting the text up or turning it into a list. I can remove other special characters such as punctuation, apostrophes, and periods, but never the double quotes.

Hi!

It's just a thought, but I think that maybe it's because you are using different types of double quotes, and probably eliminating just one type of them.

The double quotes in “Stay with Me” are, for instance, different from the double quotes in "Latch".
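One way to check this, sketched with the standard unicodedata module (the helper name describe is made up for illustration): list the code point and official Unicode name of every character in a line that is neither alphanumeric nor whitespace.

```python
import unicodedata


def describe(text):
    """Return (char, code point, Unicode name) for each character
    that is neither alphanumeric nor whitespace."""
    return [
        (ch, f'U+{ord(ch):04X}', unicodedata.name(ch, 'UNKNOWN'))
        for ch in text
        if not ch.isalnum() and not ch.isspace()
    ]


for row in describe('“Stay” "Latch"'):
    print(row)
# ('“', 'U+201C', 'LEFT DOUBLE QUOTATION MARK')
# ('”', 'U+201D', 'RIGHT DOUBLE QUOTATION MARK')
# ('"', 'U+0022', 'QUOTATION MARK')
# ('"', 'U+0022', 'QUOTATION MARK')
```

Three different code points show up, so removing only U+0022 leaves the curly ones untouched.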

All the best,
newbieAuggie2019

"That's been one of my mantras - focus and simplicity. Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it's worth it in the end because once you get there, you can move mountains."
Steve Jobs
#4
Hi guys. Thanks for the response!
Did you try it out? If you try it out and remove all the special characters, the double quote is the one that remains.

I got to the point of thinking that maybe I should remove it before converting the text into a list or string, but I still cannot remove it. Just a wild guess, though.

@ichabod801, I will try it out, though the code you posted is a little confusing.
#5
In Python 3, any string literal without the b (bytes) prefix, as in b'hello', is Unicode text.
So what you see is what you get. By that I mean you can just copy the smart quotes and replace them.
>>> p = paragraphs[0].text
>>> print(p)
With hits such as “Stay with Me” and “I’m not the Only One”, Sam Smith has become one of UK’s hottest singers.
>>> 
>>> # Now just copy the smart quotes from the line above and replace them
>>> print(p.replace('“', '').replace('”', ''))
With hits such as Stay with Me and I’m not the Only One, Sam Smith has become one of UK’s hottest singers.
Also, using .text is shorter than get_text(); they do the same thing.
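Since the end goal is counting words, here is a minimal sketch of how the cleaned text could be fed into a counter. It uses only the standard library; the names QUOTES and count_words and the sample sentence are assumptions for illustration, and other punctuation (commas, periods) is left attached to the words.

```python
from collections import Counter

# Translation table that deletes the ASCII quote and both curly double quotes.
QUOTES = str.maketrans('', '', '"“”')


def count_words(text):
    """Remove double quotes, split on whitespace, and count lowercase words."""
    cleaned = text.translate(QUOTES)
    return Counter(word.lower() for word in cleaned.split())


sample = 'With hits such as “Stay with Me” and "Latch"'
counts = count_words(sample)
print(counts['with'])   # 2
print(counts['latch'])  # 1
```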
#6
(Oct-27-2019, 08:31 AM)soothsayerpg Wrote: Did you try it out? If you try it out and remove all the special characters, the double quote is the one that remains.

string1 = """There would be words like:
“Stay with Me” and “I’m not the Only One”,
"Latch"
“The Thrill of It All”
“Oh! Carol”
etc. etc. ..."""


print(''.join(characters for characters in string1 if characters not in '"“”'))
Output:
There would be words like:
Stay with Me and I’m not the Only One,
Latch
The Thrill of It All
Oh! Carol
etc. etc. ...
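For comparison, a short sketch of why the original .lstrip('"') call appeared to do nothing while the character filter above works. The helper name remove_quotes and the sample line are assumptions for illustration.

```python
line = 'With hits such as “Stay with Me” and "Latch"'

# lstrip/strip only trim characters at the edges of the string, and the
# ASCII '"' is a different character from the curly “ and ”.
print(line.lstrip('"') == line)  # True: nothing at the left edge matches
print(line.strip('"“”'))         # only the trailing ASCII quote is trimmed
# With hits such as “Stay with Me” and "Latch


def remove_quotes(text, quotes='"“”'):
    """Drop every character listed in quotes, wherever it occurs."""
    return ''.join(ch for ch in text if ch not in quotes)


print(remove_quotes(line))
# With hits such as Stay with Me and Latch
```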
All the best,
newbieAuggie2019