Python Forum
Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup

Thread: https://python-forum.io/thread-21958.html (Python Forum › Python Coding › General Coding Help)



Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup - soothsayerpg - Oct-22-2019

Hi guys,

How's it going?

I've spent weeks trying to remove double quotes (") from words, because I want to count the words in a text or web page.

I've been going back and forth converting the variable between a list and a string so I can use their methods.

Here's my code:

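import requests
from bs4 import BeautifulSoup

url = 'https://burniva.com/sam-smith-weight-loss/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser needs the lxml package installed

# Find the container div, then every <p> inside it
articles = soup.find('div', class_="cmsmasters_text")
paragraphs = articles.find_all('p')

# Collect the plain text of each paragraph
new_p = []
for p in paragraphs:
    new_p.append(p.get_text())

for p in new_p:
    # lstrip('"') only removes leading ASCII " characters
    print(str(p).lstrip('"'))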
The output includes phrases like:
“Stay with Me” and “I’m not the Only One”,
"Latch"
“The Thrill of It All”
“Oh! Carol”
etc. etc....

I cannot remove those double quotes. I've tried .replace() and .strip(), and even splitting the text with .split() or turning it into a list. I can remove other special characters like punctuation, apostrophes, periods, etc., but never the double quotes.

Please try the code: it prints the whole article, including the phrases I mentioned.

I hope someone can help. I'm now very curious about what I'm missing: why I can't remove them and end up with a clean list of just words.
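For the end goal described here (a clean list of words to count), one option is to drop every character that Unicode classifies as punctuation instead of listing quote characters by hand. This is a minimal sketch, not code from the thread: the helper names `strip_punctuation` and `count_words` and the sample string are illustrative assumptions, and it works on text that has already been extracted (for example with `p.get_text()`).

```python
import unicodedata
from collections import Counter


def strip_punctuation(text):
    """Drop every character whose Unicode category starts with 'P' (punctuation).

    This catches the ASCII " and ' as well as the smart quotes “ ” ‘ ’.
    """
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))


def count_words(text):
    """Lower-case the text, strip punctuation, and count whitespace-separated words."""
    return Counter(strip_punctuation(text).lower().split())


# Illustrative sample mixing smart quotes and ASCII quotes
sample = 'With hits such as “Stay with Me” and “I’m not the Only One”, "Latch"'
counts = count_words(sample)
print(counts.most_common(3))
```

Note the trade-off: because the apostrophe and hyphens are punctuation too, "I’m" becomes "im" and "UK’s" becomes "uks". If you want to keep those, delete only the specific quote characters instead.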

Thanks in advance!


RE: Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup - ichabod801 - Oct-22-2019

The replace method is what you want. I expect the problem is that not all of the examples you showed use ASCII quotes; some of them use smart quotes. So you would need to replace three times: the ASCII quote, the opening smart quote, and the closing smart quote.

I found these in some old code I used for scraping Facebook data. I think they are the single and double smart quotes:

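# Python 3 strings are Unicode, so match the code points, not UTF-8 byte sequences
REPLACEMENTS = (('\u2014', '--'),                  # em dash
                ('\u2019', "'"), ('\u2018', "'"),  # right/left single smart quotes
                ('\u201c', '"'), ('\u201d', '"'))  # left/right double smart quotes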
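A smart quote is a single character in a Python 3 str but three bytes once encoded as UTF-8, which is why older byte-oriented code (Python 2 scrapers, for example) spells it with escapes like \xe2\x80\x9c. The sketch below, assuming Python 3 and text that is already decoded (as r.text is), shows that difference and then applies a replacement table with str.replace in a loop. The names NORMALIZE and normalize are hypothetical, chosen for illustration.

```python
# A left double smart quote is one code point in a str...
left_quote = '\u201c'  # “
print(len(left_quote), hex(ord(left_quote)))  # 1 0x201c
# ...but three bytes once encoded as UTF-8
print(left_quote.encode('utf-8'))  # b'\xe2\x80\x9c'

# Hypothetical table mapping typographic punctuation to ASCII stand-ins
NORMALIZE = (
    ('\u2014', '--'),  # em dash
    ('\u2018', "'"),   # left single quote
    ('\u2019', "'"),   # right single quote / apostrophe
    ('\u201c', '"'),   # left double quote
    ('\u201d', '"'),   # right double quote
)


def normalize(text):
    """Apply each (old, new) pair in turn with str.replace."""
    for old, new in NORMALIZE:
        text = text.replace(old, new)
    return text


line = '“Stay with Me” and “I’m not the Only One”'
print(normalize(line))                   # "Stay with Me" and "I'm not the Only One"
print(normalize(line).replace('"', ''))  # Stay with Me and I'm not the Only One
```

Normalizing first means a single .replace('"', '') afterwards removes every double quote, whichever kind the page used.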



RE: Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup - newbieAuggie2019 - Oct-22-2019

(Oct-22-2019, 01:43 PM)soothsayerpg Wrote: I've been in weeks trying to remove a double-quote (") from a word (as I want to count the word in a certain text or webpage and what not).

There would be words like:
“Stay with Me” and “I’m not the Only One”,
"Latch"
[ ... ]

I cannot remove those double-quotes. Tried .replace, .strip, even if I .split them up or turn them into a list. I can remove other special characters like punctuation, apostrophe, period, etc. etc.... but never the double-quotes.

Hi!

It's just a thought, but I think it may be because the text contains different types of double quotes and you are probably removing only one of them.

The double quotes in “Stay with Me” are, for instance, different from the double quotes in "Latch".
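One quick way to confirm this (a small sketch, not from the thread) is to print each quote character's code point and official name with ord() and unicodedata.name(), and to see what str.strip('"') does to each kind:

```python
import unicodedata

# The three double-quote characters seen in this thread
for ch in '"“”':
    print(repr(ch), hex(ord(ch)), unicodedata.name(ch))
# '"' 0x22 QUOTATION MARK
# '“' 0x201c LEFT DOUBLE QUOTATION MARK
# '”' 0x201d RIGHT DOUBLE QUOTATION MARK

# strip('"') only removes the ASCII quote, so the smart quotes survive
print('“Oh! Carol”'.strip('"'))  # “Oh! Carol”
print('"Latch"'.strip('"'))      # Latch
```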

All the best,


RE: Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup - soothsayerpg - Oct-27-2019

Hi guys. Thanks for the response!
Did you try it out? If you run it and remove all the special characters, the double quote is the one that remains.

At one point I thought maybe I should remove it before converting the text into a list or string, but I still couldn't remove it. Just a wild guess, though.

@ichabod801, I'll try it out, though the code you posted is a little confusing.


RE: Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup - snippsat - Oct-27-2019

In Python 3, unless a string is bytes (written with a b prefix, as in b'hello'), all text is Unicode.
So what you see is what you get: you can simply copy the smart quotes into your code and replace them.
>>> p = paragraphs[0].text
>>> print(p)
With hits such as “Stay with Me” and “I’m not the Only One”, Sam Smith has become one of UK’s hottest singers.
>>> 
>>> # Now just copy the smart quotes from the line above and replace them
>>> print(p.replace('“', '').replace('”', ''))
With hits such as Stay with Me and I’m not the Only One, Sam Smith has become one of UK’s hottest singers.
Also, .text is shorter than get_text(); they do the same thing.
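When several characters have to go, chaining .replace() calls works, but str.translate() can delete them all in one pass. A minimal sketch, assuming you only want to drop the ASCII and smart double quotes; the table name REMOVE_DOUBLE_QUOTES is illustrative, and the sample is the first paragraph from the session above:

```python
# Build a translation table that maps each listed character to None (delete it);
# the third argument of str.maketrans is the string of characters to remove.
REMOVE_DOUBLE_QUOTES = str.maketrans('', '', '"\u201c\u201d')

p = ('With hits such as “Stay with Me” and “I’m not the Only One”, '
     'Sam Smith has become one of UK’s hottest singers.')
print(p.translate(REMOVE_DOUBLE_QUOTES))
# With hits such as Stay with Me and I’m not the Only One, Sam Smith has become one of UK’s hottest singers.
```

To clean every scraped paragraph, something like [p.text.translate(REMOVE_DOUBLE_QUOTES) for p in paragraphs] should do; add '\u2018\u2019' to the third argument if the smart single quotes should go as well.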


RE: Cannot Remove the Double Quotes on a Certain Word (String) Python BeautifulSoup - newbieAuggie2019 - Oct-27-2019

(Oct-27-2019, 08:31 AM)soothsayerpg Wrote: Did you tried it out? If you try it out and remove all the special chars, that double-quote is the one who'll remain.

string1 = """There would be words like:
“Stay with Me” and “I’m not the Only One”,
"Latch"
“The Thrill of It All”
“Oh! Carol”
etc. etc. ..."""


print(''.join(characters for characters in string1 if characters not in '"“”'))
Output:
There would be words like:
Stay with Me and I’m not the Only One,
Latch
The Thrill of It All
Oh! Carol
etc. etc. ...
All the best,