Python Forum

How to use BeautifulSoup4 with HTML data stored in a pandas Series?
Hi All,

I have some HTML data in the form of a pandas Series.
For example, I am storing this data in a variable named html_series.

Now, when I try to apply BeautifulSoup to it like this:
soup = BeautifulSoup(html_series, "html.parser")
print(soup.prettify())
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can you please tell me what I am missing here?

Thanks!
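The ValueError above is pandas' standard message, raised whenever a Series is evaluated where a single True/False is expected (in an if, and/or, or bool()). A minimal sketch reproducing it with pandas alone; the sample strings are made up for illustration:

```python
import pandas as pd

# Hypothetical Series holding one HTML string per row
html_series = pd.Series(["<p>First</p>", "<p>Second</p>"])

try:
    # A whole Series in a boolean context is ambiguous, so pandas refuses
    if html_series:
        pass
except ValueError as exc:
    print(exc)  # pandas' "truth value of a Series is ambiguous" message

# A single element is a plain str, so this works
print(bool(html_series.iloc[0]))
```

Presumably BeautifulSoup performs a check of this kind internally when it is handed something other than a string or file-like object, which is why the error surfaces from pandas rather than from bs4.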
What exactly is HTML data in the form of a pandas Series? This sounds like nonsense to me.
Let me explain:
1. Using the Shopify API, I fetched the JSON from an e-commerce site.
2. I normalized this JSON data into a DataFrame with:
df = json_normalize(result)
3. From this DataFrame, I extract the HTML content with:
html_data = df['body_html']
4. Now, when I run the code below, I get the error:
soup = BeautifulSoup(html_data, "html.parser")
print(soup.prettify())
I hope I have mentioned everything here.
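Following the steps above, df['body_html'] is a whole column (a pandas Series), while BeautifulSoup expects a single string or file-like object. A minimal sketch, assuming the normalized data holds one HTML string per row: the records below are hypothetical stand-ins for the Shopify response, and html_to_text is a made-up helper name. It parses each row separately with Series.apply, or picks out a single row with .iloc. It uses pd.json_normalize (pandas 1.0+); older pandas versions imported json_normalize from pandas.io.json instead.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the JSON returned by the API (not real Shopify data)
result = [
    {"id": 1, "title": "Shirt", "body_html": "<p>Soft <b>cotton</b> shirt</p>"},
    {"id": 2, "title": "Mug", "body_html": "<p>Ceramic mug</p>"},
]

df = pd.json_normalize(result)
html_data = df["body_html"]  # a pandas Series, not a single string


def html_to_text(html):
    """Parse one HTML string and return its text (hypothetical helper)."""
    if not isinstance(html, str):  # e.g. NaN for rows without body_html
        return ""
    return BeautifulSoup(html, "html.parser").get_text()


# Option 1: parse every row separately
df["body_text"] = html_data.apply(html_to_text)
print(df[["title", "body_text"]])

# Option 2: parse one row, e.g. the first, and pretty-print it
soup = BeautifulSoup(html_data.iloc[0], "html.parser")
print(soup.prettify())
```

Applying the parser per row keeps each product's HTML separate; joining the whole column into one string would also parse, but would merge unrelated documents into a single soup.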
It looks like you are trying to pass JSON data to BeautifulSoup.
What is the content of html_data?
For this to work, it has to be HTML, for example:
from bs4 import BeautifulSoup

html_data = '''\
<!DOCTYPE html>
<html>
  <head>
    <title>Title of document</title>
  </head>
  <body>
    <p>Content of the document</p>
  </body>
</html>'''

soup = BeautifulSoup(html_data, 'lxml')
print(soup.select('head > title')[0].text)
Output:
Title of document
I have resolved the above error; it was caused by the DataFrame normalization.
I have now opened another thread at the URL below:
https://python-forum.io/Thread-How-to-cl...Python-3-6

Please take a look and let me know if you can help.
Thanks!