Python Forum

Full Version: How to use BeautifulSoup4 with pandas series type of html data?
Hi All,

I have some HTML data stored in a pandas Series.
For example, I am storing this data in a variable named html_series.

When I try to pass it to BeautifulSoup like this:
soup = BeautifulSoup(html_series, "html.parser")
I get the error below:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can you please tell me what I am missing here?

What exactly is HTML data in the form of a pandas Series? That does not make sense to me.
Let me explain:
1. Using the Shopify API, I fetched the JSON from an e-commerce site.
2. I normalized this JSON data into a DataFrame with:
df = json_normalize(result)
3. From this DataFrame I extracted the HTML content with:
html_data = df['body_html']
4. When I run the code below, I get the error:
soup = BeautifulSoup(html_data, "html.parser")
I hope I have covered everything here.
It looks like you are trying to pass JSON data to BeautifulSoup.
What are the contents of html_data?
For this to work, it has to be HTML.
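from bs4 import BeautifulSoup

html_data = '''\
<!DOCTYPE html>
    <title>Title of document</title>
    <p>Content of the document</p>
'''

# Parse the HTML string (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(html_data, 'lxml')
# select() returns a list of elements matching the CSS selector
print(soup.select('head > title')[0].text)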
Title of document
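For anyone hitting the same ValueError: BeautifulSoup expects one piece of markup (a string or file-like object), not a whole Series, so each element probably needs to be parsed on its own. Here is a minimal sketch, assuming every element of df['body_html'] is either an HTML string or a missing value. The sample data and the helper name html_to_text are made up for illustration and are not from the original post.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for df['body_html']: one HTML string per row.
html_series = pd.Series([
    "<p>First <b>product</b> description</p>",
    "<p>Second product description</p>",
    None,  # assumed: some rows may have no HTML body
])

def html_to_text(html):
    """Return the visible text of one HTML fragment, or '' if it is not a string."""
    if not isinstance(html, str):
        return ""
    soup = BeautifulSoup(html, "html.parser")
    # Join the text nodes with a space and strip surrounding whitespace.
    return soup.get_text(" ", strip=True)

# Parse each element separately instead of passing the whole Series.
text_series = html_series.apply(html_to_text)
print(text_series.tolist())
```

If you need the parsed tree rather than the text, the helper could return the soup object itself; the point is only that BeautifulSoup is called once per element.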
I have resolved the above error; it was caused by the DataFrame normalization.
I have now raised another ticket at the URL below:

Please take a look and let me know if you can help.