Dec-17-2021, 07:35 AM
Hi all,
I'm practicing a little web scraping and extracting various elements from a saved page. Everything is going as expected except for one component. To keep the code short, I've omitted the other elements (it shouldn't make any difference to this issue).
When I extract this element and print it to the screen, the variable and the loop work perfectly.
Here's an example:
    from bs4 import BeautifulSoup

    net_profit = []

    with open("C:/Users/websites_page.html", encoding="utf8") as fp:
        soup = BeautifulSoup(fp, 'lxml')

    full_list_top_half = soup.find_all('div', class_='col-lg-9 px-lg-3 d-flex flex-column justify-content-between')

    for item in full_list_top_half:
        # Get Website URL
        col = item.find('div', class_='col-lg-9')
        # Get Asset Type
        content_between = col.find('div', class_='d-flex flex-nowrap justify-content-between')
        # Get Net Profit
        text_truncate = content_between.find('div', class_='font-weight-bold text-truncate')
        nprofit = text_truncate.find('span', class_='ng-binding ng-scope').text
        print(nprofit)

I get:
Output:
$4,331 p/mo
$9,429 p/mo
$1,599 p/mo
$110,133 p/mo
$1,475 p/mo
Looks perfect and what I would expect.
Since I will eventually want to export this data using pandas, I added one more line to the code, as I normally would, to append each value to a list:
    from bs4 import BeautifulSoup

    net_profit = []

    with open("C:/Users/websites_page.html", encoding="utf8") as fp:
        soup = BeautifulSoup(fp, 'lxml')

    full_list_top_half = soup.find_all('div', class_='col-lg-9 px-lg-3 d-flex flex-column justify-content-between')

    for item in full_list_top_half:
        # Get Website URL
        col = item.find('div', class_='col-lg-9')
        # Get Asset Type
        content_between = col.find('div', class_='d-flex flex-nowrap justify-content-between')
        # Get Net Profit
        text_truncate = content_between.find('div', class_='font-weight-bold text-truncate')
        nprofit = text_truncate.find('span', class_='ng-binding ng-scope').text
        #print(nprofit)
        net_profit.append(nprofit)
        print(net_profit)

So the second piece of code includes:

    net_profit.append(nprofit)

and when I print that, I now get:
Output:
"C:\Users\anaconda3\python.exe" "C:/Users/test_file.py"
[' $4,331 p/mo']
[' $4,331 p/mo', ' $9,429 p/mo']
[' $4,331 p/mo', ' $9,429 p/mo', ' $1,599 p/mo']
[' $4,331 p/mo', ' $9,429 p/mo', ' $1,599 p/mo', ' $110,133 p/mo']
[' $4,331 p/mo', ' $9,429 p/mo', ' $1,599 p/mo', ' $110,133 p/mo', ' $1,475 p/mo']
Process finished with exit code 0
'nprofit' prints as expected, but net_profit is printed on every pass of the loop, growing by one item each time. I've never seen this before, and after a few hours of mulling it over and trying a couple of things, I still don't know why it is happening.
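The same behaviour seems to show up without the HTML file at all. Below is a minimal sketch that reproduces it, assuming the print call sits inside the for loop as in my code above. The hard-coded `scraped` list and the `collect` helper are hypothetical stand-ins for the scraped values, not part of my real script.

```python
# Hypothetical stand-ins for the values scraped from the page
scraped = [" $4,331 p/mo", " $9,429 p/mo", " $1,599 p/mo"]


def collect(values, print_inside_loop):
    """Append each value to a list, printing on every pass or once at the end."""
    net_profit = []
    for nprofit in values:
        net_profit.append(nprofit)
        if print_inside_loop:
            # Runs on every iteration, so the list is shown while it is still growing
            print(net_profit)
    if not print_inside_loop:
        # Runs once, after the loop has finished
        print(net_profit)
    return net_profit


collect(scraped, print_inside_loop=True)   # one line of output per item
collect(scraped, print_inside_loop=False)  # a single line of output
```

In both cases the returned list is identical; only the number of times it is printed differs, which suggests the position of the print call relative to the loop is what matters here.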
Could anyone please help enlighten me with this?
Thanks a lot.