Python Forum
Scraping data from table into existing dataframe
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Scraping data from table into existing dataframe
#1
Hello

1.) I have initial code below that scrapes various stock data from Finviz and gives the output below titled First Output. The problem comes up because I need to add additional information from a different, snapshot table. This information is Sales and Income and I've inserted columns for these fields.

2.) I added additional code under additional code below to pull in the Sales and Income information from the snapshot table.

This line in the additional code finds the first symbol AAPL in the dataframe.
ticker = (df.iloc[0,1]

This line finds the AAPL sales and income information in the snapshot table
print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}').

The Output after adding additional code is shown below and now includes Sales and Income information for AAPL at the bottom of the output.

3.)So my question is how do I get the Sales and Income data to go to their proper columns instead of showing up at the bottom of the output. The result I'm hoping to achieve is below under Desired Output. Does anyone know how I could accomplish this? Thank you so much!!

**Initial Code**
import pandas as pd
import requests
import bs4
import time
import random

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

def testDf(version):
url = 'https://finviz.com/screener.ashx?v=111&f=cap_mega,sh_outstanding_o1000&f=ind_stocksonly&o=-marketcap'

page = 1

screen = requests.get(url.format(version=version, page=page), headers=headers)
soup = bs4.BeautifulSoup(screen.text, features='lxml')
pages = int(soup.find_all('a', {'class': 'screener-pages'})[-1].text)

data = []
for page in range(1, 1 * pages, 20):
print(version, page)
screen = requests.get(url.format(version=version, page=page), headers=headers).text
tables = pd.read_html(screen)
tables = tables[-2]
tables.columns = tables.iloc[0]
tables = tables[1:]
data.append(tables)
time.sleep(random.random())
return pd.concat(data).reset_index(drop=True).rename_axis(columns=None)


df = testDf('152').copy()
df.insert(2,'Sales',"")
df.insert(3,'Income',"")

# The newly converted dataframe is in the df variable
print(df)

Note: I have inserted Sales and Income Fields for later on.

First Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Additional Code

ticker = (df.iloc[0,1])
#---------- Pulling the data from chosen stock ticker ----------#

url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())

req = requests.get(url,headers=headers)
table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
df = table[0]

print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

Output after adding additional code above

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Sales: 365.82B: Income: 94.68B

Desired Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL 365.82B 94.68M ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]
Reply
#2
Hello

1.) I have initial code below that scrapes various stock data from Finviz and gives the output below titled First Output. The problem comes up because I need to add additional information from a different, snapshot table. This information is Sales and Income and I've inserted columns for these fields.

2.) I added additional code under additional code below to pull in the Sales and Income information from the snapshot table.

This line in the additional code finds the first symbol AAPL in the dataframe.
ticker = (df.iloc[0,1]

This line finds the AAPL sales and income information in the snapshot table
print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}').

The Output after adding additional code is shown below and now includes Sales and Income information for AAPL at the bottom of the output.

3.)So my question is how do I get the Sales and Income data to go to their proper columns instead of showing up at the bottom of the output. The result I'm hoping to achieve is below under Desired Output. Does anyone know how I could accomplish this? Thank you so much!!

**Initial Code**
import pandas as pd
import requests
import bs4
import time
import random

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

def testDf(version):
url = 'https://finviz.com/screener.ashx?v=111&f=cap_mega,sh_outstanding_o1000&f=ind_stocksonly&o=-marketcap'

page = 1

screen = requests.get(url.format(version=version, page=page), headers=headers)
soup = bs4.BeautifulSoup(screen.text, features='lxml')
pages = int(soup.find_all('a', {'class': 'screener-pages'})[-1].text)

data = []
for page in range(1, 1 * pages, 20):
print(version, page)
screen = requests.get(url.format(version=version, page=page), headers=headers).text
tables = pd.read_html(screen)
tables = tables[-2]
tables.columns = tables.iloc[0]
tables = tables[1:]
data.append(tables)
time.sleep(random.random())
return pd.concat(data).reset_index(drop=True).rename_axis(columns=None)


df = testDf('152').copy()
df.insert(2,'Sales',"")
df.insert(3,'Income',"")

# The newly converted dataframe is in the df variable
print(df)

Note: I have inserted Sales and Income Fields for later on.

First Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Additional Code

ticker = (df.iloc[0,1])
#---------- Pulling the data from chosen stock ticker ----------#

url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())

req = requests.get(url,headers=headers)
table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
df = table[0]

print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

Output after adding additional code above

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Sales: 365.82B: Income: 94.68B

Desired Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL 365.82B 94.68M ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]
Larz60+ write Jan-09-2022, 10:38 PM:
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

Also duplicate threads merged
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to scraping data from dinamic site sergio21124444 2 698 Nov-08-2023, 12:43 PM
Last Post: sergio21124444
  Help Scraping links and table from link cartonics 11 1,623 Oct-12-2023, 06:42 AM
Last Post: cartonics
  Web scraping data Mike_Eddy 2 2,557 Jul-03-2021, 05:49 PM
Last Post: Mike_Eddy
  Scraping lender data from Ren Ren Dai website using Python. I will pay for that 200$ Hafedh_2021 1 2,759 May-18-2021, 08:41 PM
Last Post: snippsat
  Scraping a page with log in data (security, proxies) iamaghost 0 2,157 Mar-27-2021, 02:56 PM
Last Post: iamaghost
  Scraping Data from Singapore Turf Club singaporeman 2 2,406 Dec-15-2020, 01:28 PM
Last Post: MrBitPythoner
  Need help scraping wikipedia table bborusz2 6 3,270 Dec-01-2020, 11:31 PM
Last Post: snippsat
Thumbs Up Issue facing while scraping the data from different websites in single script. Balamani 1 2,138 Oct-20-2020, 09:56 AM
Last Post: Larz60+
  Inserting data from a table to another (in same db) firebird 5 2,501 Oct-05-2020, 06:04 AM
Last Post: buran
  Web Scraping Inquiry (Extracting content from a table in asubdomain) DustinKlent 3 3,738 Aug-17-2020, 10:10 AM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020