Python Forum

Full Version: Scraping data from table into existing dataframe
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello

1.) I have initial code below that scrapes various stock data from Finviz and gives the output below titled First Output. The problem comes up because I need to add additional information from a different, snapshot table. This information is Sales and Income and I've inserted columns for these fields.

2.) I added additional code under additional code below to pull in the Sales and Income information from the snapshot table.

This line in the additional code finds the first symbol AAPL in the dataframe.
ticker = (df.iloc[0,1]

This line finds the AAPL sales and income information in the snapshot table
print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}').

The Output after adding additional code is shown below and now includes Sales and Income information for AAPL at the bottom of the output.

3.)So my question is how do I get the Sales and Income data to go to their proper columns instead of showing up at the bottom of the output. The result I'm hoping to achieve is below under Desired Output. Does anyone know how I could accomplish this? Thank you so much!!

**Initial Code**
import pandas as pd
import requests
import bs4
import time
import random

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

def testDf(version):
url = 'https://finviz.com/screener.ashx?v=111&f=cap_mega,sh_outstanding_o1000&f=ind_stocksonly&o=-marketcap'

page = 1

screen = requests.get(url.format(version=version, page=page), headers=headers)
soup = bs4.BeautifulSoup(screen.text, features='lxml')
pages = int(soup.find_all('a', {'class': 'screener-pages'})[-1].text)

data = []
for page in range(1, 1 * pages, 20):
print(version, page)
screen = requests.get(url.format(version=version, page=page), headers=headers).text
tables = pd.read_html(screen)
tables = tables[-2]
tables.columns = tables.iloc[0]
tables = tables[1:]
data.append(tables)
time.sleep(random.random())
return pd.concat(data).reset_index(drop=True).rename_axis(columns=None)


df = testDf('152').copy()
df.insert(2,'Sales',"")
df.insert(3,'Income',"")

# The newly converted dataframe is in the df variable
print(df)

Note: I have inserted Sales and Income Fields for later on.

First Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Additional Code

ticker = (df.iloc[0,1])
#---------- Pulling the data from chosen stock ticker ----------#

url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())

req = requests.get(url,headers=headers)
table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
df = table[0]

print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

Output after adding additional code above

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Sales: 365.82B: Income: 94.68B

Desired Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL 365.82B 94.68M ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]
Hello

1.) I have initial code below that scrapes various stock data from Finviz and gives the output below titled First Output. The problem comes up because I need to add additional information from a different, snapshot table. This information is Sales and Income and I've inserted columns for these fields.

2.) I added additional code under additional code below to pull in the Sales and Income information from the snapshot table.

This line in the additional code finds the first symbol AAPL in the dataframe.
ticker = (df.iloc[0,1]

This line finds the AAPL sales and income information in the snapshot table
print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}').

The Output after adding additional code is shown below and now includes Sales and Income information for AAPL at the bottom of the output.

3.)So my question is how do I get the Sales and Income data to go to their proper columns instead of showing up at the bottom of the output. The result I'm hoping to achieve is below under Desired Output. Does anyone know how I could accomplish this? Thank you so much!!

**Initial Code**
import pandas as pd
import requests
import bs4
import time
import random

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

def testDf(version):
url = 'https://finviz.com/screener.ashx?v=111&f=cap_mega,sh_outstanding_o1000&f=ind_stocksonly&o=-marketcap'

page = 1

screen = requests.get(url.format(version=version, page=page), headers=headers)
soup = bs4.BeautifulSoup(screen.text, features='lxml')
pages = int(soup.find_all('a', {'class': 'screener-pages'})[-1].text)

data = []
for page in range(1, 1 * pages, 20):
print(version, page)
screen = requests.get(url.format(version=version, page=page), headers=headers).text
tables = pd.read_html(screen)
tables = tables[-2]
tables.columns = tables.iloc[0]
tables = tables[1:]
data.append(tables)
time.sleep(random.random())
return pd.concat(data).reset_index(drop=True).rename_axis(columns=None)


df = testDf('152').copy()
df.insert(2,'Sales',"")
df.insert(3,'Income',"")

# The newly converted dataframe is in the df variable
print(df)

Note: I have inserted Sales and Income Fields for later on.

First Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Additional Code

ticker = (df.iloc[0,1])
#---------- Pulling the data from chosen stock ticker ----------#

url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())

req = requests.get(url,headers=headers)
table = pd.read_html(req.text, attrs = {"class":"snapshot-table2"} )
df = table[0]

print(f'{df[0][3]}: {df[1][3]}: {df[0][2]}: {df[1][2]}')

Output after adding additional code above

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]

Sales: 365.82B: Income: 94.68B

Desired Output

No. Ticker Sales Income ... P/E Price Change Volume
0 1 AAPL 365.82B 94.68M ... 30.71 172.17 0.10% 84987917
1 2 MSFT ... 35.10 314.04 0.05% 32609473
2 3 TSLA ... 333.21 1026.96 -3.54% 27835827
3 4 FB ... 23.72 331.79 -0.20% 1446342
[4 rows x 13 columns]