(Jan-20-2020, 06:40 PM)jkessous Wrote: How can I run it recursively and put the data in an Excel file or something similar, as I will need it as input for the next script?
Pandas is very powerful, so you can use it alone (as a replacement for Excel), or e.g. save to Excel with df.to_excel().
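A minimal sketch of writing a DataFrame to an Excel file (the sample rows are just an illustration, and writing .xlsx needs an Excel engine such as openpyxl installed):

import pandas as pd

# Build a small DataFrame and save it to an Excel file.
df = pd.DataFrame({"Name": ["Tom E", "Jason K"], "id": [6057, 9758]})
df.to_excel("output.xlsx", index=False)  # index=False drops the row numbers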
Here is a Notebook with some of the data as an example.
[Image: ZDczpf.png]
Here I use JupyterLab; you can see that the data now looks similar in Pandas and in Excel.
Can anyone assist with the recursive part?
Thanks a lot.
Hey Snippsat,
Sorry I missed your response.
I tried to run the following code, but the Excel file comes out empty. I assume I am running it wrong.
from requests import Session
from bs4 import BeautifulSoup

with Session() as s:
    # Fetch the login page to read the hidden "infusion_id" token,
    # then post it back along with the credentials.
    site = s.get("https://connectedinvestors.com/login")
    bs_content = BeautifulSoup(site.content, "html.parser")
    token = bs_content.find("input", {"name": "infusion_id"})["value"]
    login_data = {"email": "[email protected]", "password": "password", "infusion_id": token}
    s.post("https://connectedinvestors.com/login", data=login_data)

    # Reuse the logged-in session (not a fresh requests.get) so the
    # login cookies are sent with this request too.
    url = 'https://connectedinvestors.com/member/jonathan-kessous/friends/2'
    url_get = s.get(url)
    soup = BeautifulSoup(url_get.content, 'html.parser')
    for tag in soup.find_all('div', class_="investorcard clearfix"):
        h4 = tag.find('h4')
        print(h4.a.text.strip(), tag.attrs['id'])
Output:
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from requests import Session
>>> from bs4 import BeautifulSoup as bs
>>>
>>>
... with Session() as s:
...     site = s.get("https://connectedinvestors.com/login")
...     bs_content = bs(site.content, "html.parser")
...     token = bs_content.find("input", {"name":"infusion_id"})["value"]
...     login_data = {"email":"[email protected]","password":"password", "infusion_id":token}
...     s.post("https://connectedinvestors.com/login",login_data)
...
<Response [200]>
>>>
>>> import requests
>>> from bs4 import BeautifulSoup
>>>
... url = 'https://connectedinvestors.com/member/jonathan-kessous/friends/2'
>>> url_get = requests.get(url)
>>> soup = BeautifulSoup(url_get.content, 'html.parser')
>>> for tag in soup.find_all('div', class_="investorcard clearfix"):
...     h4 = tag.find('h4')
...     print(h4.a.text.strip(), tag.attrs['id'])
...
Jonathan Kessous 454517
Alisha Taylor 461791
Investor GUY 2139
Victor Gardner 541025
Shiran Clarfield 541190
Naomi Hunkin 438944
Nathan Cron 274631
Trottie Mcqueen 439844
Scott Kramer 383773
Tim Harvey 20328
Tom E 6057
Gerald Harris 244489
Jason K 9758
Arkady Sorokin 556916
matthew andrews 290014
Tim G. Harris 379810
Tee Lynnae 19243
Todd Tinker 293530
Don Gose 5430
Nate Owens 738721
Garris Covington 365412
Bonnie Tijerina 1116116
Mike Embry 502656
Lori Brooks 450285
Tswv Yang 475601
Lisa Griffiths 545822
Dethorn Graham 498426
Adrian Provost 12387
Cindy Frausto 253720
Ernest 465664
Angela Brinegar 421253
>>> import pandas as pd
>>> df = pd.read_clipboard(names=["Name", "temp", "id"])
>>> df1 = pd.DataFrame(df['Name'] +' '+ df['temp'])
>>> df2 = df1.assign(id=df['id'].values)
>>> df2.columns = ['Names', 'id']
>>> df2.to_excel("output.xlsx", index=False)
>>> df2
Names id
>>>
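Note that pd.read_clipboard() parses whatever text is currently on the clipboard, so the printed names have to be copied to the clipboard first; otherwise the DataFrame, and therefore the Excel file, comes out empty. A sturdier approach is to skip the clipboard and build the DataFrame directly from the scraped tags; a minimal sketch, assuming the soup object from the code above:

import pandas as pd

# Collect (name, id) pairs straight from the parsed HTML.
records = []
for tag in soup.find_all('div', class_="investorcard clearfix"):
    h4 = tag.find('h4')
    records.append((h4.a.text.strip(), tag.attrs['id']))

df = pd.DataFrame(records, columns=['Names', 'id'])
df.to_excel("output.xlsx", index=False)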
Thanks. Also, how can I run this for the additional 300 pages?
Thanks, buddy, I appreciate it a lot.
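Nothing recursive is really needed; the friends pages only differ by the page number at the end of the URL, so a plain loop over the page numbers does it. A minimal sketch, assuming the pages run from 1 to 300, that the URL pattern .../friends/<page> holds throughout, and that this runs inside the with Session() block so the logged-in session s is reused:

import time
import pandas as pd
from bs4 import BeautifulSoup

records = []
for page in range(1, 301):  # assumed page count; adjust as needed
    url = f"https://connectedinvestors.com/member/jonathan-kessous/friends/{page}"
    soup = BeautifulSoup(s.get(url).content, 'html.parser')  # reuse the logged-in session
    cards = soup.find_all('div', class_="investorcard clearfix")
    if not cards:
        # No investor cards on this page -> past the last page, stop early.
        break
    for tag in cards:
        h4 = tag.find('h4')
        records.append((h4.a.text.strip(), tag.attrs['id']))
    time.sleep(1)  # small pause to be polite to the server

df = pd.DataFrame(records, columns=['Names', 'id'])
df.to_excel("friends_all_pages.xlsx", index=False)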