Need some help with parsing
#11
(Jan-20-2020, 06:40 PM)jkessous Wrote: How can I run it recursively and put the output in an Excel file or something similar, as I will need this data for the next script?
Pandas is very powerful, so you can use it alone (as a replacement for Excel), or e.g. save to Excel with df.to_excel().
Here's a Notebook with some of the data as an example.
[Image: ZDczpf.png]
So here I use JupyterLab; you can see that the data now looks similar in Pandas and Excel.
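
For reference, a minimal sketch of that df.to_excel() idea, using a couple of rows in the same Name/id shape as the scraped data (the filename is just a placeholder):

import pandas as pd

# placeholder rows in the same shape as the scraped output
rows = [("Jonathan Kessous", 454517), ("Alisha Taylor", 461791)]
df = pd.DataFrame(rows, columns=["Names", "id"])
# writing .xlsx needs an Excel writer installed, e.g. openpyxl
df.to_excel("output.xlsx", index=False)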
#12
Can anyone assist with the recursive part?
Thanks a lot.

Hey Snippsat,
Sorry I missed your response.
I tried to run the following code, but the Excel file comes out empty. I assume I am running it wrong.

from requests import Session
from bs4 import BeautifulSoup as bs

with Session() as s:
    # fetch the login page to pick up the hidden form token
    site = s.get("https://connectedinvestors.com/login")
    bs_content = bs(site.content, "html.parser")
    token = bs_content.find("input", {"name": "infusion_id"})["value"]
    # post the credentials together with the token
    login_data = {"email": "[email protected]", "password": "password", "infusion_id": token}
    s.post("https://connectedinvestors.com/login", login_data)
    # note: the logged-in session only persists inside this with-block




import requests
from bs4 import BeautifulSoup

url = 'https://connectedinvestors.com/member/jonathan-kessous/friends/2'
# note: this is a plain requests.get, not the logged-in session from above
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
# each investor card holds the name in an <h4> link and the id in the div's id attribute
for tag in soup.find_all('div', class_="investorcard clearfix"):
    h4 = tag.find('h4')
    print(h4.a.text.strip(), tag.attrs['id'])
Output:
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from requests import Session
>>> from bs4 import BeautifulSoup as bs
>>> with Session() as s:
...     site = s.get("https://connectedinvestors.com/login")
...     bs_content = bs(site.content, "html.parser")
...     token = bs_content.find("input", {"name":"infusion_id"})["value"]
...     login_data = {"email":"[email protected]","password":"password", "infusion_id":token}
...     s.post("https://connectedinvestors.com/login",login_data)
...
<Response [200]>
>>> import requests
>>> from bs4 import BeautifulSoup
>>> url = 'https://connectedinvestors.com/member/jonathan-kessous/friends/2'
>>> url_get = requests.get(url)
>>> soup = BeautifulSoup(url_get.content, 'html.parser')
>>> for tag in soup.find_all('div', class_="investorcard clearfix"):
...     h4 = tag.find('h4')
...     print(h4.a.text.strip(), tag.attrs['id'])
...
Jonathan Kessous 454517
Alisha Taylor 461791
Investor GUY 2139
Victor Gardner 541025
Shiran Clarfield 541190
Naomi Hunkin 438944
Nathan Cron 274631
Trottie Mcqueen 439844
Scott Kramer 383773
Tim Harvey 20328
Tom E 6057
Gerald Harris 244489
Jason K 9758
Arkady Sorokin 556916
matthew andrews 290014
Tim G. Harris 379810
Tee Lynnae 19243
Todd Tinker 293530
Don Gose 5430
Nate Owens 738721
Garris Covington 365412
Bonnie Tijerina 1116116
Mike Embry 502656
Lori Brooks 450285
Tswv Yang 475601
Lisa Griffiths 545822
Dethorn Graham 498426
Adrian Provost 12387
Cindy Frausto 253720
Ernest 465664
Angela Brinegar 421253
>>> import pandas as pd
>>> df = pd.read_clipboard(names=["Name", "temp", "id"])
>>> df1 = pd.DataFrame(df['Name'] + ' ' + df['temp'])
>>> df2 = df1.assign(id=df['id'].values)
>>> df2.columns = ['Names', 'id']
>>> df2.to_excel("output.xlsx", index=False)
>>> df2
Names  id
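
One likely reason the Excel file comes out empty: pd.read_clipboard() parses whatever text is currently on the system clipboard, so it only fills the DataFrame if the printed rows were copied first. A sketch that builds the DataFrame directly from the scraped rows and skips the clipboard entirely (URL and selectors taken from the code above):

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://connectedinvestors.com/member/jonathan-kessous/friends/2'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# collect (name, id) pairs straight from the parsed page
rows = []
for tag in soup.find_all('div', class_="investorcard clearfix"):
    h4 = tag.find('h4')
    rows.append((h4.a.text.strip(), tag.attrs['id']))

df = pd.DataFrame(rows, columns=['Names', 'id'])
df.to_excel("output.xlsx", index=False)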
Thanks. Also, how can I run this for the additional 300 pages?
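
A possible approach for the remaining pages, assuming the friends URLs just increment the trailing number (/friends/1, /friends/2, ...) and that 300 is the real page count; reusing the logged-in Session from the first snippet keeps any login cookies across requests:

from requests import Session
from bs4 import BeautifulSoup
import pandas as pd

base_url = 'https://connectedinvestors.com/member/jonathan-kessous/friends/{}'
rows = []
with Session() as s:
    # log in first, as in the login snippet above, then walk the pages
    for page in range(1, 301):  # assumption: pages numbered 1..300
        soup = BeautifulSoup(s.get(base_url.format(page)).content, 'html.parser')
        for tag in soup.find_all('div', class_="investorcard clearfix"):
            h4 = tag.find('h4')
            rows.append((h4.a.text.strip(), tag.attrs['id']))

df = pd.DataFrame(rows, columns=['Names', 'id'])
df.to_excel("all_friends.xlsx", index=False)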

Thanks buddy, appreciate it a lot.

