Python Forum
Scraping a page with login data (security, proxies)
Hey guys,

I have a few questions regarding the safety issues whilst scraping the web.

I'm working for an agency that sells tickets for a big flight company, and I want to scrape client data from my agency's page on the flight company's website.
To scrape the data, I have to do the following:

1. Login
2. Click on all the needed buttons to get to a page with a list of my clients (buttons/links)
3. Click on first person (button/link)
4. Scrape person no. 1
5. Go back to the client list
6. Click on the next person and scrape their data
7. Repeat steps 5-6.

I have already built a program, but I am hesitant to start it because I am scared the travel agency could find out. The script would basically run for two-thirds of the day, five days a week, since I work there and I don't want to write that shit down by hand. And since I am using my own login information, the flight company should know where all the requests are coming from, even if I rotate through proxies and so on. I am not well read in internet security or in how the web is built in general, so I need some help.

Here are the questions I don't really have a good answer to:
1.) What would be the safest way to keep such an infrastructure running as long as possible without being tracked or spotted by the travel company?
2.) Currently I am using Selenium and rotating through proxies and user agents, but I don't know which approach would be smarter. Since I am scraping while logged in, can the flight company (web page) find out that the same login data is being used from different IP addresses, i.e. from different places? If that's the case, I think rotating proxies would be counterproductive. Or at least I would need proxies from my own country, I guess.
3.) If the travel company can't find out the connection between my login data and my IP addresses: do I have to switch proxies and log in again? Or can I stay logged in and switch my proxy?
4.) I am also imitating human behavior. I have a huge list of probabilities for wait times, and the code randomly picks one that a human could plausibly wait at that point, so sometimes 1 second, sometimes 10-20 seconds, for example.
5.) Can the web page detect my mouse movement?
If yes, then I'd have to consider mouse movement in 4.)?!


I can add some code if my explanation isn't clear enough, but my problem is not a lack of coding skill; it's a lack of knowledge about all this security and web stuff.

I would also be thankful if you could point me to some topics I could learn or study to try some new approaches. Currently I am stuck, and I don't even know what to look for.
Any books or documentations regarding that topic would also be very much appreciated.

Thanks