Python Forum
Scraping all website text using Python
#1
I am very new to Python (so apologies in advance for any basic questions). I have an Excel sheet with a unique company identifier and the corresponding URL next to it for a number of companies.

What I would like to do is open each URL and save all the website text (the complete text from the first page of the website) for each company to a separate .txt file. The name of each file should be the unique identifier from the Excel sheet.

Has anyone done something similar in the past, or could you help me with the code for this task?

That would be great!!
#2
I suggest that you go through snippsat's web scraping tutorials here:
web scraping part 1
web scraping part 2
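
To make the task from #1 concrete, here is a minimal sketch of one way it could be done. It is not taken from the tutorials above. It uses only the standard library for fetching and text extraction, and it assumes the sheet has columns named "id" and "url" (hypothetical names, adjust them to your file). Reading the sheet relies on pandas with openpyxl installed, which is also an assumption about your setup. All function names are made up for illustration.

```python
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script/style tags."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML document, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_html(url, timeout=10):
    """Download one page and decode it to a string."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_site_texts(companies, out_dir, fetch=fetch_html):
    """Write <identifier>.txt for every (identifier, url) pair."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for company_id, url in companies:
        try:
            text = html_to_text(fetch(url))
        except Exception as exc:
            # One unreachable site should not stop the whole run.
            print(f"Skipping {company_id} ({url}): {exc}")
            continue
        (out_dir / f"{company_id}.txt").write_text(text, encoding="utf-8")


def load_companies(xlsx_path):
    """Read (identifier, url) pairs; 'id' and 'url' are assumed column names."""
    import pandas as pd  # third-party; reading .xlsx also needs openpyxl

    frame = pd.read_excel(xlsx_path)
    return list(zip(frame["id"], frame["url"]))
```

With a file called companies.xlsx (a placeholder name), the whole run would be save_site_texts(load_companies("companies.xlsx"), "output"). Note that this only sees the HTML the server sends back. Pages that build their content with JavaScript will come out mostly empty, and the identifiers need to be valid file names on your system.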