Python Forum
Web Scrapping Application
#2
I would say that you are using the wrong tools; you can look at my tutorials here: part-1, part-2.
In part-2 I talk a little about concurrency.

lxml is one of the fastest parsers in any language (it has a C library at its core).
It can be used through BeautifulSoup, as in BeautifulSoup(html, 'lxml'), or on its own. Note that the first argument is the page's HTML, not its URL.
Use Requests; then you can remove the decode stuff, because you get a correctly decoded page back.
Regex is a really bad tool for HTML, see a funny answer.
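
To make the points above concrete, here is a minimal sketch of Requests plus BeautifulSoup with the lxml parser. The function names (fetch_html, extract_links) and the sample markup are my own assumptions for illustration, not code from the original question; it assumes requests, beautifulsoup4 and lxml are installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its body as text (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # response.text is already decoded by Requests, so no manual .decode() is needed.
    return response.text


def extract_links(html):
    """Parse HTML with the lxml parser and return (text, href) pairs."""
    soup = BeautifulSoup(html, 'lxml')
    # href=True skips <a> tags that have no href attribute.
    return [(a.get_text(strip=True), a['href'])
            for a in soup.find_all('a', href=True)]


# Hypothetical sample markup, so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <a href="/part-1">Part 1</a>
  <a href="/part-2">Part 2</a>
  <a>no link here</a>
</body></html>
"""

if __name__ == '__main__':
    print(extract_links(SAMPLE_HTML))
```

For a real page you would call extract_links(fetch_html(url)) with your own URL. The parser handles things like missing attributes that would need a special case in every regex.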

Scrapy is fast; it has built-in concurrency with Twisted.


Messages In This Thread
Web Scrapping Application - by lion137 - Feb-10-2017, 09:00 AM
RE: Web Scrapping Application - by snippsat - Feb-10-2017, 09:46 AM
RE: Web Scrapping Application - by lion137 - Feb-10-2017, 09:58 AM
RE: Web Scrapping Application - by snippsat - Feb-10-2017, 10:44 AM
RE: Web Scrapping Application - by lion137 - Feb-10-2017, 11:53 AM
