Python Forum
Search Results
Post Author Forum Replies Views Posted
    Thread: Need help with lxml.html and xpath
Post: RE: Need help with lxml.html and xpath

Thank you. If I know the content is a list, I can extract that pretty easily. My problem is for some of these fields (target areas to extract such as “business-info” or “products/services”) the cont...
spacedog General Coding Help 5 3,135 Apr-30-2021, 04:50 PM
    Thread: Need help with lxml.html and xpath
Post: RE: Need help with lxml.html and xpath

Thank you @Larz60, but those use BeautifulSoup, which is too slow. I have to build something optimized as much as possible. Any other recommendations for the above question?
spacedog General Coding Help 5 3,135 Apr-30-2021, 02:49 PM
    Thread: Need help with lxml.html and xpath
Post: Need help with lxml.html and xpath

I was using Scrapy to create the needed XPath for a lot of elements to scrape. Now that we're using multithreading, I moved off of Scrapy and am just using lxml.html and the text coming off of response.t...
spacedog General Coding Help 5 3,135 Apr-29-2021, 10:58 PM
    Thread: Unable to use random.choice(list) in async method
Post: RE: Unable to use random.choice(list) in async met...

thanks all
spacedog General Coding Help 4 3,354 Apr-29-2021, 04:08 PM
    Thread: Unable to use random.choice(list) in async method
Post: RE: Unable to use random.choice(list) in async met...

Thanks @bowlofred and @menator01. This is a small app and I should have just posted all the code the first time. I added the: await asyncio.sleep(1) and it still did not work. Please see the cod...
spacedog General Coding Help 4 3,354 Apr-29-2021, 04:06 PM
    Thread: Unable to use random.choice(list) in async method
Post: Unable to use random.choice(list) in async method

I need to pull a random proxy from a list in an async method, but the code exits the method as soon as it hits the line of code: proxy = random.choice(proxy_list) Here's the full method and the next lin...
spacedog General Coding Help 4 3,354 Apr-29-2021, 06:17 AM
    Thread: Need help multi-threading scraping
Post: RE: Need help multi-threading scraping

The data is OK and pulls as expected from "requests.get". The problem is that xpath returns nothing. The data is OK here: data = response.text tree = html.fromstring(data) line_numbers = tree.xpath("//h2[...
spacedog General Coding Help 2 2,434 Apr-28-2021, 03:48 PM
    Thread: Need help multi-threading scraping
Post: Need help multi-threading scraping

Using Windows 10 Pro, Python 3.8.8. I have a long list of URLs to loop through to scrape. Each URL has up to 100 pages to navigate through. This is the process: Outer loop - loop through the URLs (ea...
spacedog General Coding Help 2 2,434 Apr-28-2021, 05:09 AM
    Thread: Prevent urllib.request from using my local proxy
Post: Prevent urllib.request from using my local proxy

Using the code below I can still pull a page from the web. Where you see 'xxxx', there should be a valid proxy. proxy_handler = urllib.request.ProxyHandler({'http': 'xxxx'}) opener = urllib.request.build...
spacedog General Coding Help 0 2,803 Apr-24-2021, 08:55 PM
    Thread: urllib.request.ProxyHandler works with bad proxy
Post: urllib.request.ProxyHandler works with bad proxy

I'm using urllib.request to read pages for scraping. Things were connecting and reading OK, but I wanted to add some exception handling in case one of my rotating proxies was bad. So I made a bad prox...
spacedog General Coding Help 0 5,853 Apr-24-2021, 08:02 AM
    Thread: How to measure execution time of a multithread loop
Post: RE: How to measure execution time of a multithread...

Thanks. I will look into that.
spacedog General Coding Help 2 2,837 Apr-24-2021, 07:52 AM
    Thread: Need help with XPath using requests,time,urllib.request and BeautifulSoup
Post: RE: Need help with XPath using requests,time,urlli...

Thanks but that didn't do it: IndexError: list index out of range
spacedog General Coding Help 3 2,800 Apr-24-2021, 01:47 AM
    Thread: Need help with XPath using requests,time,urllib.request and BeautifulSoup
Post: Need help with XPath using requests,time,urllib.re...

I have an xpath expression that I know works. Using the URL: https://www.yellowpages.com/houston-tx/m...1657186981 and XPath: //div[@class='sales-info']/H1[1] Should return this: Spector Ivan My ...
spacedog General Coding Help 3 2,800 Apr-24-2021, 12:32 AM
    Thread: How to measure execution time of a multithread loop
Post: How to measure execution time of a multithread loo...

I have a small app that does work in about 100 loop iterations, each on a new thread. How can I measure the total execution time from the point where the app started to run to the point where the last thread co...
spacedog General Coding Help 2 2,837 Apr-23-2021, 07:41 PM
    Thread: Environment seems to keep losing references
Post: RE: Environment seems to keep losing references

thank you, that answered the question.
spacedog General Coding Help 2 1,869 Apr-23-2021, 07:36 PM
    Thread: Environment seems to keep losing references
Post: Environment seems to keep losing references

I’m new to Python and am using Scrapy, ProxyChecker, pandas, urllib, etc., with VS Code. When I first set things up and started coding this project, all worked OK. But as time goes on I started to ...
spacedog General Coding Help 2 1,869 Apr-20-2021, 06:10 PM
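
For the "How to measure execution time of a multithread loop" question listed above, one common approach is to record a start time, start all the threads, join every one of them, and only then read the clock again. The snippet below is a minimal sketch under assumptions, not the code from that thread: `timed_run` and `fake_job` are hypothetical names, and `fake_job` just sleeps to stand in for the real per-thread work.

```python
import threading
import time


def timed_run(worker, items):
    """Run worker(item) on one thread per item; return total elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=worker, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until this thread has finished
    # Every thread has been joined, so this covers start to last completion.
    return time.perf_counter() - start


def fake_job(n):
    # Hypothetical stand-in for the real scraping work done on each thread.
    time.sleep(0.05)


elapsed = timed_run(fake_job, range(100))
print(f"total: {elapsed:.3f}s")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring intervals; joining every thread before the second reading is what makes the result include the last thread to finish.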
