Python - Scrapy Login
#1
Hello guys, I need your help. I was experimenting with Scrapy earlier, but for some reason my script doesn't work:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders.init import InitSpider

class StrongbotSpider(InitSpider):
    name = 'StrongBot'
    login_url = 'https://www.tatechnix.de/tatechnix/gx/login.php'
    start_urls = ['https://www.tatechnix.de/tatechnix/gx/product_info.php?info=p44235_ta-technix-sport-suspension-kit-opel-astra-h-caravan-2-0t-1-7-1-9cdti--without-level-control-type-a-h-30-30mm.html']

    def init_request(self):
        # InitSpider calls this first: fetch the login page before anything else
        return scrapy.Request(
            url=self.login_url,
            callback=self.login,
        )

    def login(self, response):
        # Fill in and submit the login form; initialized() then releases start_urls
        yield scrapy.FormRequest.from_response(
            response=response,
            formid='login',
            formdata={
                'email_address': 'example',
                'password': 'example',
            },
            callback=self.initialized,
        )

    def parse(self, response):
        for content in response.css('#gm_attr_calc_price'):
            yield {
                # CSS pseudo-elements are case-sensitive here: ::text, not ::Text
                'Price': content.css('span[itemprop="price"]::text').extract()
            }
Here are the results:
(Scrapy) C:\Users\Petros\Python\TaTechnix18>scrapy crawl StrongBot
2018-10-19 09:59:32 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: TaTechnix18)
2018-10-19 09:59:32 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.5.0, Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:23:52) [MSC v.1900 32 bit (Intel)], pyOpenSSL 18.0.0 (OpenSSL 1.0.2o  27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.17134-SP0
2018-10-19 09:59:32 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'TaTechnix18', 'NEWSPIDER_MODULE': 'TaTechnix18.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['TaTechnix18.spiders']}
2018-10-19 09:59:32 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2018-10-19 09:59:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-10-19 09:59:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-10-19 09:59:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-10-19 09:59:33 [scrapy.core.engine] INFO: Spider opened
2018-10-19 09:59:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-10-19 09:59:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-10-19 09:59:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.tatechnix.de/robots.txt> (referer: None)
2018-10-19 09:59:34 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.tatechnix.de/tatechnix/gx/login.php>
2018-10-19 09:59:34 [scrapy.core.engine] INFO: Closing spider (finished)
2018-10-19 09:59:34 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1,
 'downloader/request_bytes': 225,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 7658,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 10, 19, 6, 59, 34, 360982),
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2018, 10, 19, 6, 59, 33, 758907)}
2018-10-19 09:59:34 [scrapy.core.engine] INFO: Spider closed (finished)
P.S. Here's another method, but it doesn't work either:
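# -*- coding: utf-8 -*-
import scrapy

class StrongbotSpider(scrapy.Spider):
    name = 'StrongBot'
    login_url = 'https://www.tatechnix.de/tatechnix/gx/login.php'
    start_urls = ['https://www.tatechnix.de/tatechnix/gx/product_info.php?info=p44235_ta-technix-sport-suspension-kit-opel-astra-h-caravan-2-0t-1-7-1-9cdti--without-level-control-type-a-h-30-30mm.html']

    def start_requests(self):
        # A plain Spider never calls login() on its own, so schedule it explicitly
        yield scrapy.Request(url=self.login_url, callback=self.login)

    def login(self, response):
        data = {
            'email_address': 'example@example.com',
            'password': 'example',
        }
        # POST the credentials to the login URL
        yield scrapy.FormRequest(url=self.login_url, formdata=data, callback=self.parse_products)

    def parse_products(self, response):
        # Session cookies are kept by CookiesMiddleware; now request the product pages
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for content in response.css('#gm_attr_calc_price'):
            yield {
                'Price': content.css('span[itemprop="price"]::text').extract()
            }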
#2
(Oct-19-2018, 07:43 AM) Baggelhsk95 Wrote:
2018-10-19 09:59:34 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.tatechnix.de/tatechnix/gx/login.php>
Looks like the website doesn't want bots to visit the login page.
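To see how a robots.txt rule produces a "Forbidden by robots.txt" message like the one quoted, here is a minimal check with Python's standard `urllib.robotparser`. The robots.txt content is hypothetical (the site's real file wasn't posted), the product URL is shortened, and `StrongBot` is simply the spider name used as a user agent. Scrapy's middleware performs its own check, so treat this only as an illustration of the rule matching:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site's real rules were not posted
ROBOTS_TXT = """\
User-agent: *
Disallow: /tatechnix/gx/login.php
"""

LOGIN_URL = "https://www.tatechnix.de/tatechnix/gx/login.php"
# Shortened product URL, for illustration only
PRODUCT_URL = "https://www.tatechnix.de/tatechnix/gx/product_info.php?info=p44235"


def allowed(url, agent="StrongBot"):
    """Return True if the hypothetical rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(LOGIN_URL))    # False: matches the Disallow rule
print(allowed(PRODUCT_URL))  # True: no rule covers the product page
```

Under rules like these, a crawler that skips the login step never touches a disallowed path.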
If you want, you can tell Scrapy not to respect robots.txt by setting ROBOTSTXT_OBEY to False.
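The log lists 'ROBOTSTXT_OBEY': True among the overridden settings, so it comes from the project's settings.py. A minimal sketch of the change, assuming the default layout that `scrapy startproject` creates for TaTechnix18:

```python
# TaTechnix18/settings.py
# Stop RobotsTxtMiddleware from dropping requests that robots.txt disallows.
# This applies to every spider in the project.
ROBOTSTXT_OBEY = False
```

To limit the change to a single spider, Scrapy also reads a `custom_settings` class attribute, e.g. `custom_settings = {'ROBOTSTXT_OBEY': False}` on `StrongbotSpider`.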
#3
If I run the normal bot without logging in, I get the data just fine...
#4
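If the login page is never fetched, the initialized() callback never gets called, so InitSpider never releases the requests for start_urls and your spider never goes on to process them. Without the login step, nothing requests login.php, which is why the plain spider gets the data.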
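The deferral described in that reply can be illustrated with a toy, Scrapy-free model. It only mimics the idea behind `InitSpider` (stash the regular start requests and release them from `initialized()`). The `ToyInitSpider` class, the `crawl()` loop, the `blocked` set and the example.com URLs are all hypothetical stand-ins, not Scrapy's API:

```python
from collections import deque


class ToyInitSpider:
    """Hypothetical, Scrapy-free model of InitSpider's deferral logic."""

    login_url = "https://example.com/login"        # placeholder URL
    start_urls = ["https://example.com/product"]   # placeholder URL

    def start_requests(self):
        # Stash the regular start requests and emit only the login request
        self._postinit_reqs = [("GET", url, self.parse) for url in self.start_urls]
        return [("GET", self.login_url, self.login)]

    def login(self, response):
        # Submit the login form; its callback is initialized()
        return [("POST", self.login_url, self.initialized)]

    def initialized(self, response=None):
        # Release the stashed start requests
        return self.__dict__.pop("_postinit_reqs")

    def parse(self, response):
        # Stand-in for the price extraction; "response" is just the URL here
        return [{"url": response}]


def crawl(spider, blocked=frozenset()):
    """Process requests in order, dropping any whose URL is in `blocked`."""
    queue = deque(spider.start_requests())
    fetched, items = [], []
    while queue:
        method, url, callback = queue.popleft()
        if url in blocked:
            continue  # dropped, like a request forbidden by robots.txt
        fetched.append((method, url))
        for result in callback(url):
            if isinstance(result, dict):
                items.append(result)
            else:
                queue.append(result)
    return fetched, items


print(crawl(ToyInitSpider()))
print(crawl(ToyInitSpider(), blocked={"https://example.com/login"}))  # ([], [])
```

With the login URL in `blocked` (standing in for the robots.txt filter), nothing after the login step runs, which matches the log ending right after the "Forbidden by robots.txt" line.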