TypeError: list indices must be integers or slices, not str
When I try this:

import scrapy
from extruct.w3cmicrodata import MicrodataExtractor

class AppleSpider(scrapy.Spider):
    name = "apple"
    allowed_domains = ["apple.com"]
    start_urls = (
        'http://www.apple.com/shop/mac/mac-accessories',
        )

    def parse(self, response):

        extractor = MicrodataExtractor()

        items = extractor.extract(response.body_as_unicode(), response.url)["items"]

        for item in items:
            if item.get('properties', {}).get('name'):
                properties = item['properties']
                yield {
                    'name': properties['name'],
                    'price': properties['offers']['properties']['price'],
                    'url': properties['url']
                }
it shows this error:

Error:
items = extractor.extract(response.body_as_unicode(), response.url)["items"]
TypeError: list indices must be integers or slices, not str
How do I fix it?
Please show the entire, unaltered error traceback; it contains valuable information.
Error:
2020-04-04 11:38:16 [scrapy.utils.log] INFO: Scrapy 2.0.1 started (bot: applespider)
2020-04-04 11:38:16 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1f 31 Mar 2020), cryptography 2.9, Platform Windows-10-10.0.18362-SP0
2020-04-04 11:38:16 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-04-04 11:38:16 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'applespider',
 'NEWSPIDER_MODULE': 'applespider.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['applespider.spiders']}
2020-04-04 11:38:16 [scrapy.extensions.telnet] INFO: Telnet Password: 729d3c64b2ac151a
2020-04-04 11:38:16 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-04-04 11:38:16 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-04-04 11:38:16 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-04-04 11:38:16 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-04-04 11:38:16 [scrapy.core.engine] INFO: Spider opened
2020-04-04 11:38:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-04 11:38:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-04-04 11:38:20 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.apple.com/robots.txt> from <GET http://www.apple.com/robots.txt>
2020-04-04 11:38:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.apple.com/robots.txt> (referer: None)
2020-04-04 11:38:21 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.apple.com/shop/mac/mac-accessories> from <GET http://www.apple.com/shop/mac/mac-accessories>
2020-04-04 11:38:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.apple.com/shop/mac/mac-accessories> (referer: None)
2020-04-04 11:38:22 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.apple.com/shop/mac/mac-accessories> (referer: None)
Traceback (most recent call last):
  File "e:\applespider\lib\site-packages\scrapy\utils\defer.py", line 117, in iter_errback
    yield next(it)
  File "e:\applespider\lib\site-packages\scrapy\utils\python.py", line 345, in __next__
    return next(self.data)
  File "e:\applespider\lib\site-packages\scrapy\utils\python.py", line 345, in __next__
    return next(self.data)
  File "e:\applespider\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "e:\applespider\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "e:\applespider\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "e:\applespider\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 338, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "e:\applespider\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "e:\applespider\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "e:\applespider\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "e:\applespider\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "e:\applespider\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "E:\AppleSpider\applespider\applespider\spiders\apple_spider.py", line 15, in parse
    items = extractor.extract(response.body_as_unicode(), response.url)["items"]
TypeError: list indices must be integers or slices, not str
2020-04-04 11:38:22 [scrapy.core.engine] INFO: Closing spider (finished)
2020-04-04 11:38:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1058,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 4,
 'downloader/response_bytes': 21123,
 'downloader/response_count': 4,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/301': 2,
 'elapsed_time_seconds': 6.519425,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 4, 4, 6, 8, 22, 868191),
 'log_count/DEBUG': 4,
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'response_received_count': 2,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'spider_exceptions/TypeError': 1,
 'start_time': datetime.datetime(2020, 4, 4, 6, 8, 16, 348766)}
2020-04-04 11:38:22 [scrapy.core.engine] INFO: Spider closed (finished)
Looking at the source code for MicrodataExtractor confirms what was already obvious from the error message: extractor.extract(response.body_as_unicode(), response.url) returns a list, not a dict. Indexing a list with the string "items" is what raises the TypeError.
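You can see why the indexing fails with any list (the sample data here is made up purely for illustration):

data = [{'properties': {'name': 'MacBook'}}]  # extract() gives you a list like this
data['items']  # TypeError: list indices must be integers or slices, not str
data[0]        # fine - lists are indexed with integers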
I would add a print to see exactly what is returned on your page.

e.g.
items = extractor.extract(response.body_as_unicode(), response.url)
print(items)
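Once you see the shape of the returned data, the fix follows. Assuming your installed extruct version returns the list of items directly (as recent versions do; check your print output to confirm), the parse method would become something like this sketch:

    def parse(self, response):
        extractor = MicrodataExtractor()

        # no ["items"] here - extract() already returns the list of item dicts
        items = extractor.extract(response.body_as_unicode(), response.url)

        for item in items:
            properties = item.get('properties', {})
            if properties.get('name'):
                yield {
                    'name': properties['name'],
                    # .get() guards against items without offers or a price
                    'price': properties.get('offers', {}).get('properties', {}).get('price'),
                    'url': properties.get('url'),
                }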
Thank you, Buran! Now it's working.