Extract json-ld schema markup data and store in MongoDB

Nuwan16 · Apr-05-2020, 04:06 PM

I'am creating a spider to crawl webpage' json-ld schema markup and store data in mongodb. actually I want to scrape json-ld schema markup and extract the data type("@type" : "_____") from schema markup and store this @type in mongodb. My spiders crawl well whole schema markup code. But I want to know that How to extract @type from that json-ld schema markup and store it in mongodb.
This is my spider files

apple_spider.py

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

import scrapy
from pprint import pprint
from extruct.jsonld import JsonLdExtractor
from ..items import ApplespiderItem
 
class AppleSpider(scrapy.Spider):
    name = 'apple'
    allowed_domains = ['apple.com']
    start_urls = (
        'http://www.apple.com/shop/mac/mac-accessories',
        )
 
    def parse(self, response):
 
        extractor = JsonLdExtractor()
 
        items = extractor.extract(response.body_as_unicode(), response.url)
        pprint(items)
 
        for item in items:
            if item.get('properties', {}).get('name'):
                properties = item['properties']
 
                 
                yield {
                    'name': properties['name'],
                    'price': properties['offers']['properties']['price'],
                    'url': properties['url']
                }

items.py

        
              import scrapy
 
class ApplespiderItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    price = scrapy.Field()
    url = scrapy.Field()

pipelines.py

        
              import pymongo
 
class ApplespiderPipeline(object):
 
    def __init__(self):
        self.conn = pymongo.MongoClient(
            'localhost',
            27017
        )
        db = self.conn['newdb']
        self.collection = db['app_tb']
 
    def process_item(self, item, spider):
        self.collection.insert(dict(item))
        return item

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Flask_table module compatibility issue: cannot import name 'Markup' from 'flask'	venkateshbalagiri	3	3,012	Sep-13-2024, 06:14 AM Last Post: buran
	Save JSON data to sqlite database on Django	Quin	0	3,968	Mar-26-2022, 06:22 PM Last Post: Quin
	How can users store data temporarily in flask app?	darktitan	6	4,910	Mar-21-2022, 06:38 PM Last Post: darktitan
	Extract data from sports betting sites	nestor	3	7,784	Mar-30-2021, 04:37 PM Last Post: Larz60+
	Retrieve images base64 encoded MongoDB and Flask	Nuwan16	2	4,531	Oct-13-2020, 06:25 PM Last Post: Nuwan16
	How and where to store a data for path tree?	zayacEBN	1	2,858	Aug-21-2020, 10:14 PM Last Post: Larz60+
	Store Screenshot Selenium + MongoDB	Nuwan16	9	5,479	Aug-18-2020, 03:57 AM Last Post: ndc85430
	Extract data from a table	Bob_M	3	3,531	Aug-14-2020, 03:36 PM Last Post: Bob_M
	filtering by category flask+mongodb	Leon79	3	13,832	Jul-19-2020, 04:25 AM Last Post: ndc85430
	error when trying to update mongodb	damian0612	6	4,900	Jul-04-2020, 07:25 PM Last Post: damian0612

Extract json-ld schema markup data and store in MongoDB

User Panel Messages

Announcements