Python Forum
Thread Rating:
  • 1 Vote(s) - 5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Scraping Data issues
#1
I'm trying to scrape data from a game's market and use the data to track it.

So I have the scraper log in, it goes and pulls some items on the market and stores them in a table.

Now I'm trying to have it use those stored items in the table, to compare it to similar items being sold on the market. I have it so after it stores each item in the table, it goes and checks for the item id on each item, then looks them up.

When it goes to look them up again, it is making me log into the game again. Is this because it isn't passing cookies? Also, am I doing all this correctly? I want it to be optimized as best I can.

Here is my code:
from bs4 import BeautifulSoup
from lxml import html
import requests
import re
from collections import defaultdict

# Start the session
session = requests.Session()

market_items = defaultdict(dict)
compare_prices = defaultdict(dict)
# Create the payload
username = "EMAIL"
password = "PASSWORD"
authenticity_token = 0

LOGIN_URL = "https://web.simple-mmo.com/login"
URL = "https://web.simple-mmo.com/market/collectables/all"
PriceURL = "https://web.simple-mmo.com/market/all/all"




def findAveragePrice():
	session_requests = requests.session()
	key_items = market_items.items()
	t = len(key_items)
	for key in market_items:
		lookID = market_items[key]["ID"]
		print(lookID)
		payload = {
			"itemid": lookID,
			"new_page": "true",
			"_token": authenticity_token
		}
		result = session_requests.get(PriceURL, data = payload, headers = dict(referer = PriceURL))
		soup = BeautifulSoup(result.content, 'html.parser')
		print(result.text)
		#print(market_items[key]["PRICE"])
		
		pricematch = soup.find_all('div', class_='individual-item')
		print(pricematch)
		for match in pricematch:
			x = match.find('a')['onclick']
			title = x.split("retrieveMarketItem(")[1].strip().split(')')[0]
			ITEMID = title.split(",")[0].lstrip()
			RdmNum = title.split(",'")[1].lstrip().split("'")[0]
			price = title.split(" '")[1].lstrip().split("',")[0]
			player = title.split(" '")[2].lstrip().split("'")[0]
			time = title.split(" '")[3].lstrip().split("'")[0]
			
			compare_prices[key]["PRICE"]
			
			print(compare_prices)
			
			
		
		
		
		
def main():
	session_requests = requests.session()
	
	result = session_requests.get(LOGIN_URL)
	tree = html.fromstring(result.text)
	authenticity_token = list(set(tree.xpath("//input[@name='_token']/@value")))[0]
	
	payload = {
		"email": username,
		"password": password,
		"_token": authenticity_token
	}
	
	

	result = session_requests.post(LOGIN_URL, data = payload, headers = dict(referer = LOGIN_URL))

	result = session_requests.get(URL, headers = dict(referer = URL))
	tree = html.fromstring(result.content)
	bucket_names = tree.xpath("//div[@class='individual-item']/span/text()")
	
	soup = BeautifulSoup(result.content, 'html.parser')
	#print(soup)
	
	collectables = soup.find_all('div', class_='individual-item')
	

	for collectable in collectables:
		x = collectable.find('a')['onclick']
		#print(x)
		
		title = x.split("retrieveMarketItem(")[1].strip().split(')')[0]
		ITEMID = title.split(",")[0].lstrip()
		RdmNum = title.split(",'")[1].lstrip().split("'")[0]
		price = title.split(" '")[1].lstrip().split("',")[0]
		player = title.split(" '")[2].lstrip().split("'")[0]
		time = title.split(" '")[3].lstrip().split("'")[0]
		
		#print(title)
		#print("ID: " + ITEMID)
		#print("UniqueID: " + RdmNum)
		#print("PRICE: " + price)
		#print("PLAYER: " + player)
		#print("TIME: " + time)
		
		#market_items["UniqueID"] = RdmNum
		market_items[RdmNum]["ID"] = ITEMID
		market_items[RdmNum]["PRICE"] = price
		market_items[RdmNum]["SELLER"] = player
		market_items[RdmNum]["TIME"] = time
		
		#print(market_items)
		
	findAveragePrice()	
		

if __name__ == '__main__':
    main()	
	
So right now, all I'm getting returned is the login page. I think its because cookies aren't being passed through? But Im not sure how to fix that.
Reply
#2
If problem is only in session then read docs https://requests.readthedocs.io/en/maste.../advanced/
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to scraping data from dinamic site sergio21124444 2 646 Nov-08-2023, 12:43 PM
Last Post: sergio21124444
  Scraping data from table into existing dataframe vincer58 1 1,960 Jan-09-2022, 05:15 PM
Last Post: vincer58
  Web scraping data Mike_Eddy 2 2,489 Jul-03-2021, 05:49 PM
Last Post: Mike_Eddy
  Scraping lender data from Ren Ren Dai website using Python. I will pay for that 200$ Hafedh_2021 1 2,724 May-18-2021, 08:41 PM
Last Post: snippsat
  Scraping a page with log in data (security, proxies) iamaghost 0 2,103 Mar-27-2021, 02:56 PM
Last Post: iamaghost
  Scraping Data from Singapore Turf Club singaporeman 2 2,359 Dec-15-2020, 01:28 PM
Last Post: MrBitPythoner
Thumbs Up Issue facing while scraping the data from different websites in single script. Balamani 1 2,076 Oct-20-2020, 09:56 AM
Last Post: Larz60+
  POST request with form data issue web scraping hoff1022 1 2,649 Aug-14-2020, 10:25 AM
Last Post: kashcode
  Scraping a dynamic data-table in python through AJAX request filozofo 1 3,823 Aug-14-2020, 10:13 AM
Last Post: kashcode
  Web Scraping and data collection. LordDeanUK 6 4,533 Jun-29-2020, 06:04 PM
Last Post: HarleyQuin

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020