scrape books - Python Forum (https://python-forum.io), Python Coding > General Coding Help
scrape books - moristrudeau4 - Jun-14-2023

Hello all! Can anyone help me? I want to scrape all French-language books and export them to Excel or CSV. What am I doing wrong?

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd

    def get_data(url):
        response = requests.get(url)
        soup = BeautifulSoup(response.text, "lxml")
        books = soup.find_all("div", class_="widgetYordamListe animated fadeInUp")
        data = []
        for book in books:
            item = {}
            item["Title"] = book.find("span", class_="badge badge-light").text[1:]
            item["Code"] = book.find("span", class_="context").text[1:]
            data.append(item)
        return data

    def export_data(data):
        df = pd.DataFrame(data)
        df.to_excel("fr.xlsx")
        df.to_csv("fr.csv")

    if __name__ == "__main__":
        data = get_data("http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr")
        export_data(data)
        print("Done.")

RE: scrape books - snippsat - Jun-14-2023

The content is generated bye JavaScript, look at this thread.

RE: scrape books - moristrudeau4 - Jun-15-2023

Thanks for the advice.
I made some fixes:

    import requests
    import pandas as pd
    import json

    cookies = {
        'PHPSESSID': 'ec99df32ab9a03ee59aed53dda98ac69',
    }

    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.9,el;q=0.8,en-GB;q=0.7',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        # 'Cookie': 'PHPSESSID=ec99df32ab9a03ee59aed53dda98ac69',
        'Origin': 'http://85.105.31.188',
        'Referer': 'http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr',
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 7 Build/MOB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
    }

    data = {
        'dIstekTuru': 'sAramaListe',
        'aDil': 'fr',  # language code: 'fr' = French, matching the Referer URL and fr.csv
        'dKLS': '20',
        'dKBS': '40',
    }

    r = requests.post('http://85.105.31.188/yordambt/php/sorgu.php', cookies=cookies, headers=headers, data=data, verify=False)

    # Parse the JSON response text into a Python dictionary
    json_str = json.loads(r.text)

    # The rows are under ["response"]["data"]; use pandas to construct the data frame
    df = pd.DataFrame(json_str["response"]["data"])

    # Save to a CSV file
    df.to_csv("fr.csv", header=False)
    print("Done.")

Can anyone help me write every page to one CSV? By changing 'dKLS': '20', 'dKBS': '40' to 40, 60, 80, etc., I get a JSON response with 20 book entries each time. How can I combine them? Thanks in advance!

PS: Writing "by" as "bye", e.g. "bye JavaScript", is wrong; "bye" means "goodbye".

(Jun-14-2023, 09:48 PM) snippsat wrote: The content is generated bye JavaScript, look at this thread.
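One way to combine the pages is to put the POST request in a loop, collect the rows from every response in one list, and build a single DataFrame at the end. The endpoint, the form fields and the `["response"]["data"]` path come from the post above; everything else is an assumption not confirmed in the thread: that `dKLS` is the page size and `dKBS` the start offset, that offset 0 returns the first page, that a page with fewer than 20 rows (or none) is the last one, and that loading the list page once gives the session a usable `PHPSESSID` cookie. The names `form_for_page`, `collect_rows`, `fetch_page` and `main` are hypothetical.

```python
from typing import Callable, List

BASE = "http://85.105.31.188/yordambt"
PAGE_SIZE = 20  # the poster observed 20 books per response


def form_for_page(offset: int, lang: str = "fr") -> dict:
    # ASSUMPTION: dKLS = records per page, dKBS = start offset.
    # If they are really start/end offsets, only this function has to change.
    return {
        "dIstekTuru": "sAramaListe",
        "aDil": lang,
        "dKLS": str(PAGE_SIZE),
        "dKBS": str(offset),
    }


def collect_rows(fetch_page: Callable[[int], List], page_size: int = PAGE_SIZE,
                 max_pages: int = 10_000) -> List:
    """Call fetch_page(offset) for offset = 0, page_size, 2 * page_size, ...
    and concatenate the returned rows into one list."""
    rows: List = []
    for page in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        batch = fetch_page(page * page_size)
        rows.extend(batch)
        if len(batch) < page_size:  # empty or short page: assumed to be the last one
            break
    return rows


def main(out_path: str = "fr.csv") -> None:
    # Imported here so collect_rows() can be reused without network libraries.
    import pandas as pd
    import requests

    session = requests.Session()
    session.headers.update({
        "X-Requested-With": "XMLHttpRequest",
        "Referer": f"{BASE}/yordam.php?dIstekTuru=sAramaListe&aDil=fr",
    })
    # ASSUMPTION: loading the list page once gives the session its own PHPSESSID cookie.
    session.get(f"{BASE}/yordam.php",
                params={"dIstekTuru": "sAramaListe", "aDil": "fr"}, timeout=30)

    def fetch_page(offset: int) -> List:
        r = session.post(f"{BASE}/php/sorgu.php", data=form_for_page(offset), timeout=30)
        r.raise_for_status()
        return r.json()["response"]["data"]  # same JSON path as in the post

    rows = collect_rows(fetch_page)
    pd.DataFrame(rows).to_csv(out_path, index=False)  # one CSV for all pages
    print(f"Saved {len(rows)} rows to {out_path}")
```

Calling `main()` would write every page to a single `fr.csv`. Keeping the paging loop separate from the HTTP call means `collect_rows` can be checked with a fake `fetch_page` before touching the server, and if `dKLS`/`dKBS` turn out to be start/end offsets, only `form_for_page` needs adjusting (for example `"dKLS": str(offset), "dKBS": str(offset + PAGE_SIZE)`).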