scrape books - Printable Version

scrape books - moristrudeau4 - Jun-14-2023

Hello all! Can anyone help me?
I want to scrape all the French-language books and export them to Excel or CSV. What am I doing wrong?

from bs4 import BeautifulSoup
import requests
import pandas as pd


def get_data(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    books = soup.find_all("div", class_="widgetYordamListe animated fadeInUp")
    data = []
    for book in books:
        item = {}
        item["Title"] = book.find("span", class_="badge badge-light").text[1:]
        item["Code"] = book.find("span", class_="context").text[1:]
        data.append(item)
    return data


def export_data(data):
    df = pd.DataFrame(data)
    df.to_excel("fr.xlsx")
    df.to_csv("fr.csv")


if __name__ == "__main__":
    data = get_data("http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr")
    export_data(data)
    print("Done.")

RE: scrape books - snippsat - Jun-14-2023

The content is generated bye JavaScript, look at this Thread.
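
One rough way to check this diagnosis is to count how often the target element appears in the raw HTML that the server returns, before any JavaScript runs. The sketch below uses only the standard library; the helper name count_static_matches is made up for illustration and is not part of any library.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags with a given tag name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_static_matches(html, tag, css_class):
    """Return how many matching elements exist in the raw HTML text."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    return parser.count
```

If count_static_matches(requests.get(url).text, "div", "widgetYordamListe") returns 0 while the browser shows results, the elements are most likely added by JavaScript after the page loads, and the data has to be fetched from the request the page itself makes.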

RE: scrape books - moristrudeau4 - Jun-15-2023

Thanks for the advice.
I made some changes:

import requests
import pandas as pd
import json

cookies = {
    'PHPSESSID': 'ec99df32ab9a03ee59aed53dda98ac69',
}

headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.9,el;q=0.8,en-GB;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    # 'Cookie': 'PHPSESSID=ec99df32ab9a03ee59aed53dda98ac69',
    'Origin': 'http://85.105.31.188',
    'Referer': 'http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 7 Build/MOB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
data = {
    'dIstekTuru': 'sAramaListe',
    'aDil': 'grc',
    'dKLS': '20',
    'dKBS': '40',
}

r = requests.post('http://85.105.31.188/yordambt/php/sorgu.php', cookies=cookies, headers=headers, data=data)

# Parse the JSON response body into a Python dictionary
payload = json.loads(r.text)

# The desired records are under response -> data; use pandas to construct the data frame
df = pd.DataFrame(payload["response"]["data"])

# Save to a CSV file (without the column header row)
df.to_csv("fr.csv", header=False)
print("Done.")

Can anyone help me write every page to a single CSV file?
By changing 'dKLS': '20', 'dKBS': '40' to 40, 60, 80, etc., I get a JSON response with 20 book entries each time. How can I combine them?

Thanks in advance!

PS. Writing "by" as "bye" (e.g. "bye JavaScript") is wrong. "bye" means "goodbye".
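
One possible way to combine the pages is to request them in a loop and collect the records in a single list before building the data frame. The sketch below is untested against the live server and rests on several assumptions: that dKBS is the index of the first record and starts at 0, that dKLS is the number of records per page, and that the last page is short or empty. The function names fetch_page and collect_pages are made up for illustration, and the cookies and extra headers from the post above may still be needed.

```python
SEARCH_URL = "http://85.105.31.188/yordambt/php/sorgu.php"  # endpoint from the post above


def fetch_page(offset, page_size=20, language="fr"):
    """Request one page of results and return its list of records."""
    import requests  # imported here so collect_pages can be tried offline

    form = {
        "dIstekTuru": "sAramaListe",
        "aDil": language,
        "dKLS": str(page_size),  # ASSUMPTION: number of records per page
        "dKBS": str(offset),     # ASSUMPTION: index of the first record
    }
    headers = {"X-Requested-With": "XMLHttpRequest"}
    r = requests.post(SEARCH_URL, data=form, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json()["response"]["data"]


def collect_pages(fetch, page_size=20, max_pages=500):
    """Call fetch(offset, page_size) page by page and merge all records."""
    rows = []
    for page in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        batch = fetch(page * page_size, page_size)
        if not batch:
            break  # an empty page means there is nothing left
        rows.extend(batch)
        if len(batch) < page_size:
            break  # a short page is taken to be the last one
    return rows
```

With these helpers, pd.DataFrame(collect_pages(fetch_page)).to_csv("fr.csv", index=False) would write all pages to one file. Collecting plain rows first and building the data frame once avoids concatenating many small frames.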
