Python Forum

Full Version: scrape books
Hello all! Can anyone help me?
I want to scrape all the French-language books and export them to Excel or CSV. What am I doing wrong?
from bs4 import BeautifulSoup 
import requests
import pandas as pd
def get_data(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    books = soup.find_all("div", class_="widgetYordamListe animated fadeInUp")
    data = []
    for book in books:
        item = {}                
        item["Title"] = book.find("span",
                                  class_="badge badge-light").text[1:]
        item["Code"] = book.find("span",
                                  class_="context").text[1:]
        data.append(item) 
    return data
def export_data(data):
    df = pd.DataFrame(data)
    df.to_excel("fr.xlsx")
    df.to_csv("fr.csv")

if __name__ == "__main__":
    data = get_data("http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr")
    export_data(data)
    print("Done.")
The content is generated bye JavaScript, look at this Thread.
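A quick way to check this, as a minimal sketch: count how many elements in the raw HTML match the selector. If the count is 0 for the HTML that requests returns while the browser shows results, the list is most likely filled in afterwards by JavaScript. The helper name count_static_matches is hypothetical, and the selector only assumes the widgetYordamListe class used in the first post.

```python
from bs4 import BeautifulSoup


def count_static_matches(html, css_selector="div.widgetYordamListe"):
    """Return how many elements in the raw HTML match css_selector."""
    # html.parser ships with Python, so no extra parser is required.
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(css_selector))
```

Calling it as count_static_matches(requests.get(url).text) on the search URL shows whether the book entries are present in the static page at all.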
Thanks for the advice.
I made some fixes:

import requests
import pandas as pd
import json

cookies = {
    'PHPSESSID': 'ec99df32ab9a03ee59aed53dda98ac69',
}

headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.9,el;q=0.8,en-GB;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    # 'Cookie': 'PHPSESSID=ec99df32ab9a03ee59aed53dda98ac69',
    'Origin': 'http://85.105.31.188',
    'Referer': 'http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 7 Build/MOB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
data = {
    'dIstekTuru': 'sAramaListe',
    'aDil': 'grc',
    'dKLS': '20',
    'dKBS': '40',
}

r = requests.post('http://85.105.31.188/yordambt/php/sorgu.php', cookies=cookies, headers=headers, data=data, verify=False)

# Parse the JSON response body into a Python dictionary
payload = json.loads(r.text)

# The records are under response -> data; use pandas to build the data frame
df = pd.DataFrame(payload["response"]["data"])

# Save to a csv file
df.to_csv("fr.csv", header=False)
print("Done.")

Can anyone help me write every page to one CSV file?
By changing
'dKLS': '20',
'dKBS': '40',
to 40, 60, 80, etc., I get a JSON response with 20 book entries each time. How can I combine them?

Thanks in advance!

PS. Writing "by" as "bye" (e.g. "bye JavaScript") is wrong; "bye" means "goodbye".


(Jun-14-2023, 09:48 PM) snippsat wrote: The content is generated bye JavaScript, look at this Thread.
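
One possible way to combine the pages, as a minimal sketch rather than a tested solution for this site: request one window after another and concatenate the resulting data frames. The names fetch_page and collect_pages are hypothetical. The sketch assumes that dKLS and dKBS are the start and end offsets of a window, that the first window starts at 0, and that the server returns an empty data list once the results run out; none of this is confirmed in the thread, so max_pages acts as a safety limit.

```python
import pandas as pd
import requests

URL = "http://85.105.31.188/yordambt/php/sorgu.php"  # endpoint from the post above


def fetch_page(start, page_size=20, lang="fr", **request_kwargs):
    """POST one window of results and return the content of response -> data.

    Assumption: dKLS/dKBS are the start and end offsets of the window.
    Extra keyword arguments (headers=..., cookies=...) are passed to requests.post.
    """
    form = {
        "dIstekTuru": "sAramaListe",
        "aDil": lang,
        "dKLS": str(start),
        "dKBS": str(start + page_size),
    }
    request_kwargs.setdefault("timeout", 30)
    r = requests.post(URL, data=form, **request_kwargs)
    r.raise_for_status()
    return r.json()["response"]["data"]


def collect_pages(fetch, page_size=20, max_pages=500):
    """Call fetch(start) for start = 0, page_size, 2 * page_size, ...

    Stops at the first empty page (assumed to mean "no more results")
    or after max_pages requests, then returns one combined DataFrame.
    """
    frames = []
    for page in range(max_pages):
        batch = fetch(page * page_size)
        if not batch:
            break
        frames.append(pd.DataFrame(batch))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the headers and cookies dictionaries from the post, this could be used as df = collect_pages(lambda start: fetch_page(start, headers=headers, cookies=cookies)), followed by df.to_csv("fr.csv", index=False). Passing the fetch function in as an argument keeps the paging loop separate from the HTTP details, so it can be tried against fake data first.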