Python Forum
scrape books
#1
Hello all! Can anyone help me?
I want to scrape all French-language books and export them to Excel or CSV. What am I doing wrong?
from bs4 import BeautifulSoup
import requests
import pandas as pd


def get_data(url):
    response = requests.get(url, timeout=30)
    soup = BeautifulSoup(response.text, "lxml")
    # One div per book entry in the list
    books = soup.find_all("div", class_="widgetYordamListe animated fadeInUp")
    data = []
    for book in books:
        item = {}
        item["Title"] = book.find("span", class_="badge badge-light").text[1:]
        item["Code"] = book.find("span", class_="context").text[1:]
        data.append(item)
    return data


def export_data(data):
    df = pd.DataFrame(data)
    df.to_excel("fr.xlsx")  # .xlsx output needs an Excel writer such as openpyxl
    df.to_csv("fr.csv")


if __name__ == "__main__":
    data = get_data("http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr")
    export_data(data)
    print("Done.")
snippsat wrote Jun-14-2023, 06:10 PM:
Added code tags to your post; look at BBCode on how to use them.
#2
The content is generated bye JavaScript, look at this thread.
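One quick way to check that the book list is filled in by JavaScript is to count the target elements in the HTML exactly as the server sends it. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup so it has no extra dependencies; `ClassCounter` and `count_in_static_html` are hypothetical helper names, and the default class string is the one from post #1. A count of 0 for a page that shows books in the browser is consistent with JavaScript rendering, though it does not prove it.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Hypothetical helper: count start tags named `tag` whose class
    attribute contains every class listed in `classes` (any order)."""

    def __init__(self, tag: str, classes: str) -> None:
        super().__init__()
        self.tag = tag
        self.wanted = set(classes.split())
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this
        if tag != self.tag:
            return
        for name, value in attrs:
            if name == "class" and value and self.wanted <= set(value.split()):
                self.count += 1
                return


def count_in_static_html(html: str, tag: str = "div",
                         classes: str = "widgetYordamListe animated fadeInUp") -> int:
    """Hypothetical helper: number of matching elements in the raw server HTML."""
    parser = ClassCounter(tag, classes)
    parser.feed(html)
    parser.close()
    return parser.count
```

Feeding it `requests.get(url).text` for the list page and getting 0 while the browser shows books points to the data arriving through a separate request; the browser's network tab shows which one (in this thread, the POST to `sorgu.php`).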
#3
Thanks for the advice.
I made some fixes:

import requests
import pandas as pd

# Session cookie copied from the browser; it expires and must then be refreshed
cookies = {
    'PHPSESSID': 'ec99df32ab9a03ee59aed53dda98ac69',
}

headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.9,el;q=0.8,en-GB;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    # 'Cookie': 'PHPSESSID=ec99df32ab9a03ee59aed53dda98ac69',
    'Origin': 'http://85.105.31.188',
    'Referer': 'http://85.105.31.188/yordambt/yordam.php?dIstekTuru=sAramaListe&aDil=fr',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 7 Build/MOB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

# Form fields of the XHR request the page sends for one page of results
data = {
    'dIstekTuru': 'sAramaListe',
    'aDil': 'fr',  # language code; 'fr' matches the Referer and the fr.csv output
    'dKLS': '20',
    'dKBS': '40',
}

r = requests.post('http://85.105.31.188/yordambt/php/sorgu.php',
                  cookies=cookies, headers=headers, data=data, timeout=30)
r.raise_for_status()

# Parse the JSON response into a Python dictionary
payload = r.json()

# The book entries are under response -> data; build the DataFrame from them
df = pd.DataFrame(payload["response"]["data"])

# Save to a CSV file
df.to_csv("fr.csv", header=False)
print("Done.")

Can anyone help me get every page into one CSV file?
By changing
'dKLS': '20',
'dKBS': '40',
to 40, 60, 80, etc., I get a JSON response with 20 book entries each time. How can I combine them?

Thanks in advance!
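A minimal sketch of combining the pages, assuming `dKLS` and `dKBS` are the start and end of the result window (the post only shows that '20'/'40' returns 20 entries, so this needs checking against the site) and that the first page starts at 0. `build_form_data`, `make_fetcher` and `collect_all_pages` are hypothetical names. The loop requests one window after another, appends each page's rows to a single list, and stops at the first page that comes back shorter than the page size; `max_pages` is a guard in case the server never returns a short page.

```python
from typing import Any, Callable, Dict, List

# Hypothetical helpers. ASSUMPTION: dKLS/dKBS delimit the result window.
Row = Any  # one entry of response["data"]; the post does not show its shape
Fetch = Callable[[int, int], List[Row]]


def build_form_data(start: int, end: int, language: str = "fr") -> Dict[str, str]:
    """Form fields for one page of the search list (field meanings assumed)."""
    return {
        "dIstekTuru": "sAramaListe",
        "aDil": language,
        "dKLS": str(start),
        "dKBS": str(end),
    }


def make_fetcher(session: Any, url: str, headers: Dict[str, str]) -> Fetch:
    """Return fetch(start, end), which POSTs one window and returns response['data'].

    `session` is anything with a requests-style .post(), e.g. requests.Session().
    """
    def fetch(start: int, end: int) -> List[Row]:
        resp = session.post(url, headers=headers,
                            data=build_form_data(start, end), timeout=30)
        resp.raise_for_status()
        return resp.json()["response"]["data"]
    return fetch


def collect_all_pages(fetch: Fetch, page_size: int = 20,
                      first_start: int = 0, max_pages: int = 500) -> List[Row]:
    """Fetch consecutive windows and concatenate their rows into one list."""
    rows: List[Row] = []
    start = first_start
    for _ in range(max_pages):
        page = fetch(start, start + page_size)
        rows.extend(page)
        if len(page) < page_size:  # empty or short page: assume it was the last
            break
        start += page_size
    return rows
```

With the `headers` dict from the post, `rows = collect_all_pages(make_fetcher(requests.Session(), "http://85.105.31.188/yordambt/php/sorgu.php", headers))` followed by `pd.DataFrame(rows).to_csv("fr.csv", header=False)` would write every page to one file. Passing a `requests.Session` rather than a hard-coded `PHPSESSID` lets the session keep whatever cookie the server sets; whether the server first needs a GET of the list page to issue that cookie is untested here.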

P.S. Writing "by" as "bye", e.g. "bye JavaScript", is wrong: "bye" means "goodbye".


(Jun-14-2023, 09:48 PM) snippsat wrote: The content is generated bye JavaScript, look at this thread.
buran wrote Jun-16-2023, 02:37 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.