Jan-06-2020, 08:28 AM
Hello,
I came across OmarEinea's GoodReadsScraper on GitHub (https://github.com/OmarEinea/GoodReadsScraper) and would like to use his scripts to scrape the English reviews of some English books on Goodreads.
Because I am very new to working with Python, I don't have as much insight as I would like into what the different parts of a script mean. Therefore, I would be very grateful if someone with more experience would take a look at the scripts and the steps I took and tell me what I did wrong.
I downloaded the ZIP file containing his scripts, installed the requirements and created a shelf on Goodreads containing the books whose reviews I want to scrape. Because I wanted to scrape English reviews, I changed all instances of "ar" or "arabic" in his scripts to "en" and "english".
The first problem I have is that I do not really understand where I have to add information (and which information) to get what I need. OmarEinea's instructions are very brief and unfortunately not enough for me, as a layperson, to know what I need to change, in which script, and in which exact place.
What I did was:
1) filled in my username and password in Browser.py
2) changed the path to "Users\xxx\Desktop\GoodReadsScraper-master\BookReviews" in Tools.py
3) created Test.py, consisting of the following code (in which "xxx" stands for the id of my shelf containing the books of which I wish to scrape the reviews):
```python
from Books import Books
from Reviews import Reviews
from Tools import *

#Scrape books reviews and write them to a file:
r = Reviews("en")
r.output_books_reviews("xxx")

#Filter Reviews then combine them:
delete_repeated_reviews()
combine_reviews()
```
However, when I run this from my command line, what I get is a txt file named "en1.txt", containing a few reviews of Harry Potter, even though these books are not on my shelf at all, and an empty folder named "en".
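One possible explanation for the Harry Potter reviews, not verified against the GoodReadsScraper source: if output_books_reviews expects a list of book ids and loops over its argument, then passing a single string makes Python loop over the string one character at a time. Each digit would then be treated as its own book id, and as far as I know very low Goodreads book ids such as 1, 2 and 3 belong to Harry Potter titles. The sketch below only demonstrates that Python behaviour. The function collect_ids and the id "12345" are made up for illustration and are not part of the scraper.

```python
def collect_ids(book_ids):
    """Hypothetical stand-in for any function that loops over book ids."""
    # A for-loop over a str yields single characters, not the whole string.
    return [str(book_id) for book_id in book_ids]


shelf_id = "12345"  # made-up id, for illustration only

as_string = collect_ids(shelf_id)    # string passed directly
as_list = collect_ids([shelf_id])    # same string wrapped in a list

print(as_string)  # one entry per character
print(as_list)    # one entry for the whole id
```

If this is what happens here, the argument would need to be a list of book ids rather than one shelf id string, but that depends on what the method in Reviews.py actually expects.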
Any advice on what I did wrong, or on how I can adapt my test script or OmarEinea's scripts to get what I need?
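A separate thing that may be worth checking is the path from step 2. The post does not show how the path was typed into Tools.py, so this is only an assumption: in an ordinary Python string a backslash starts an escape sequence, which can silently change a Windows path. A minimal sketch of the two usual ways around that, a raw string or forward slashes, using the standard pathlib module and the placeholder path from the post:

```python
from pathlib import PureWindowsPath

# In a normal string, "\n" is a newline, so a folder name starting with "n"
# after a backslash would be corrupted.
broken = "Desktop\new"

# A raw string (r"...") keeps every backslash literally.
raw = r"Users\xxx\Desktop\GoodReadsScraper-master\BookReviews"

# Forward slashes are also accepted for Windows paths.
forward = "Users/xxx/Desktop/GoodReadsScraper-master/BookReviews"

path_raw = PureWindowsPath(raw)
path_fwd = PureWindowsPath(forward)

print("\n" in broken)        # the backslash-n became a newline character
print(path_raw == path_fwd)  # both spellings describe the same path
print(path_raw.parts)
```

The empty "en" folder could have other causes, so this is something to rule out rather than a diagnosis.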
I think this could be an interesting learning opportunity for me to gain more insight into scripts and scraping and would be very grateful for your advice or help!
Kind regards and I wish you a happy new year,
Ledgreve