Coding problem scraping Goodreads reviews with GoodReadsScraper

ledgreve · Jan-06-2020, 08:28 AM

Hello,

I came across OmarEinea's GoodReadsScraper on GitHub (https://github.com/OmarEinea/GoodReadsScraper) and would like to use his scripts to scrape the English reviews of some English books on Goodreads.

Because I am very new to Python, I don't have as much insight as I would like into what the different parts of a script mean. I would therefore be very grateful if someone with more experience could look at the scripts and the steps I took and tell me what I did wrong.

I downloaded the ZIP file containing his scripts, installed the requirements, and created a shelf on Goodreads containing the books whose reviews I wanted to scrape. Because I wanted to scrape English reviews, I changed all instances of "ar" or "arabic" in his scripts to "en" and "english".

The first problem is that I do not really understand where I have to add information (and which information) to get what I need. OmarEinea's instructions are very brief and, unfortunately, do not tell me as a layperson what I need to change, in which script, and in which exact place.

What I did was:
1) filled in my username and password in Browser.py
2) changed the path to "Users\xxx\Desktop\GoodReadsScraper-master\BookReviews" in Tools.py
3) created Test.py, consisting of the following code (in which "xxx" stands for the id of my shelf containing the books of which I wish to scrape the reviews):

from Books import Books
from Reviews import Reviews
from Tools import *
 
#Scrape books reviews and write them to a file:
 
r = Reviews("en")
r.output_books_reviews("xxx")
 
#Filter Reviews then combine them:
 
delete_repeated_reviews()
combine_reviews()

However, when I run this via my command line, what I get is a txt-file, named "en1.txt", containing a few reviews from Harry Potter, though these books are not even present in my shelf at all, and an empty folder named "en".
Any advice on what I did wrong, or on how I can adapt my test script or OmarEinea's scripts to get what I need?

I think this could be an interesting learning opportunity for me to gain more insight into scripts and scraping and would be very grateful for your advice or help!

Kind regards and I wish you a happy new year,

Ledgreve

***micseydel*** · Jan-06-2020, 07:40 PM

(Jan-06-2020, 08:28 AM)ledgreve Wrote: However, when I run this via my command line, what I get is a txt-file, named "en1.txt", containing a few reviews from Harry Potter, though these books are not even present in my shelf at all, and an empty folder named "en".

That sounds like a bug in their code. Personally, I would abandon it; I would rather write something from scratch than deal with buggy code.

That said, it's up to you what you want to do. You can dive into the code and try to figure out using a debugger or print statements why it's doing something that makes no sense. We'd be happy to help, but to be blunt I expect it to be a lot of work for a newbie, and you'll be doing most of the work.
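If you do go the print-statement route, one possible starting point is a small tracing decorator that shows which arguments actually reach a function and what it returns. This is only a minimal sketch: `traced` and `build_shelf_url` are hypothetical names, the example.com URL is a placeholder, and none of this is GoodReadsScraper's real code.

```python
import functools


def traced(func):
    """Print each call's arguments and its return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"CALL {func.__name__} args={args!r} kwargs={kwargs!r}")
        result = func(*args, **kwargs)
        print(f"RETURN {func.__name__} -> {result!r}")
        return result
    return wrapper


# Hypothetical stand-in for a scraper helper (an assumption for illustration,
# not a function from GoodReadsScraper).
@traced
def build_shelf_url(shelf_id, page=1):
    return f"https://example.com/shelf/{shelf_id}?page={page}"


url = build_shelf_url("xxx")
```

If the printed arguments never contain your shelf id, the value is probably being dropped or overridden somewhere before that point. On Python 3.7 or newer you can also put the built-in `breakpoint()` on the line you want to inspect and step through from there with pdb.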

***snippsat*** · (This post was last modified: Jan-07-2020, 03:19 AM by snippsat.)

I tested the code in a virtual environment, and it did not work well.
I did get some output when running Sample.py, but it was mostly empty.
I would not try to fix this code; I would write my own instead. If you are new to this, though, that may not be an easy task to start with.
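To give a rough idea of what "writing your own" could involve, here is a minimal sketch of the parsing half only, using just the standard library. The `review-text` class name and the sample HTML are assumptions made up for illustration; the real Goodreads markup will differ, and fetching the pages (and logging in) is not covered here.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text inside every <div class="review-text"> element.

    The class name "review-text" is hypothetical, chosen for this example.
    """

    def __init__(self):
        super().__init__()
        self.reviews = []   # finished review strings
        self._depth = 0     # greater than 0 while inside a review div
        self._chunks = []   # text pieces of the review being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # a div nested inside a review
        elif "review-text" in (dict(attrs).get("class") or "").split():
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.reviews.append(" ".join("".join(self._chunks).split()))
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


# Made-up sample page, standing in for downloaded HTML.
SAMPLE_HTML = """
<div class="review-text">Loved it.  <b>Great</b> pacing.</div>
<div class="sidebar">Not a review</div>
<div class="review-text">Too long for my taste.</div>
"""

parser = ReviewTextParser()
parser.feed(SAMPLE_HTML)
reviews = parser.reviews
```

A third-party parser would make this shorter, but the standard library version keeps the example free of extra installs.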

ledgreve · Jan-07-2020, 09:38 AM

@micseydel
@snippsat

Thank you both for your advice! I will leave the code as it is then, since I do not yet have enough knowledge and experience to debug it. I have found another script online (for R) that seems a lot less complicated and should give me (more or less) what I need. I have already tried running it, and so far it has only given me one error message that seems solvable, so I am hopeful about it.
