Python Forum
Installing and running a python web scraping app from github to a windows 8.1 system
#11
The easy way to see whether you can use pip is to go to https://pypi.org/ and search for the package by name.
If the package is there, you can install it with pip.
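The same check can be scripted. This is only a minimal sketch, assuming PyPI's JSON endpoint at https://pypi.org/pypi/<name>/json, which answers 404 for names it does not know. The helper names pypi_json_url and is_on_pypi are made up for illustration.

```python
import urllib.error
import urllib.request


def pypi_json_url(package_name):
    """Build the PyPI JSON API URL for a package name."""
    return "https://pypi.org/pypi/{}/json".format(package_name.strip())


def is_on_pypi(package_name, timeout=10):
    """Return True if PyPI knows the package, False if it answers 404."""
    try:
        with urllib.request.urlopen(pypi_json_url(package_name), timeout=timeout) as reply:
            return reply.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise  # any other HTTP error is a real problem, not "missing"

# Usage (needs network access):
# print(is_on_pypi("psycopg2"))
```

Note that a package being listed does not guarantee that the exact pinned version you need is available for your platform.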
Reply
#12
There's a requirements.txt file: https://github.com/tomslee/airbnb-data-c...ements.txt

This is a file which can be generated by pip (the python package manager): https://pip.pypa.io/en/stable/reference/pip_freeze/

Installing all the dependencies is normally automatic (through a setup.py file as previously mentioned), but since this author chose not to do that, you can still do it yourself fairly easily, with: pip install -r requirements.txt
Reply
#13
Quote:you can still do it yourself fairly easily, with: pip install -r requirements.txt
Hi, thanks very much for this.
I didn't know this trick.
However, the process stops as soon as it cannot find a suitable version of a package to install. It happens with anaconda-client:

Error:
Could not find a version that satisfies the requirement anaconda-client==1.5.4 (from -r requirements.txt (line 2)) (from versions: 1.1.1, 1.2.2)
No matching distribution found for anaconda-client==1.5.4 (from -r requirements.txt (line 2))
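Because pip gives up on the whole file at the first requirement it cannot satisfy, one workaround is to feed it the lines one at a time and collect the failures. A minimal sketch, assuming a plain requirements.txt with one pinned package per line; read_requirements and install_each are illustrative names, not part of pip.

```python
import subprocess
import sys


def read_requirements(text):
    """Return the requirement lines, skipping blank lines and # comment lines."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def install_each(requirements):
    """Run pip once per requirement and return the ones that failed."""
    failed = []
    for req in requirements:
        # Use the pip that belongs to the running interpreter.
        result = subprocess.run([sys.executable, "-m", "pip", "install", req])
        if result.returncode != 0:
            failed.append(req)
    return failed

# Usage:
# with open("requirements.txt") as handle:
#     leftovers = install_each(read_requirements(handle.read()))
# print("not installable with pip:", leftovers)
```

Whatever ends up in the leftovers list (anaconda-client==1.5.4 in the error above) has to be installed some other way, for example with conda.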


I was installing all the libraries manually. The odd thing is that when I got to nb-anacondacloud in the requirements.txt list, I realized I needed to install it with conda. I installed miniconda3 in order to run the command

conda install -c conda-forge nb_anacondacloud

But I'm behind a proxy and I don't know how to pass proxy settings to conda commands. I read about a .condarc file but I couldn't get over the hump.
Then I tried another internet connection (without a proxy) and finally managed to install the nb-conda requirements.
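On the proxy question: besides the proxy_servers section of .condarc, conda and pip should, as far as I know, also honor the usual HTTP_PROXY and HTTPS_PROXY environment variables. A minimal sketch under that assumption; the proxy address is a placeholder and proxy_environment is an illustrative helper, not a conda API.

```python
import os


def proxy_environment(proxy_url, base_env=None):
    """Copy an environment and add the standard proxy variables to the copy."""
    env = dict(os.environ if base_env is None else base_env)
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        env[name] = proxy_url
    return env

# Usage (hypothetical proxy address):
# import subprocess
# env = proxy_environment("http://proxy.example.com:8080")
# subprocess.run("conda install -c conda-forge nb_anacondacloud", shell=True, env=env)
```

Only the child process sees these variables, so nothing on the system is changed permanently.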
What puzzles me is that (I hope I remember the message correctly) the installation downgraded (what does that mean?) some of the previously installed packages, and in some cases changed the package version.
This whole process is starting to remind me of Penelope's web.
Reply
#14
Hi all,

I finally managed to install all the libraries for this GitHub project, and when I type
airbnb
I get the full list of command options:
usage: airbnb.py [options]

Manage a database of Airbnb listings.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         write verbose (debug) output to the log file
  -c config_file, --config_file config_file
                        explicitly set configuration file, instead of using
                        the default <username>.config
  -asa search_area, --addsearcharea search_area
                        add a search area to the database. A search area is
                        typically a city, but may be a bigger region.
  -asv search_area, --add_survey search_area
                        add a survey entry to the database, for search_area
  -dbp, --dbping        Test the database connection
  -dh host_id, --displayhost host_id
                        display web page for host_id in browser
  -dr room_id, --displayroom room_id
                        display web page for room_id in browser
  -dsv survey_id, --delete_survey survey_id
                        delete a survey from the database, with its listings
  -f [survey_id], --fill [survey_id]
                        fill details for rooms collected with -s
  -lsa search_area, --listsearcharea search_area
                        list information about this search area from the
                        database
  -lr room_id, --listroom room_id
                        list information about room_id from the database
  -ls, --listsurveys    list the surveys in the database
  -psa search_area, --printsearcharea search_area
                        print the name and neighborhoods for search area
                        (city) from the Airbnb web site
  -pr room_id, --printroom room_id
                        print room_id information from the Airbnb web site
  -ps survey_id, --printsearch survey_id
                        print first page of search information for survey from
                        the Airbnb web site
  -psn survey_id, --printsearch_by_neighborhood survey_id
                        print first page of search information for survey from
                        the Airbnb web site, by neighborhood
  -psz survey_id, --printsearch_by_zipcode survey_id
                        print first page of search information for survey from
                        the Airbnb web site, by zipcode
  -psb survey_id, --printsearch_by_bounding_box survey_id
                        print first page of search information for survey from
                        the Airbnb web site, by bounding_box
  -s survey_id, --search survey_id
                        search for rooms using survey survey_id
  -sn survey_id, --search_by_neighborhood survey_id
                        search for rooms using survey survey_id
  -sb survey_id, --search_by_bounding_box survey_id
                        search for rooms using survey survey_id, by bounding
                        box
  -asb search_area, --add_and_search_by_bounding_box search_area
                        add a survey for search_area and search , by bounding
                        box
  -sz survey_id, --search_by_zipcode survey_id
                        search for rooms using survey_id, by zipcode
  -V, --version         show program's version number and exit
  -?


...but I still have to configure the PostgreSQL database. All I have done so far is install PostgreSQL version 10 on my system.
Tom Slee in his help file says:
Quote:Installing and upgrading the database schema
The airbnb.py script works with a PostgreSQL database. You need to have the PostGIS extension installed. The schema is in the file postgresql/schema_current.sql. You need to run that file to create the database tables to start with (assuming both your user and database are named airbnb).

For example, if you use psql:

psql --user airbnb airbnb < postgresql/schema_current.sql

Preparing to run a survey
To check that you can connect to the database, run

python airbnb.py -dbp

I'm not able to find the schema_current.sql file and of course when I check the database connection with:

python airbnb.py -dbp

I get this:
Error:
ERROR Connection test failed
Traceback (most recent call last):
  File "C:\python_37\airbnb\airbnb_config.py", line 186, in connect
    self.connection = psycopg2.connect(**cattr)
  File "C:\python_37\lib\site-packages\psycopg2\__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: fe_sendauth: no password supplied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "airbnb.py", line 142, in db_ping
    conn = config.connect()
  File "C:\python_37\airbnb\airbnb_config.py", line 190, in connect
    logger.error(pgoe.message)
AttributeError: 'OperationalError' object has no attribute 'message'
Could someone help me configure the PostgreSQL database?
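The traceback actually reports two separate problems. First, the server asked for a password and none was supplied ("fe_sendauth: no password supplied"). Second, the error handler itself crashed, because exceptions in Python 3 have no .message attribute; str(exc) gives the text instead. A minimal sketch of both points, assuming psycopg2 is installed; the connection values are placeholders, and error_text and ping_database are illustrative names rather than functions from airbnb_config.py.

```python
def error_text(exc):
    """Return the message of an exception (Python 3 has no exc.message)."""
    return str(exc)


def ping_database(**conn_params):
    """Try to connect; print the reason and return False if the server refuses."""
    import psycopg2  # third-party driver, imported here so error_text works without it

    try:
        conn = psycopg2.connect(**conn_params)
    except psycopg2.OperationalError as exc:
        print("Connection test failed:", error_text(exc))
        return False
    conn.close()
    return True

# Usage (placeholder values):
# ping_database(host="localhost", port=5432, dbname="airbnb",
#               user="airbnb", password="your-password")
```

If this connects with a password but the script still fails, the password is probably missing from the configuration file the script reads.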

p.s. here is my airbnb config file if someone wants to check it:
https://drive.google.com/open?id=1_psMrQ...r8_kAvobyh

Quote:Could someone help me in configuring psql database?

In particular, there is a postgresql folder in the GitHub repository.
Do I have to copy these files into my PostgreSQL installation? And if so, where?

And where do I have to run this command?
psql --user airbnb airbnb < postgresql/schema_current.sql
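As far as I can tell, nothing needs to be copied into the PostgreSQL installation: psql is a client program that reads the .sql file from wherever it sits, so the command is run from a command prompt in the downloaded repository folder. The same call can be made from Python without shell redirection. A minimal sketch, assuming psql is on the PATH and that postgresql/schema_current.sql exists in your copy of the repository; schema_command is an illustrative helper.

```python
import os


def schema_command(repo_dir, user="airbnb", dbname="airbnb"):
    """Build the psql argument list that loads postgresql/schema_current.sql."""
    schema = os.path.join(repo_dir, "postgresql", "schema_current.sql")
    return ["psql", "--username", user, "--dbname", dbname, "--file", schema]

# Usage (the path is an assumption taken from the traceback above):
# import subprocess
# subprocess.run(schema_command(r"C:\python_37\airbnb"), check=True)
```

psql will prompt for the password of the airbnb database user unless it finds one elsewhere.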
Reply
#15
airbnb is not an executable program, it is a 'package' that you 'import' into your code
and write code around to 'call' methods from the package library (or in this case an api).
As I pointed out to you already, go to the github page: https://github.com/nderkach/airbnb-python
and scroll down to the section named 'Usage'
This will show you how to use the package!
Reply
#16
Hi to all,

I managed to run the Python web scraping package, even the PostgreSQL part, but when I run it I get this message:
Error:
WARNING HTTP status 400 from web site: IP address blocked. Waiting 1.0 minutes.
Even after further attempts, and from behind a VPN, I get the same message.
Is there something I can do, perhaps by editing the following file and its user agent list?

https://drive.google.com/open?id=1jHmu8k...mBMqxfYOfv
Please don't be shy about helping a little, Larz60+.

William bowed. “You are wise also when you are severe. It shall be as you wish.”
“If ever I were wise, it would be because I know how to be severe,” the abbot answered.
...from Umberto Eco The Name of the Rose.
Reply
#17
Some sites require you to provide a header (user-agent) in your request, and may refuse your connection otherwise.
You can find your user agent by googling 'my user agent'; it will appear at the top of the page.
Then set it up in a dictionary like:
user_agent = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) '
    'Gecko/20100101 Firefox/60.0 AppleWebKit/537.36 (KHTML, '
    'like Gecko)  Chrome/68.0.3440.75 Safari/537.36'}
I don't know what you are using to get the page, but with requests it would be:
import requests
response = requests.get(url, headers=user_agent)
Also make sure you sleep for a few seconds after each request, as hammering on a site is another reason for refusal.
from time import sleep
...
response = requests.get(url, headers=user_agent)
sleep(2)  # pause before the next request
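If the site still answers with an error status, one option is to wait longer after each refusal instead of retrying at a fixed pace. A minimal sketch of that idea; fetch_politely is an illustrative name, the delays are arbitrary, and session is assumed to be a requests.Session (or anything with a compatible get method).

```python
import time


def fetch_politely(session, url, headers, max_attempts=3, base_delay=2.0,
                   sleep=time.sleep):
    """Request url up to max_attempts times, doubling the pause after each refusal."""
    response = None
    for attempt in range(max_attempts):
        response = session.get(url, headers=headers, timeout=30)
        if response.status_code == 200:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
    return response  # last response, so the caller can inspect the status

# Usage:
# import requests
# response = fetch_politely(requests.Session(), url, user_agent)
```

This will not help if the site has blocked the IP address outright, but it avoids making things worse.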
Reply
#18
Hi, and thanks for your answer. I checked both the user agent and the sleep settings and I think they are OK (sleep time is set to 5 seconds). When I run the package I get a "WARNING No proxy_list in airbnb.config: not using proxies" (I'm not behind a corporate proxy when I test the package).
Tom Slee says this about proxy in the help file:
Quote:Sometimes the Airbnb site refuses repeated requests. I run the script using a number of proxy IP addresses to avoid being turned away, and that costs money. I am afraid that I cannot help in finding or working with proxy IP services.
I have a VPN account and I tried to run the package behind three or four different IP addresses (one at a time) with the same result. One thing I would like to understand: when someone buys a proxy service to run a crawler (I've seen a bunch of offers on the internet), is that a different service from the standard VPN that anonymizes browsing?
Is there anything else I can try apart from buying a proxy service? In any case, the next thing I'll do is try to run the package from another PC and internet connection.
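For what it's worth, the practical difference is that a VPN moves the whole machine behind one address at a time, while a proxy list lets each request leave through a different address. With requests, that is done through the proxies= argument. A minimal sketch of rotating through such a list; the addresses are placeholders, and this is not necessarily how airbnb.py uses its proxy_list setting.

```python
import itertools


def proxies_for(proxy_url):
    """Build the mapping that requests expects in its proxies= argument."""
    return {"http": proxy_url, "https": proxy_url}


def proxy_cycle(proxy_urls):
    """Yield proxies= mappings forever, one proxy after another."""
    for proxy_url in itertools.cycle(proxy_urls):
        yield proxies_for(proxy_url)

# Usage (hypothetical addresses):
# import requests
# rotation = proxy_cycle(["http://203.0.113.10:8080", "http://203.0.113.11:8080"])
# response = requests.get(url, headers=user_agent, proxies=next(rotation), timeout=30)
```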
Reply

