Python Forum

Full Version: Search the entire web
Hi guys,

I'm new to Python. I want to start a project: a search engine that looks across the entire web and prints the websites that contain certain keywords or match criteria chosen by the user.
For example, let's assume that all company websites have similar pages ("About Us", "Products", "Clients", "Technologies", "Contact", etc.).
If the "Products" page contains "Table", then the web address is printed, and so on.

How do you think I should start? Which libraries do I need? Do you think it's too complex?

Thanks in advance for your advice.


I can already hear some of you saying that Google already does this. In my case, not really: my searches are so specific that Google can't really help.
Which model Crays are in your cluster?
(Dec-15-2017, 05:25 PM)Larz60+ Wrote: Which model Crays are in your cluster?

Sorry I don't understand your question.
What is considered a part of the "entire web"? Not everything can be indexed or crawled, and not everything is accessible over HTTP.

You really only need the requests module to get a page. Finding all links in that page would be easier with Beautiful Soup (the package name is bs4). And unless you have infinite time, you probably want to store an indexed version of each page in some sort of database, which would be another package.
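
To make that concrete, here is a minimal sketch of the fetch, parse, and store loop. It is not a real search engine. It assumes requests and bs4 are installed and uses the standard-library sqlite3 module as the database. The function names (fetch_page, extract_links, page_text, crawl, search), the pages table, and the max_pages cap are all made up for illustration. It ignores robots.txt, rate limiting, and pages rendered by JavaScript, all of which a real crawler would have to deal with.

```python
import sqlite3
from collections import deque
from urllib.parse import urldefrag, urljoin

import requests
from bs4 import BeautifulSoup


def fetch_page(url):
    """Download one page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_links(html, base_url):
    """Return the set of absolute http(s) links in the page, fragments removed."""
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        absolute, _fragment = urldefrag(urljoin(base_url, anchor["href"]))
        if absolute.startswith(("http://", "https://")):
            links.add(absolute)
    return links


def page_text(html):
    """Strip the markup and return the visible text of the page."""
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)


def crawl(start_url, db, fetch=fetch_page, max_pages=20):
    """Breadth-first crawl from start_url, storing each page's text in db.

    max_pages is an arbitrary safety cap (an assumption of this sketch).
    """
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    queue, seen = deque([start_url]), {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except requests.RequestException:
            continue  # skip pages that cannot be downloaded
        fetched += 1
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, page_text(html)))
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    db.commit()


def search(db, keyword):
    """Return the stored URLs whose text contains keyword (SQLite LIKE match)."""
    rows = db.execute(
        "SELECT url FROM pages WHERE body LIKE ? ORDER BY url", (f"%{keyword}%",)
    )
    return [row[0] for row in rows]
```

You would call something like crawl("https://example.com/", sqlite3.connect("index.db")) once, and then search(db, "table") as often as you like against the stored copy. Keeping the crawl and the search separate is the point of the database: the slow part (downloading) happens once, and each query only reads what is already stored.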
(Dec-15-2017, 05:58 PM)DT909 Wrote:
(Dec-15-2017, 05:25 PM)Larz60+ Wrote: Which model Crays are in your cluster?

Sorry I don't understand your question. 

Yeah, we can guess that from your original question :-)
https://en.wikipedia.org/wiki/Cray