Dec-21-2017, 03:35 PM
Hey, everyone. I've been writing these tutorials, mainly for myself to retain knowledge. I thought they might help someone else out. Please advise if there are any problems, errors, etc.
Happy Holidays
TOPIC MI PYTHON : KW KEY WORD SEARCH BOT WITH PYTHON AND SELENIUM
Hey everybody, I've been having fun working with Selenium in Python. Now let's start putting it all together. Unfortunately, some websites don't want your bots doing weird stuff to them. So let's incorporate our web scraper into the anon redirect function we created earlier. That way, at least our real IP shouldn't be banned.
This is a useful tool for SEO.
First, let's import everything we need.
```python
import time
import requests  # imported, but not used anywhere in this script
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
```
Now, instead of using functions individually by themselves, let's make a class as a container to organize all our functions into a more coherent structure.
Below we define our class. Now, apparently __init__ is not strictly a constructor in Python: it is an initializer, because the object already exists by the time __init__ runs. For ease of my own understanding I'm just going to call it the constructor for now.
```python
class KW_POST_BOT(object):
    # CONSTRUCTOR(?)
    def __init__(self, browser, search_engine_url, kw_list, package):
        self.browser = browser
        # TEXT OF SCRAPE PAGE UNUSED UPON INITIALIZATION
        self.package = package
        # SEARCH ENGINE TO USE, SET UPON INITIALIZATION
        self.search_engine_url = search_engine_url
        # KEY WORD LIST, SET UPON INITIALIZATION
        self.kw_list = kw_list
```
I called the class KW_POST_BOT. You can call it whatever you want.
1.) We first set up the object with the __init__ method.
2.) Then store the web driver on the object as the attribute "browser".
3.) "package" is a blank list upon instantiation; it is not used in this tutorial.
4.) "search_engine_url" is the search engine we will be using to get our data.
5.) "kw_list" is our list of keywords.
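On the constructor question above: here is a minimal sketch, separate from the bot, showing that Python creates the object in __new__ and then hands it to __init__ to be filled in. The Demo class and the calls list are made up purely for illustration.

```python
calls = []  # hypothetical log, only here to record the order of events


class Demo(object):
    def __new__(cls, *args, **kwargs):
        # Step 1: the object is actually created here.
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, kw_list):
        # Step 2: the already-created object is initialized here.
        calls.append("__init__")
        self.kw_list = kw_list


demo = Demo(["search 1", "search 2"])
print(calls)          # ['__new__', '__init__']
print(demo.kw_list)   # ['search 1', 'search 2']
```

So calling __init__ "the constructor" is common shorthand, even though the creation step technically happens just before it.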
Now we define the main function for our class.
```python
    def main(self):
        ################ REDIRECT THROUGH KPROXY #####################
        print("OPEN BROWSER")
        self.browser.get("http://www.kproxy.com")
        # FIND ELEMENT
        print("FIND ELEMENT BY ID MAINTEXTFIELD AND SEND SEARCH ENGINE URL")
        elem = self.browser.find_element_by_id("maintextfield")
        elem.clear()
        elem.send_keys(self.search_engine_url)
        elem.send_keys(Keys.ENTER)
        # WAIT 2 SECONDS
        time.sleep(2)
```
The above code defines the main method of our class. The bot redirects through KProxy and enters the URL of our preferred search engine.
1.) Define the main method; self is a reference to the object itself.
2.) Open http://www.kproxy.com in the browser (the webdriver.Chrome() instance passed in at instantiation).
3.) Find and clear the element "maintextfield".
4.) Type the search engine URL into the field.
5.) Hit the Enter key.
6.) Wait 2 seconds.
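The fixed two-second sleep in step 6 is a guess at how long the page takes to load. One alternative, sketched here with only the standard library, is to poll for a condition until a timeout expires. The name wait_until and its parameters are hypothetical and not part of Selenium; Selenium ships its own WebDriverWait class for the same job.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)


# Stand-in for "is the page ready yet?": becomes true on the third check.
attempts = []


def page_ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(page_ready, timeout=5.0, interval=0.01)
print(len(attempts))  # 3
```

With the bot, the condition could be a small function that looks for the search field and returns False until it shows up.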
Next we are going to define our main loop to loop through our keyword list.
```python
        ### MAIN LOOP
        # INJECT KEY WORD LIST
        # LOOPS THROUGH KW_LIST AS KW
        for kw in self.kw_list:
            print("*" * 30)
            print(kw)
            print("*" * 30)
            print("INJECTING KEYWORD PAYLOAD")
            print("-" * 30)
            # WAIT 3 SECONDS
            time.sleep(3)
            # LOCATE ELEMENT AND CLEAR
            elem = self.browser.find_element_by_name("q")
            elem.clear()
            # SEND KEYS : KW VARIABLE
            elem.send_keys(kw)
            # PRESS ENTER
            elem.send_keys(Keys.ENTER)
            # WAIT 3 SECONDS
            time.sleep(3)
```
For each keyword (kw) in the keyword list (kw_list), we send that keyword to the search engine through the anonymizer.
1.) Make the for loop.
2.) Print the keyword currently being posted in the terminal.
3.) Wait 3 seconds.
4.) Locate the element "q" and clear the field.
5.) Send the keyword from the keyword list.
6.) Press Enter.
7.) Wait 3 seconds.
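To check the order of operations in the loop without opening Chrome, the browser can be swapped for a stand-in that just records what it is asked to do. FakeBrowser, FakeElement and the "<ENTER>" marker below are invented for this sketch; only the method names mirror the Selenium calls used above, and the sleeps are left out.

```python
class FakeElement(object):
    """Records calls instead of touching a real page."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def clear(self):
        self.log.append((self.name, "clear"))

    def send_keys(self, keys):
        self.log.append((self.name, "send_keys", keys))


class FakeBrowser(object):
    def __init__(self):
        self.log = []

    def find_element_by_name(self, name):
        return FakeElement(name, self.log)


ENTER = "<ENTER>"  # placeholder standing in for Keys.ENTER

browser = FakeBrowser()
for kw in ["search 1", "search 2"]:
    elem = browser.find_element_by_name("q")
    elem.clear()
    elem.send_keys(kw)
    elem.send_keys(ENTER)

for entry in browser.log:
    print(entry)
```

Each keyword should show up in the log as a clear, the keyword itself, and then the Enter marker, in that order.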
Next we are going to print the results of the search in the terminal.
```python
            # PRINT IN TERMINAL
            # SEARCH RESULTS OF KEY WORD HITS
            hit_count = self.browser.find_element_by_class_name("sb_count")
            # SEARCH RESULTS OF SNIPPET
            #snippet = self.browser.find_element_by_id("b_results")
            # SEARCH RESULTS OF B_ALGO
            b_algo = self.browser.find_element_by_class_name("b_algo")
            # TURN THE WEB OBJECT INTO TEXT AND ENCODE IN UTF-8
            b_algo_text = b_algo.text.encode("utf-8")
            print(b_algo_text)
            print(hit_count.text)
```
1.) hit_count is the element holding the hit count.
2.) b_algo is the element holding the description of the first result.
3.) b_algo_text is that element's text, encoded in UTF-8.
4.) Print both the description and the hit count as text in the terminal.
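One thing to watch with step 3: on Python 3, .encode("utf-8") returns a bytes object, so print shows the escaped b'...' form rather than the readable text. A small sketch with a made-up result string (not real search output):

```python
text = "Python 教程"  # hypothetical snippet text containing non-ASCII characters

encoded = text.encode("utf-8")           # bytes on Python 3
print(type(encoded).__name__)            # bytes
print(encoded)                           # shows the escaped b'...' representation
print(len(text), len(encoded))           # 9 characters, 13 bytes
print(encoded.decode("utf-8") == text)   # True: decoding gets the text back
```

On Python 3 the text can usually be printed as-is, without encoding it first, as long as the terminal itself handles UTF-8.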
Now let's save those same results to our local hard drive.
```python
            # PRINT TO LOCAL FILE
            # CREATE AND OPEN LOCAL FILE
            local_file = open("key_word_results.txt", "a")
            # WRITE TO LOCAL FILE KW VARIABLE
            local_file.write(",\n " + kw)
            # WRITE TO LOCAL FILE HIT COUNT
            local_file.write(",\n " + hit_count.text)
            # WRITE TO LOCAL FILE DESCRIPTION AS ENCODED STRING
            local_file.write(",\n " + str(b_algo.text.encode("utf-8")))
            local_file.write("\n " + "*" * 30)
            local_file.close()
```
1.) First we open "key_word_results.txt" in append mode ("a"), which creates the file if it does not exist yet.
2.) Write the current keyword, preceded by a comma and a line break.
3.) Write the hit count, preceded by a comma and a line break.
4.) Write the description as a string of text encoded in UTF-8. I had problems writing Chinese characters and other non-ASCII text. I found that this method works for this instance.
5.) Close the file; it is reopened on the next pass through the loop.
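For the non-ASCII problem mentioned in step 4, one option on Python 3 is to open the file with an explicit encoding and write the text as-is, rather than wrapping encoded bytes in str() (which on Python 3 writes the literal b'...' form into the file). This is only a sketch under that assumption: save_result and the sample values are hypothetical, and the with block closes the file automatically.

```python
import os
import tempfile


def save_result(path, kw, hit_count_text, description):
    # Append one record; encoding="utf-8" lets non-ASCII text through unchanged.
    with open(path, "a", encoding="utf-8") as local_file:
        local_file.write(",\n " + kw)
        local_file.write(",\n " + hit_count_text)
        local_file.write(",\n " + description)
        local_file.write("\n " + "*" * 30)


# Try it against a throwaway file with made-up values.
path = os.path.join(tempfile.mkdtemp(), "key_word_results.txt")
save_result(path, "search 1", "1,000 results", "Python 教程")

with open(path, encoding="utf-8") as check:
    contents = check.read()
print("教程" in contents)  # True
```

Inside the bot, the call would take kw, hit_count.text and b_algo.text in place of the sample values.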
OK, we're almost done with our first keyword-scraping bot, script, daemon, whatever you feel like calling it. Let's define our keyword list as a list variable. Next we need to instantiate our class.
```python
# DEFINE KEY_WORDS LIST TO INPUT
key_words = ["search 1", "search 2", "search 3"]
# INSTANTIATE KW_POST_BOT AS bot
bot = KW_POST_BOT(webdriver.Chrome(), "http://www.bing.com", key_words, [])
```
Now the list variable "key_words" has our search words in it. Put the keywords you are interested in into the list, separated by commas.
Instantiate the class as the object bot, passing Chrome as the web driver, the search engine URL, the key_words list, and the package list (unused in this tutorial).
Call the main method of the bot object.
```python
bot.main()
```
Time to run our program and see what happens. You should see a Chrome browser window open up and go to the anonymizer URL. Then the keywords should be injected, and the results printed in the terminal and saved locally.
```python
print("*" * 30)
print("KW SEARCH BOT MIPython")
print("http://www.mipython.com")
print("*" * 30)

import time
import requests  # imported, but not used anywhere in this script
from selenium import webdriver
from selenium.webdriver.common.keys import Keys


# DEFINE CLASS INHERIT FROM OBJECT
class KW_POST_BOT(object):
    # CONSTRUCTOR(?)
    def __init__(self, browser, search_engine_url, kw_list, package):
        self.browser = browser
        # TEXT OF SCRAPE PAGE UNUSED UPON INITIALIZATION
        self.package = package
        # SEARCH ENGINE TO USE, SET UPON INITIALIZATION
        self.search_engine_url = search_engine_url
        # KEY WORD LIST, SET UPON INITIALIZATION
        self.kw_list = kw_list

    def main(self):
        ################ REDIRECT THROUGH KPROXY #####################
        print("OPEN BROWSER")
        self.browser.get("http://www.kproxy.com")
        # FIND ELEMENT
        print("FIND ELEMENT BY ID MAINTEXTFIELD AND SEND SEARCH ENGINE URL")
        elem = self.browser.find_element_by_id("maintextfield")
        elem.clear()
        elem.send_keys(self.search_engine_url)
        elem.send_keys(Keys.ENTER)
        # WAIT 2 SECONDS
        time.sleep(2)

        ### MAIN LOOP
        # INJECT KEY WORD LIST
        # LOOPS THROUGH KW_LIST AS KW
        for kw in self.kw_list:
            print("*" * 30)
            print(kw)
            print("*" * 30)
            print("INJECTING KEYWORD PAYLOAD")
            print("-" * 30)
            # WAIT 3 SECONDS
            time.sleep(3)
            # LOCATE ELEMENT AND CLEAR
            elem = self.browser.find_element_by_name("q")
            elem.clear()
            # SEND KEYS : KW VARIABLE
            elem.send_keys(kw)
            # PRESS ENTER
            elem.send_keys(Keys.ENTER)
            # WAIT 3 SECONDS
            time.sleep(3)

            # PRINT IN TERMINAL
            # SEARCH RESULTS OF KEY WORD HITS
            hit_count = self.browser.find_element_by_class_name("sb_count")
            # SEARCH RESULTS OF SNIPPET
            #snippet = self.browser.find_element_by_id("b_results")
            # SEARCH RESULTS OF B_ALGO
            b_algo = self.browser.find_element_by_class_name("b_algo")
            # TURN THE WEB OBJECT INTO TEXT AND ENCODE IN UTF-8
            b_algo_text = b_algo.text.encode("utf-8")
            print(b_algo_text)
            print(hit_count.text)

            # PRINT TO LOCAL FILE
            # CREATE AND OPEN LOCAL FILE
            local_file = open("date" + "_key_word_results.txt", "a")
            # WRITE TO LOCAL FILE KW VARIABLE
            local_file.write(",\n " + kw)
            # WRITE TO LOCAL FILE HIT COUNT
            local_file.write(",\n " + hit_count.text)
            # WRITE TO LOCAL FILE DESCRIPTION AS ENCODED STRING
            local_file.write(",\n " + str(b_algo.text.encode("utf-8")))
            local_file.write("\n " + "*" * 30)
            local_file.close()


# DEFINE KEY_WORDS LIST TO INPUT
key_words = ["sample1", "sample2", "sample3"]
# INSTANTIATE KW_POST_BOT AS bot
bot = KW_POST_BOT(webdriver.Chrome(), "http://www.bing.com", key_words, [])
bot.main()
```
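In the full script the filename starts with the literal string "date", so every run appends to date_key_word_results.txt. If the idea was to stamp the file with the actual date (that is a guess at the intent, nothing in the script confirms it), something like this would do it. The name results_filename is hypothetical.

```python
import datetime


def results_filename(day=None):
    # Build a name such as "2017-12-21_key_word_results.txt".
    if day is None:
        day = datetime.date.today()
    return day.isoformat() + "_key_word_results.txt"


print(results_filename(datetime.date(2017, 12, 21)))
# 2017-12-21_key_word_results.txt
```

Called with no argument it uses today's date, so each day's run would land in its own file.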