Catch all cookies from any website (Python Forum, Web Scraping & Web Development: https://python-forum.io/thread-20499.html)

Catch all cookies from any website - Chuky - Aug-14-2019
Hey, first of all, sorry for my bad English, but I'll give it a try. I'm trying to build a cookie scanner that collects all cookies (first and third party) from a website. My first idea was to open the website with the Selenium chromedriver and read the sqlite3 database that the Chrome browser creates. I do it this way because Selenium itself can't read third-party cookies. My problem is that the database is sometimes empty and sometimes not. When I open the database with Python I actually always get an empty result, but sometimes, when I open the database with a SQLite browser after my code has finished, I do get the cookies. I don't know why... For example, I open a Mozilla website:

    from selenium import webdriver
    import os, shutil, sqlite3

    browser_list_place = 0
    browser_list = []
    profiles_folder = "profiles"

    def getcookies(url):
        if os.path.isdir(profiles_folder):
            shutil.rmtree(profiles_folder)
        co = webdriver.ChromeOptions()
        co.add_argument("--no-sandbox")
        co.add_argument("--user-data-dir=" + profiles_folder + "/" + str(browser_list_place))
        browser_list.append(webdriver.Chrome('D:\crawler\chromedriver.exe', options=co))
        browser_list[browser_list_place].set_page_load_timeout(30)
        browser_list[browser_list_place].get(url)
        #browser_list[browser_list_place].quit()
        for folder in range(0, browser_list_place + 1):
            con = sqlite3.connect(profiles_folder + "/" + str(folder) + "/Default/Cookies")
            cur = con.cursor()
            cur.execute("SELECT * FROM cookies")
            rows = cur.fetchall()
            for row in rows:
                print(row)

    getcookies('https://developer.mozilla.org/de/')

In the Chrome browser I can see 7 cookies. After all these problems I'm wondering whether there is a more efficient way to get all cookies from a website (first and third party) without Selenium? Thanks for your help!

RE: Catch all cookies from any website - Chuky - Aug-18-2019
Can't anybody help me?
:/

RE: Catch all cookies from any website - wavic - Aug-18-2019
Today's websites use a lot of JavaScript, so the short answer is no: you can't catch all cookies from a website without Selenium.

RE: Catch all cookies from any website - Chuky - Aug-18-2019
Okay, thanks, that is good to know. Do you know why my database is sometimes empty although I can see the cookies in the browser? It's very curious...

RE: Catch all cookies from any website - snippsat - Aug-18-2019
(Aug-14-2019, 01:14 PM)Chuky Wrote: When I open the database with Python I actually always get an empty result, but sometimes, when I open the database with a SQLite browser after my code has finished, I do get the cookies. I don't know why...

Try giving user-data-dir a real path on a drive, and don't use a single \ in paths: use a raw string (r'...') or turn the slashes around (/). Test:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.keys import Keys
    import time
    import sqlite3

    #--| Setup
    options = Options()
    options.add_argument("--headless")
    #options.add_argument('--disable-gpu')
    #options.add_argument('--log-level=3')
    options.add_argument(r"user-data-dir=C:\selenium")
    browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=options)
    #--| Parse or automation
    browser.get('https://developer.mozilla.org/de/')
    browser.implicitly_wait(2)
    #print(browser.get_cookies())

    #--| DB
    con = sqlite3.connect(r'C:\selenium\Default\Cookies')
    cur = con.cursor()
    cur.execute("SELECT * FROM cookies")
    rows = cur.fetchall()
    #print(len(rows))
    print(f'Live get {len(browser.get_cookies())} cookies,From DB get {len(rows)} cookies')
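
A note on the "sometimes empty" database. One possible explanation, stated here as an assumption and not confirmed anywhere in this thread, is that Chrome buffers cookie writes and only flushes them to the SQLite file every so often or on shutdown, so a query issued right after get() can run before anything is on disk. Note also that implicitly_wait(2) only sets a timeout for element lookups; it does not pause the script. The sketch below reads a copy of the cookie file, so the live file that Chrome may still hold open is never touched. The helper name read_cookie_db is hypothetical, and the column names host_key, name and path are assumed from the rows Chrome stores.

```python
import os
import shutil
import sqlite3
import tempfile


def read_cookie_db(db_path):
    """Return (host_key, name, path) tuples from a snapshot of a Chrome cookie DB.

    Hypothetical helper: it copies the file first so the query never runs
    against the file that the browser itself may still have open.
    """
    tmp_dir = tempfile.mkdtemp()
    snapshot = os.path.join(tmp_dir, "Cookies")
    try:
        shutil.copy2(db_path, snapshot)
        con = sqlite3.connect(snapshot)
        try:
            query = "SELECT host_key, name, path FROM cookies"
            return con.execute(query).fetchall()
        finally:
            con.close()
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Intended use with the script above (assumption: quitting the browser
# first gives Chrome the chance to write its pending cookies to disk):
#
#   browser.get('https://developer.mozilla.org/de/')
#   browser.quit()
#   for row in read_cookie_db(r'C:\selenium\Default\Cookies'):
#       print(row)
```

If the table is still empty after quit(), the buffering assumption is probably not the cause and the profile path is the next thing to check.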

RE: Catch all cookies from any website - Chuky - Aug-19-2019
Why doesn't it work for me? :( My result is:

Quote: Live get 5 cookies,From DB get 0 cookies

And if I open the cookie database with a sqlite3 browser, it shows 5 cookies.

RE: Catch all cookies from any website - snippsat - Aug-19-2019
Try copying my code and make a new folder for the user data, e.g. C:\selenium. Then read the DB from this folder as shown. The two third-party cookies I get in the DB are from Google.

RE: Catch all cookies from any website - Chuky - Aug-20-2019
I tried it. I modified the code because the script is meant to run on a Debian server. Now it looks like this:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.keys import Keys
    import time
    import sqlite3

    #--| Setup
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--disable-cookie-encryption")
    options.add_argument(r"user-data-dir=/bin/crawler/test")
    browser = webdriver.Chrome(executable_path=r'/bin/crawler/chromedriver', options=options)
    #--| Parse or automation
    browser.get('https://developer.mozilla.org/de/')
    browser.implicitly_wait(2)

    #--| DB
    con = sqlite3.connect(r'/bin/crawler/test/Default/Cookies')
    cur = con.cursor()
    cur.execute("SELECT * FROM cookies")
    rows = cur.fetchall()
    for row in rows:
        print(row)

And my result is 5 cookies...

    (13210772067650212, u'.mozilla.org', u'_ga', u'GA1.2.1914013376.1566298468', u'/', 13273844067000000, 0, 0, 13210772067650212, 1, 1, 1, <read-write buffer ptr 0x7f0ba68001f0, size 0 at 0x7f0ba68001b0>, -1)
    (13210772067654690, u'.mozilla.org', u'_gat', u'1', u'/', 13210772127000000, 0, 0, 13210772067654690, 1, 1, 1, <read-write buffer ptr 0x7f0ba68001b0, size 0 at 0x7f0ba6800170>, -1)
    (13210772067651648, u'.mozilla.org', u'_gid', u'GA1.2.1265142963.1566298468', u'/', 13210858467000000, 0, 0, 13210772067651648, 1, 1, 1, <read-write buffer ptr 0x7f0ba6800270, size 0 at 0x7f0ba6800230>, -1)
    (13210772059377725, u'.developer.mozilla.org', u'dwf_sg_task_completion', u'False', u'/', 13213364059377725, 1, 0, 13210772059377725, 1, 1, 1, <read-write buffer ptr 0x7f0ba68002b0, size 0 at 0x7f0ba6800270>, -1)
    (13210772059500023, u'developer.mozilla.org', u'lux_uid', u'156629845949992799', u'/', 13210773859500023, 0, 0, 13210772059500023, 1, 1, 1, <read-write buffer ptr 0x7f0ba68002f0, size 0 at 0x7f0ba68002b0>, -1)

RE: Catch all cookies from any website - snippsat - Aug-20-2019
I tested on Linux Mint 19 and also get 5 there. I'm not sure why the same code behaves differently on Windows and Linux.
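
A note on reading the rows above: the large integers are Chrome timestamps, counted in microseconds since 1601-01-01 00:00:00 UTC instead of the usual Unix epoch. The sketch below converts them to ordinary datetimes. The function name chrome_time_to_datetime is hypothetical, and treating the first and sixth columns as creation and expiry time is an assumption based on what the values look like, not something stated in the thread.

```python
from datetime import datetime, timedelta, timezone

# Chrome stores cookie times as microseconds since 1601-01-01 00:00:00 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chrome_time_to_datetime(microseconds):
    """Convert a Chrome cookie timestamp to a timezone-aware UTC datetime."""
    return CHROME_EPOCH + timedelta(microseconds=microseconds)


# Values copied from the _gat row above (assumed: column 1 is the creation
# time, column 6 is the expiry time).
created = chrome_time_to_datetime(13210772067654690)
expires = chrome_time_to_datetime(13210772127000000)

print(created.isoformat())  # 2019-08-20T10:54:27.654690+00:00
print(expires.isoformat())  # 2019-08-20T10:55:27+00:00
print(expires - created)    # 0:00:59.345310
```

The creation time falls on Aug-20-2019, the day the rows were posted, and the roughly one-minute lifetime fits a short-lived throttling cookie such as _gat.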