Python Forum
Catch all cookies from any website - Printable Version



Catch all cookies from any website - Chuky - Aug-14-2019

Hey,

first of all, sorry for my bad English, but I'll try.

I am trying to create a cookie scanner that gets all cookies (first and third party) from a website.
My first idea was to open a website with the Selenium chromedriver and read the sqlite3 database that the Chrome browser creates. I do it this way because Selenium itself can't read third-party cookies. My problem is that the database is sometimes empty and sometimes not. If I open the database with Python, I practically always get an empty result, but sometimes, when I open the database with a SQLite browser after my code has finished, I get the cookies. I don't know why...

For example, I open a Mozilla website.

from selenium import webdriver
import os, shutil, sqlite3


browser_list_place = 0
browser_list = []
profiles_folder = "profiles"


def getcookies(url):
    # Start from a clean profile directory on every run.
    if os.path.isdir(profiles_folder):
        shutil.rmtree(profiles_folder)

    co = webdriver.ChromeOptions()
    co.add_argument("--no-sandbox")
    # Each browser instance gets its own profile folder, e.g. profiles/0.
    co.add_argument("--user-data-dir=" + profiles_folder + "/" + str(browser_list_place))

    # Raw string so the backslashes in the Windows path are taken literally.
    browser_list.append(webdriver.Chrome(r'D:\crawler\chromedriver.exe', options=co))
    browser_list[browser_list_place].set_page_load_timeout(30)

    browser_list[browser_list_place].get(url)

    #browser_list[browser_list_place].quit()

    # Read the cookie database of every profile while the browser is still open.
    for folder in range(0, browser_list_place + 1):
        con = sqlite3.connect(profiles_folder + "/" + str(folder) + "/Default/Cookies")
        cur = con.cursor()
        cur.execute("SELECT * FROM cookies")
        rows = cur.fetchall()
        for row in rows:
            print(row)
        con.close()


getcookies('https://developer.mozilla.org/de/')
In the Chrome browser I can see 7 cookies.
[Image: Aq4p6.png]


Now, after a lot of problems, I am asking myself whether there is a more efficient way to get all cookies from a website (first and third party) without Selenium?

Thanks for your help!


RE: Catch all cookies from any website - Chuky - Aug-18-2019

Can't anybody help me? :/


RE: Catch all cookies from any website - wavic - Aug-18-2019

Today's websites use a lot of JS, so the short answer is no. You can't catch all cookies from a website without Selenium.
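
To illustrate the point: a plain HTTP client only learns about cookies that arrive in Set-Cookie response headers, while cookies written by JavaScript running in the page never show up there. The sketch below parses such header values with the standard library. The function name `cookies_from_headers` and the sample header values are made up for illustration and do not come from any real site.

```python
from http.cookies import SimpleCookie


def cookies_from_headers(set_cookie_headers):
    """Return {name: value} for cookies found in raw Set-Cookie header values."""
    found = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parses one "name=value; attr=..." string
        for name, morsel in jar.items():
            found[name] = morsel.value
    return found


# Hypothetical header values, invented for this example.
sample_headers = [
    "sessionid=abc123; Path=/; HttpOnly",
    "lang=de; Path=/",
]
print(cookies_from_headers(sample_headers))
```

Anything a script sets later through document.cookie would be missing from this result, which is why a real browser is needed for a complete picture.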


RE: Catch all cookies from any website - Chuky - Aug-18-2019

Okay, thanks, that is good to know.

Do you know why my database is sometimes empty although I can see the cookies in the browser? It's very curious...
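
One possible explanation, not confirmed anywhere in this thread, is that Chrome writes cookies to disk in batches rather than immediately, so a query issued right after the page loads can run before anything has been flushed. Under that assumption, a minimal sketch is to poll a copy of the database until rows appear or a timeout expires. The helper names `read_cookie_rows` and `wait_for_cookies` and the default timings are hypothetical, and copying only the main file assumes no separate write-ahead log is in use.

```python
import os
import shutil
import sqlite3
import tempfile
import time


def read_cookie_rows(db_path):
    """Copy the cookie database and read all rows from the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = os.path.join(tmp, "Cookies")
        shutil.copy2(db_path, copy_path)
        con = sqlite3.connect(copy_path)
        try:
            return con.execute("SELECT * FROM cookies").fetchall()
        finally:
            con.close()  # close before the temporary directory is removed


def wait_for_cookies(db_path, timeout=60.0, interval=2.0):
    """Poll until the cookies table has rows; return [] if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(db_path):
            try:
                rows = read_cookie_rows(db_path)
            except (sqlite3.Error, OSError):
                rows = []  # file locked, half-written, or table not created yet
            if rows:
                return rows
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)


# Usage (path taken from the first post's profile layout):
# rows = wait_for_cookies("profiles/0/Default/Cookies")
```

If this still returns nothing, quitting the browser before reading the file is another thing to try, since the browser may only write the remaining cookies on shutdown.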


RE: Catch all cookies from any website - snippsat - Aug-18-2019

(Aug-14-2019, 01:14 PM)Chuky Wrote: If I open the database with Python, I practically always get an empty result, but sometimes, when I open the database with a SQLite browser after my code has finished, I get the cookies. I don't know why...
Try giving user-data-dir a real path on a drive, and don't use a single \ in a path: use a raw string (r prefix) or turn the slashes around to /.
Test:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time
import sqlite3

#--| Setup
options = Options()
options.add_argument("--headless")
#options.add_argument('--disable-gpu')
#options.add_argument('--log-level=3')
# Profile directory given as an absolute path (raw string keeps the backslash literal)
options.add_argument(r"user-data-dir=C:\selenium")
browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=options)
#--| Parse or automation
browser.get('https://developer.mozilla.org/de/')
# Note: implicitly_wait() only sets the timeout for later element lookups; it does not pause here
browser.implicitly_wait(2)
#print(browser.get_cookies())

#--| DB
# Read the cookie database that Chrome keeps inside the profile directory
con = sqlite3.connect(r'C:\selenium\Default\Cookies')
cur = con.cursor()
cur.execute("SELECT * FROM cookies")
rows = cur.fetchall()
#print(len(rows))
# Compare what Selenium reports with what is stored on disk
print(f'Live get {len(browser.get_cookies())} cookies,From DB get {len(rows)} cookies')
Output:
Live get 5 cookies,From DB get 7 cookies
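
The rows returned by SELECT * are plain tuples, which are hard to read. A small sketch that returns each row as a dict keyed by column name, using `sqlite3.Row`, could look like this. The function name `cookies_as_dicts` is hypothetical, and the column names `host_key` and `name` in the usage comment are an assumption about the Chrome schema, so check them with `PRAGMA table_info(cookies)` first.

```python
import sqlite3


def cookies_as_dicts(db_path):
    """Return every row of the cookies table as a {column_name: value} dict."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # rows now expose their column names
    try:
        rows = con.execute("SELECT * FROM cookies").fetchall()
        return [dict(row) for row in rows]
    finally:
        con.close()


# Usage, assuming the table has host_key and name columns:
# for cookie in cookies_as_dicts(r'C:\selenium\Default\Cookies'):
#     print(cookie["host_key"], cookie["name"])
```

Because the dict keys come from the database itself, the function keeps working if the number or order of columns differs between Chrome versions.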



RE: Catch all cookies from any website - Chuky - Aug-19-2019

Why doesn't it work for me? :(

My result is:
Live get 5 cookies,From DB get 0 cookies

And if I open the cookie database with a SQLite browser, it shows 5 cookies:
[Image: cookies.png]


RE: Catch all cookies from any website - snippsat - Aug-19-2019

Try copying my code and make a new folder for the user data, e.g. C:\selenium.
Then read the DB from this folder as shown.
The two third-party cookies I get in the DB are from Google.
[Image: hA5xCW.jpg]
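
To separate first-party from third-party cookies in the DB rows, the cookie's host can be compared with the host of the page that was opened. The sketch below is a deliberately naive version: it treats a cookie as first-party when one host equals the other or is a subdomain of it, and it ignores the public suffix list. The function name `is_third_party` is hypothetical, and `.google.com` is an assumed example host, since the thread only says the extra cookies are "from Google".

```python
def is_third_party(cookie_host, page_host):
    """Naive check: True if the cookie host is unrelated to the page host."""
    cookie_domain = cookie_host.lstrip(".").lower()  # ".mozilla.org" -> "mozilla.org"
    page = page_host.lower()
    same_party = (
        page == cookie_domain
        or page.endswith("." + cookie_domain)   # page is a subdomain of the cookie domain
        or cookie_domain.endswith("." + page)   # cookie set for a subdomain of the page
    )
    return not same_party


page = "developer.mozilla.org"
for host in (".mozilla.org", "developer.mozilla.org", ".google.com"):
    label = "third party" if is_third_party(host, page) else "first party"
    print(host, "->", label)
```

A production scanner would need a proper registrable-domain comparison, because suffixes such as co.uk would fool this simple check.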


RE: Catch all cookies from any website - Chuky - Aug-20-2019

I tried it. I modified the code because the script is supposed to run on a Debian server.
Now it looks like this:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time
import sqlite3
 
#--| Setup
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--disable-cookie-encryption")
options.add_argument(r"user-data-dir=/bin/crawler/test")
browser = webdriver.Chrome(executable_path=r'/bin/crawler/chromedriver', options=options)
#--| Parse or automation
browser.get('https://developer.mozilla.org/de/')
browser.implicitly_wait(2)
 
#--| DB
con = sqlite3.connect(r'/bin/crawler/test/Default/Cookies')
cur = con.cursor()
cur.execute("SELECT * FROM cookies")
rows = cur.fetchall()

for row in rows:
	print(row)
And my result is 5 cookies...
(13210772067650212, u'.mozilla.org', u'_ga', u'GA1.2.1914013376.1566298468', u'/', 13273844067000000, 0, 0, 13210772067650212, 1, 1, 1, <read-write buffer ptr 0x7f0ba68001f0, size 0 at 0x7f0ba68001b0>, -1)
(13210772067654690, u'.mozilla.org', u'_gat', u'1', u'/', 13210772127000000, 0, 0, 13210772067654690, 1, 1, 1, <read-write buffer ptr 0x7f0ba68001b0, size 0 at 0x7f0ba6800170>, -1)
(13210772067651648, u'.mozilla.org', u'_gid', u'GA1.2.1265142963.1566298468', u'/', 13210858467000000, 0, 0, 13210772067651648, 1, 1, 1, <read-write buffer ptr 0x7f0ba6800270, size 0 at 0x7f0ba6800230>, -1)
(13210772059377725, u'.developer.mozilla.org', u'dwf_sg_task_completion', u'False', u'/', 13213364059377725, 1, 0, 13210772059377725, 1, 1, 1, <read-write buffer ptr 0x7f0ba68002b0, size 0 at 0x7f0ba6800270>, -1)
(13210772059500023, u'developer.mozilla.org', u'lux_uid', u'156629845949992799', u'/', 13210773859500023, 0, 0, 13210772059500023, 1, 1, 1, <read-write buffer ptr 0x7f0ba68002f0, size 0 at 0x7f0ba68002b0>, -1)
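
The large integers in these rows (first column and the expiry column) are Chrome timestamps: microseconds counted from 1601-01-01 UTC rather than from the Unix epoch. A minimal sketch for converting them into readable datetimes follows. The names `CHROME_EPOCH` and `chrome_time_to_datetime` are chosen for this example.

```python
from datetime import datetime, timedelta, timezone

# Chrome counts microseconds from 1601-01-01 00:00:00 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chrome_time_to_datetime(microseconds):
    """Convert a Chrome cookie timestamp to a timezone-aware datetime."""
    return CHROME_EPOCH + timedelta(microseconds=microseconds)


# Creation time of the _ga cookie from the output above.
print(chrome_time_to_datetime(13210772067650212))
```

For the `_ga` row this gives a time on 2019-08-20, which agrees with the Unix timestamp 1566298468 embedded in the cookie value.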



RE: Catch all cookies from any website - snippsat - Aug-20-2019

I tested on Linux Mint 19 and also get 5 there.
I'm not sure why the same code gives different results on Windows and Linux.