Securing King James Bible (KJV) + King James Bible 1611 (KJV1611) in MariaDB/MySQL
Python Forum (https://python-forum.io) > Python Coding > Web Scraping & Web Development (/thread-36098.html)
Securing King James Bible (KJV) + King James Bible 1611 (KJV1611) in MariaDB/MySQL - BrandonKastning - Jan-16-2022

I need the King James Bible (KJV) in .SQL format, which I have secured from bibleprotector.com; I had to use the Wayback Machine at archive.org, and I was able to download the .SQL file.

The only King James Bible 1611 I have found online that seems as authoritative as you can find (at least I felt this way until ads appeared all over it one day) is at the domain kingjamesbibleonline.org.

They have both KJV & KJV1611 (a very nice website, extremely comprehensive, and the Scripture is accurate as far as I have tested). I used WGET to download an offline copy. Now I need to parse the HTML using bs4, and I am learning Python loops / scripts to parse it and insert the payload into MariaDB 10.3.x / 10.4.x.

I used the following WGET command (wget.start.txt):

    wget \
    --random-wait \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --no-check-certificate \
    --output-file=logfile \
    --user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20160101 Firefox/66.0 | SHARPEN YOUR SWORD MINISTRIES - WGET - US1, USIV, USXIV, USARTVI, THIS CONSTITUTION 1787 | Wash 1,1 | Wash 1,2 | Wash 1,7 | Wash 1,11 | Wash 1,12 | Wash 1,29 | Wash 1,30 | Wash 1,32 Nov. 11 1889" "www.kingjamesbibleonline.org" --domains www.kingjamesbibleonline.org \
    --no-parent \
    https://www.kingjamesbibleonline.org

The nice thing about doing WGETs this way: I learned that if you open a terminal, you can split it horizontally. On the top you can paste the WGET command (and modify the variables for different offline download projects), and on the bottom you can use the following to watch the download in real time from the log file it specifies:

    tail -f logfile

I found it incredibly helpful when learning to use WGET over the last several years. Now I am ready to start parsing this into MariaDB.
I used the following information & source:

Source/Tutorials: https://unix.stackexchange.com/questions/117605/ls-command-output-to-file

script.sh:

    #!/bin/sh
    ls -lrt >> files.log

This is an example of my output from the working HTML directory / file structure (files.log) [My Offline Copy of kingjamesbibleonline.org]:

    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 1611_1-Kings-7-15
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Discussion-Thread-49691
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Isaiah-23-1_23-7
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Jeremiah-42-22
    -rw-r--r-- 1 brandon brandon 58571 Nov 4 22:37 search.php?q=Fathers+And+Children&page=22&order=0&bsec="e=.html
    -rw-r--r-- 1 brandon brandon 70626 Nov 4 22:37 search.php?q=Women+For+Pastors&page=76&order=0&bsec="e=.html
    -rw-r--r-- 1 brandon brandon 52271 Nov 4 22:37 search.php?word=Animals+In+Captivity&order=1&bsec="e=Animals In Captivity.html
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Discern_Gods_Will
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 1611_2-Kings-14-3
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 1611_Ecclesiastes-Chapter-9
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Jude-1-8_1-10
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Bible-Verses-About-Theft_KJV
    -rw-r--r-- 1 brandon brandon 44385 Nov 4 22:37 search.php?q=Knowledge&bsec=O&order=0.html
    -rw-r--r-- 1 brandon brandon 54752 Nov 4 22:37 search.php?q=Your+Neighbor&page=27&order=0&bsec="e=.html
    -rw-r--r-- 1 brandon brandon 83136 Nov 4 22:37 search.php?q=Ask+And+You+Shall+Receive&page=153&order=0&bsec=Z"e=.html
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Discussion-Thread-149262
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Psalms-Chapter-93_Original-1611-KJV
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 bible-verses-like_Genesis-50-26
    -rw-r--r-- 1 brandon brandon 52456 Nov 4 22:37 search.php?q=Willful+Sin&page=13&order=0&bsec="e=.html
    -rw-r--r-- 1 brandon brandon 63610 Nov 4 22:37 search.php?q=Obeying+God&page=125&order=0&bsec="e=.html
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Discussion-Thread-150082
    -rw-r--r-- 1 brandon brandon 57947 Nov 4 22:37 search.php?q=The+Second+Coming+Of+Jesus&page=33&order=0&bsec="e=.html
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Exodus-21-9
    -rw-r--r-- 1 brandon brandon 57152 Nov 4 22:37 search.php?q=Trusting+Other+People&page=23&order=0&bsec="e=.html
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 1611_Job-30-22
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Bible-Verses-About-Church-Planting
    -rw-r--r-- 1 brandon brandon 49734 Nov 4 22:37 search.php?q=No+One+Is+Perfect&bsec=O&order=0.html
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Isaiah-59-9_59-11
    drwxr-xr-x 2 brandon brandon 4096 Nov 4 22:37 Judges-6-20

I would like to somehow take files.log and output all the "1611" folders and filenames to a different .log or .txt. How do I go about doing this?

Thank you everyone for this forum!

Best Regards,
Brandon Kastning

RE: Securing King James Bible (KJV) + King James Bible 1611 (KJV1611) in MariaDB/MySQL - BrandonKastning - Jan-17-2022

Part 1: Accomplished (prepping the file list of my 1611 scrapes [folders & files], for parsing the HTML with Python and then sending the payload to MariaDB in Part 2)

Source/Tutorials: https://unix.stackexchange.com/questions/47858/how-can-i-search-a-wild-card-name-in-all-subfolders
IRC Network - Libera.Chat - #linux:
My working directory for WGET contains the target folder "www.kingjamesbibleonline.org" (with over 160k files/folders inside). I executed the following command to create "1611_3.txt", which now lists all the files and folders whose names contain "1611":

    /WGET-11.02.2021.www.kingjamesbibleonline.org$ find www.kingjamesbibleonline.org/ -name '*1611*' > 1611_3.txt

Here is a sample of the generated "1611_3.txt" file:

    www.kingjamesbibleonline.org/Luke-Chapter-24_Original-1611-KJV
    www.kingjamesbibleonline.org/Iohn_13_1611
    www.kingjamesbibleonline.org/The-Epistle-to-the-Romanes_12_1611
    www.kingjamesbibleonline.org/Reuelation_21_1611
    www.kingjamesbibleonline.org/Prouerbs_22_1611
    www.kingjamesbibleonline.org/Ecclesiastes_3_1611
    www.kingjamesbibleonline.org/1-Corinthians_13_1611
    www.kingjamesbibleonline.org/Psalmes_16_1611
    www.kingjamesbibleonline.org/Ephesians_5_1611
    www.kingjamesbibleonline.org/Discussion-Thread-101611
    www.kingjamesbibleonline.org/John-Chapter-15_Original-1611-KJV
    www.kingjamesbibleonline.org/Discussion-Thread-161149
    www.kingjamesbibleonline.org/Revelation-Chapter-22_Original-1611-KJV
    www.kingjamesbibleonline.org/2-Samuel-Chapter-22_Original-1611-KJV
    www.kingjamesbibleonline.org/1-Chronicles-Chapter-16_Original-1611-KJV
    www.kingjamesbibleonline.org/Job-Chapter-19_Original-1611-KJV
    www.kingjamesbibleonline.org/Job-Chapter-22_Original-1611-KJV
    www.kingjamesbibleonline.org/Psalms-Chapter-23_Original-1611-KJV
    www.kingjamesbibleonline.org/Romans-Chapter-12_Original-1611-KJV
    www.kingjamesbibleonline.org/Revelation-Chapter-21_Original-1611-KJV
    www.kingjamesbibleonline.org/Galatians-Chapter-6_Original-1611-KJV
    www.kingjamesbibleonline.org/Isaiah-Chapter-26_Original-1611-KJV
    www.kingjamesbibleonline.org/Colossians-Chapter-3_Original-1611-KJV
    www.kingjamesbibleonline.org/Lamentations-Chapter-3_Original-1611-KJV
    www.kingjamesbibleonline.org/2-Corinthians-Chapter-3_Original-1611-KJV

Each folder has 1 "index.html" inside for parsing.
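One thing the sample shows: the pattern '*1611*' also matches discussion threads whose ID merely contains those digits (Discussion-Thread-101611, Discussion-Thread-161149). Below is a minimal Python sketch, not a definitive solution, that reads a list like "1611_3.txt" and yields only the index.html files of the Bible folders. The function name index_files is hypothetical, and skipping every entry containing "Discussion-Thread" is an assumption based only on the sample above.

```python
from pathlib import Path


def index_files(list_file, root="."):
    """Yield the index.html path for each folder named in list_file.

    Assumption: entries containing "Discussion-Thread" are forum pages that
    only matched '*1611*' by accident, so they are skipped.
    """
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry or "Discussion-Thread" in entry:
            continue
        index = Path(root) / entry / "index.html"
        if index.is_file():  # skip plain files and folders with no saved page
            yield index


# Example use, run from the WGET working directory:
#     for path in index_files("1611_3.txt"):
#         print(path)
```

Because the function builds the index.html path itself, the same list of bare directory names can serve both for copying folders and for parsing.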
What is the best way to start a Python script that loops through each directory listed in "1611_3.txt" and runs bs4 against it?

Thank you everyone for this forum!

Best Regards,
Brandon Kastning

RE: Securing King James Bible (KJV) + King James Bible 1611 (KJV1611) in MariaDB/MySQL - BrandonKastning - Jan-18-2022

Part 1.B: Accomplished (prepping the file list of my 1611 scrapes [folders & files], for parsing the HTML with Python and then sending the payload to MariaDB in Part 2)

Source/Tutorials: https://askubuntu.com/questions/537967/appending-to-end-of-a-line-using-sed
IRC Network - Libera.Chat - #linux: loganlee: sed append syntax for appending "/index.html" to all lines, using the above domain's example together with a sed error (from me trying to make it work).

Solution:

    WGET-11.02.2021.www.kingjamesbibleonline.org/sed_1611_3$ sed 's/\(.*\)/\1\/index.html/g' 1611_3.txt >1611_4_sed4.txt

Sample of "1611_4_sed4.txt":

    www.kingjamesbibleonline.org/Luke-Chapter-24_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Iohn_13_1611/index.html
    www.kingjamesbibleonline.org/The-Epistle-to-the-Romanes_12_1611/index.html
    www.kingjamesbibleonline.org/Reuelation_21_1611/index.html
    www.kingjamesbibleonline.org/Prouerbs_22_1611/index.html
    www.kingjamesbibleonline.org/Ecclesiastes_3_1611/index.html
    www.kingjamesbibleonline.org/1-Corinthians_13_1611/index.html
    www.kingjamesbibleonline.org/Psalmes_16_1611/index.html
    www.kingjamesbibleonline.org/Ephesians_5_1611/index.html
    www.kingjamesbibleonline.org/Discussion-Thread-101611/index.html
    www.kingjamesbibleonline.org/John-Chapter-15_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Discussion-Thread-161149/index.html
    www.kingjamesbibleonline.org/Revelation-Chapter-22_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/2-Samuel-Chapter-22_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/1-Chronicles-Chapter-16_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Job-Chapter-19_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Job-Chapter-22_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Psalms-Chapter-23_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Romans-Chapter-12_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Revelation-Chapter-21_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Galatians-Chapter-6_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Isaiah-Chapter-26_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Colossians-Chapter-3_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Lamentations-Chapter-3_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/2-Corinthians-Chapter-3_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Luke-Chapter-12_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Psalms-Chapter-5_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Romans-Chapter-3_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Psalms-Chapter-84_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Isaiah-Chapter-64_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Romans-Chapter-6_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Hebrews-Chapter-4_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Matthew-Chapter-16_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Psalms-Chapter-9_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Matthew-Chapter-19_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Acts-Chapter-20_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Mark-Chapter-9_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Matthew-Chapter-22_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Hebrews-Chapter-13_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Romans-Chapter-10_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/James-Chapter-4_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Psalms-Chapter-55_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/1-Corinthians-Chapter-15_Original-1611-KJV/index.html
    www.kingjamesbibleonline.org/Psalms-Chapter-103_Original-1611-KJV/index.html

Now I need to learn how to take the original list of directories (the one without the appended /index.html) and use it for the Python script, beautifulsoup4 and loops for *parsing*.

Does anyone know how to cp -R using a list like 1611_3.txt? This way I can extract a copy of only the 1611 folders (the website is rather huge, and the 1611 portion is much smaller than the whole WGET download copy).

Once I get it extracted, I should be able to start writing the Python script, where I will be needing help, I am sure!

Thank you everyone for this forum!

Best Regards,
Brandon Kastning

RE: Securing King James Bible (KJV) + King James Bible 1611 (KJV1611) in MariaDB/MySQL - Larz60+ - Jan-18-2022

FYI: when you make multiple posts prior to receiving an answer, your thread will appear in 'unread posts' with replies column > 0. When checking for unanswered threads, it appears as if your thread has been, or is being, serviced. Usually not an issue on slow days, but your thread can be overlooked on busy days.
If you have no responses to an original post, you can edit and avoid this.

RE: Securing King James Bible (KJV) + King James Bible 1611 (KJV1611) in MariaDB/MySQL - BrandonKastning - Jan-18-2022

Larz60,

As opposed to making replies to my own OP thread? I know that some forums have x amount of time you have to edit your original post. I am glad to know this and will edit my post rather than reply.

Thank you! And thank you for this forum!

Best Regards,
Brandon Kastning

(Jan-18-2022, 02:54 PM)Larz60+ Wrote:
    FYI: when you make multiple posts prior to receiving an answer, your thread will appear in 'unread posts' with replies column > 0. When checking for unanswered threads, it appears as if your thread has been, or is being, serviced. Usually not an issue on slow days, but your thread can be overlooked on busy days.
    If you have no responses to an original post, you can edit and avoid this.
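
On the earlier question in this thread about running cp -R from a list such as 1611_3.txt, one possible approach is to do the copy in Python with shutil.copytree and then read each copied page with BeautifulSoup. This is only a rough sketch under stated assumptions: the names copy_listed_dirs and page_title are hypothetical, the list is assumed to hold one relative directory path per line, and since the verse markup of the saved pages is not shown in the thread, page_title extracts only the <title> as a placeholder.

```python
import shutil
from pathlib import Path


def copy_listed_dirs(list_file, dest, root="."):
    """Copy every directory named in list_file into dest, keeping relative paths.

    Returns the number of directories copied. Blank lines, plain files and
    missing paths are skipped.
    """
    copied = 0
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        source = Path(root) / entry
        if not entry or not source.is_dir():
            continue
        # dirs_exist_ok needs Python 3.8+; it lets the script be re-run.
        shutil.copytree(source, Path(dest) / entry, dirs_exist_ok=True)
        copied += 1
    return copied


def page_title(index_path):
    """Return the <title> text of one saved page, or None if it has none."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    html = Path(index_path).read_text(encoding="utf-8", errors="replace")
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


# Example use, run from the WGET working directory
# ("kjv1611_only" is a made-up destination name):
#     copy_listed_dirs("1611_3.txt", "kjv1611_only")
#     for index in Path("kjv1611_only").rglob("index.html"):
#         print(index, page_title(index))
```

Working on a small copy keeps the parsing loop away from the 160k-file download, and the title lookup can later be swapped for whatever tags actually hold the verses.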