Feb-03-2022, 11:10 AM
(Feb-03-2022, 09:21 AM)Pedroski55 Wrote: In between, if you look at the webpage source code, there are endless lines of html p tag pairs.
Indeed. I don't know what they are for, but they might be there to fill up the left part of the page, under the menu, so that it has just as many lines as the right part, with the text.
(Feb-03-2022, 09:21 AM)Pedroski55 Wrote: but it is protected somehow
No, there is no protection. I can just copy and paste the text.
I tried this:
import requests
from bs4 import BeautifulSoup

page = requests.get("https://chaucer.fas.harvard.edu/pages/knights-tale-0")
soup = BeautifulSoup(page.content, 'html.parser')
for s_line in soup.find_all("span", attrs={"style": "font-family:'book antiqua', palatino"}, limit=20):
    print(s_line.text)
Output:
Iamque domos patrias, Sithice post aspera gentis prelia,laurigero, etc.
[And now (Theseus drawing nigh his) native land in
laurelled car after battling with the Scithian folk, etc.]
859 Whilom, as olde stories tellen us,
Once, as old histories tell us,
860 Ther was a duc that highte Theseus;
There was a duke who was called Theseus;
861 Of Atthenes he was lord and governour,
He was lord and governor of Athens,
862 And in his tyme swich a conquerour
And in his time such a conqueror
863 That gretter was ther noon under the sonne.
That there was no one greater under the sun.
864 Ful many a riche contree hadde he wonne;
Very many a powerful country had he won;
865 What with his wysdom and his chivalrie,
What with his wisdom and his chivalry,
866 He conquered al the regne of Femenye,
He conquered all the land of the Amazons,
867 That whilom was ycleped Scithia,
Is this a good start for you to continue?
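If it helps, a possible next step is to pair each numbered Middle English line with the translation line that follows it. This is only a sketch: `pair_lines` and `NUMBERED` are names I made up, and it assumes that every original line starts with its line number and that no translation line starts with a digit, which holds for the output above but which I have not checked for the whole page. The input would be the `s_line.text` values from the loop; here I use a few lines copied from the output so it runs without the website.

```python
import re

# A line of the original text starts with its line number, e.g. "859 Whilom, ..."
NUMBERED = re.compile(r"^(\d+)\s+(.*)$")


def pair_lines(lines):
    """Return (number, original, translation) tuples.

    translation is None when a numbered line has no following translation,
    for example when the scrape was cut off by limit=20.
    """
    pairs = []
    current = None  # numbered line still waiting for its translation
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        match = NUMBERED.match(text)
        if match:
            if current is not None:
                # Previous numbered line never got a translation
                pairs.append((current[0], current[1], None))
            current = (int(match.group(1)), match.group(2))
        elif current is not None:
            pairs.append((current[0], current[1], text))
            current = None
        # Unnumbered lines before the first numbered line
        # (the Latin epigraph and its translation) are skipped.
    if current is not None:
        pairs.append((current[0], current[1], None))
    return pairs


sample = [
    "859 Whilom, as olde stories tellen us,",
    "Once, as old histories tell us,",
    "860 Ther was a duc that highte Theseus;",
    "There was a duke who was called Theseus;",
    "867 That whilom was ycleped Scithia,",
]

for number, original, translation in pair_lines(sample):
    print(number, "|", original, "|", translation)
```

With the pairs in a list you can write them to a file, or keep only the originals or only the translations, whichever you need.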