Python Forum
How can I get the Middle English and Modern English from this page?
#4
(Feb-03-2022, 09:21 AM)Pedroski55 Wrote: In between, if you look at the webpage source code, there are endless lines of html p tag pairs.
Indeed. I don't know what they are for, but they might be there to pad the left part of the page, under the menu, so that it has as many lines as the right part, which holds the text.
(Feb-03-2022, 09:21 AM)Pedroski55 Wrote: but it is protected somehow
No, there is no protection. I can just copy and paste the text.
I tried this:
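import requests
from bs4 import BeautifulSoup

# Fetch the page and stop early on an HTTP error.
page = requests.get("https://chaucer.fas.harvard.edu/pages/knights-tale-0")
page.raise_for_status()
soup = BeautifulSoup(page.content, 'html.parser')
# The text lines sit in <span> tags with exactly this style attribute; limit=20 keeps the test output short.
for s_line in soup.find_all("span", attrs={"style": "font-family:'book antiqua', palatino"}, limit=20):
    print(s_line.text)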
Output:
Iamque domos patrias, Sithice post aspera gentis prelia,laurigero, etc.
[And now (Theseus drawing nigh his) native land in laurelled car after battling with the Scithian folk, etc.]
859        Whilom, as olde stories tellen us,                Once, as old histories tell us,
860        Ther was a duc that highte Theseus;                There was a duke who was called Theseus;
861        Of Atthenes he was lord and governour,                He was lord and governor of Athens,
862        And in his tyme swich a conquerour                And in his time such a conqueror
863        That gretter was ther noon under the sonne.                That there was no one greater under the sun.
864        Ful many a riche contree hadde he wonne;                Very many a powerful country had he won;
865        What with his wysdom and his chivalrie,                What with his wisdom and his chivalry,
866        He conquered al the regne of Femenye,                He conquered all the land of the Amazons,
867        That whilom was ycleped Scithia,
Process finished with exit code 0
Is this a good start for you to continue?
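A possible next step, only as a sketch: the output above suggests that each printed line holds a line number, the Middle English text and the Modern English text, separated by runs of spaces. Assuming that layout holds for the whole page (I have not checked this), a regular expression could split each line into its three parts. The names `split_line` and `LINE_RE` are made up for this example, and the handling of non-breaking spaces is a guess about what the page might contain.

```python
import re

# Assumed layout: "<number><2+ spaces><Middle English><2+ spaces><Modern English>".
# This is inferred from the printed output, not from the page's documentation.
LINE_RE = re.compile(r"^\s*(\d+)\s{2,}(\S.*?)\s{2,}(\S.*?)\s*$")


def split_line(text):
    """Return (line_number, middle_english, modern_english), or None if the
    text does not match the assumed three-column layout."""
    # Assumption: the page may use non-breaking spaces; treat them as plain spaces.
    text = text.replace("\xa0", " ")
    match = LINE_RE.match(text)
    if match is None:
        return None
    number, middle, modern = match.groups()
    return int(number), middle, modern


if __name__ == "__main__":
    # Two lines copied from the output above.
    samples = [
        "859        Whilom, as olde stories tellen us,                Once, as old histories tell us,",
        "860        Ther was a duc that highte Theseus;                There was a duke who was called Theseus;",
    ]
    for sample in samples:
        print(split_line(sample))
```

In the loop above you could call `split_line(s_line.text)` instead of printing the text directly and skip the results that are `None`, such as the Latin epigraph or a line where one of the columns is missing.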


Messages In This Thread
RE: How can I get the Middle English and Modern English from this page? - by ibreeden - Feb-03-2022, 11:10 AM
