Aug-27-2022, 10:27 PM
(Aug-27-2022, 09:53 PM)snippsat Wrote:
(Aug-27-2022, 08:54 PM)kucingkembar Wrote: It would be nice to get a regex that can be used.
Regex and HTML are not best friends; see the classic post 🌞 that never gets old.
I would do it like this.
import requests
from bs4 import BeautifulSoup
# pip install html2text
import html2text

url = 'https://lightnovelstranslations.com/the-galactic-navy-officer-becomes-an-adventurer/chapter-95-preparations-for-departure-part-4/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
story = soup.select_one('#post-104395 > div')
text_maker = html2text.HTML2Text()
text_maker.ignore_links = True
text = text_maker.handle(story.prettify())
print(text)
Output:
Chapter 95 - Preparations for Departure Part 3

* * *

**Translator: SFBaka**
**Editor: Thor’s Stone**

* * *

–Roberto’s POV–

The princess and Alan-sama welcomed our arrival at the royal capital with more enthusiasm than I expected. I’m glad I prepared myself beforehand to get scolded for arbitrarily departing with an advanced party.

It was already late at night, and most of the others have returned to their rooms. But some of the leaders including Adjutant Dalshim still remained in the hall to talk more with me.

“So how is it? What is your impression of serving under Alan-sama, Dalshim-dono?”

“In a word, splendid. I can declare without any qualms that everything we’ve accomplished so far was largely due to Alan-sama’s contributions.”

.....
I have read that link before. Sorry if this is off-topic, but what does the suggested solution there, "Have you tried using an XML parser instead?", refer to?
Is there a link for it?
Anyway, your code works. I have added a reputation point for you again, and one for the other person who replied.
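
As a rough illustration of what "use a parser instead of a regex" can mean, here is a minimal sketch that uses only Python's standard-library html.parser module. The names TextExtractor, extract_text, and sample_html are made up for this example and the sample HTML is invented; it is not the method from the linked post, and it is much cruder than the BeautifulSoup + html2text approach quoted above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []        # text fragments in document order
        self._skip_depth = 0    # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample markup, standing in for a downloaded page.
sample_html = """
<div id="post-1">
  <h1>Chapter 95</h1>
  <p>The princess welcomed <b>our</b> arrival.</p>
  <script>var tracking = "<p>not story text</p>";</script>
</div>
"""

print(extract_text(sample_html))
```

The parser tracks which tag it is inside, so the `<p>` that appears inside the script string is ignored. A regex that just matches `<p>...</p>` would have no way to know that, which is the usual argument for a parser.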