(Nov-22-2021, 07:46 PM)BrandonKastning Wrote: I want to use my previous example from Kung-Fu_A (bringing the entire HTML remote document into local Python Memory and stored into Variable "all_of_it"). Then bring it down into lines and process each line as an individual insert!
The paragraph text on the website is not well formatted, so one way is to say that each paragraph is a line and go with that.
The other way is to try splitting the text of each paragraph into lines.
A quick look:
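import requests
from bs4 import BeautifulSoup

url = "https://law.justia.com/constitution/us/preamble.html"
# Download the page; html.content holds the raw bytes of the response
html = requests.get(url)
# Parse with the lxml parser (the lxml package must be installed)
soup = BeautifulSoup(html.content, 'lxml')
# Collect every <p> tag on the page, in document order
all_p = soup.select('p')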
>>> print(all_p[0].text)
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
>>>
>>> print(all_p[2].text)
Although the preamble is not a source of power for any department of the Federal Government,1 the Supreme Court has often referred to it as evidence of the origin, scope, and purpose of the Constitution.2 “Its true office,” wrote Joseph Story in his Commentaries, “is to expound the nature and extent and application of the powers actually conferred by the Constitution, and not substantively to create them. For example, the preamble declares one object to be, ‘provide for the common defense.’ No one can doubt that this does not enlarge the powers of Congress to pass any measures which they deem useful for the common defence. But suppose the terms of a given power admit of two constructions, the one more restrictive, the other more liberal, and each of them is consistent with the words, but is, and ought to be, governed by the intent of the power; if one could promote and the other defeat the common defence, ought not the former, upon the soundest principles of interpretation, to be adopted?”3
So for the first paragraph there is really no good way to split it up, other than maybe splitting it in 3 based on length.
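A rough sketch of the split-by-length idea, using textwrap from the standard library. The helper name split_by_length is just made up for this example, and the text is pasted in as a string so it runs without the request. Because the split only happens at spaces, you can get one chunk more than asked for.

```python
import math
import textwrap

preamble = (
    "We the People of the United States, in Order to form a more perfect "
    "Union, establish Justice, insure domestic Tranquility, provide for the "
    "common defence, promote the general Welfare, and secure the Blessings "
    "of Liberty to ourselves and our Posterity, do ordain and establish this "
    "Constitution for the United States of America."
)

def split_by_length(text, parts=3):
    """Split text into chunks of at most ceil(len(text) / parts) characters.

    Hypothetical helper: breaks only at whitespace, so the number of
    chunks can come out slightly higher than `parts`.
    """
    width = math.ceil(len(text) / parts)
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # never cut a word in the middle
        break_on_hyphens=False,   # only break at spaces
    )

chunks = split_by_length(preamble, parts=3)
for index, chunk in enumerate(chunks, 1):
    print(f"{chunk} <line{index}>\n")
```

With the live page you would pass all_p[0].text instead of the pasted string.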
The second one could be split at each period (.).
>>> par_2 = all_p[2].text.split('.')
>>> for index, line in enumerate(par_2, 1):
...     print(f"{line} <line{index}>\n")
Output:
Although the preamble is not a source of power for any department of the Federal Government,1 the Supreme Court has often referred to it as evidence of the origin, scope, and purpose of the Constitution <line1>
2 “Its true office,” wrote Joseph Story in his Commentaries, “is to expound the nature and extent and application of the powers actually conferred by the Constitution, and not substantively to create them <line2>
For example, the preamble declares one object to be, ‘provide for the common defense <line3>
’ No one can doubt that this does not enlarge the powers of Congress to pass any measures which they deem useful for the common defence <line4>
But suppose the terms of a given power admit of two constructions, the one more restrictive, the other more liberal, and each of them is consistent with the words, but is, and ought to be, governed by the intent of the power; if one could promote and the other defeat the common defence, ought not the former, upon the soundest principles of interpretation, to be adopted?”3 <line5>
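As the output shows, a plain split('.') throws away the period and pushes the footnote number (2) and the closing quote (’) onto the start of the next line. One possible way around that is a regex that keeps the period, an optional closing quote and any footnote digits together with the sentence. This is only a sketch: the pattern is my own guess at what fits this page, it is tested here on an excerpt of the paragraph pasted in as a string, and it will split wrongly on abbreviations like "U.S." or on decimal numbers.

```python
import re

# Excerpt of the second paragraph, pasted in so the example runs offline
text = (
    "Although the preamble is not a source of power for any department of "
    "the Federal Government,1 the Supreme Court has often referred to it as "
    "evidence of the origin, scope, and purpose of the Constitution.2 "
    "“Its true office,” wrote Joseph Story in his Commentaries, “is to "
    "expound the nature and extent and application of the powers actually "
    "conferred by the Constitution, and not substantively to create them. "
    "For example, the preamble declares one object to be, ‘provide for the "
    "common defense.’ No one can doubt that this does not enlarge the powers "
    "of Congress to pass any measures which they deem useful for the common "
    "defence."
)

# A sentence here is: shortest run of text ending in . ? or !,
# then an optional closing curly quote, then optional footnote digits,
# and after that either whitespace or the end of the text.
SENTENCE = re.compile(r"\S.*?[.?!][’”]?\d*(?=\s|$)", re.DOTALL)

sentences = SENTENCE.findall(text)
for index, line in enumerate(sentences, 1):
    print(f"{line} <line{index}>\n")
```

On the live page you would run SENTENCE.findall(all_p[2].text) instead, and each item in sentences could then be used as one insert.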