Nov-09-2019, 06:47 AM
(This post was last modified: Nov-09-2019, 06:47 AM by jarmerfohn.)
All I'm trying to do is test-print an HTML string matched by a regex pattern, but the result is always incomplete and I can't figure out why. I'm new to Python and a coding amateur in general... blah blah. But all the regex training sites lead me to believe my pattern will work for this seemingly simple HTML capture, yet it keeps getting cut off when I run the build. I've been trying different flags, but I don't think that's the issue. I also know it's not the re.py cache. It's gotta be an escape character that I can't figure out, right?
GOAL:
Trying to print: "https://newjersey.craigslist.orgparlin-chevrolet-colorado-call/7014860327.html"
What gets printed instead: "https://newjersey.craigslist.orgparlin-chevrolet-"
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
import re

str1 = ("""
bhcgHf4AWry,1:00N0N_iBTHgJR0p0p_2hkovkPhFZk,1:00101_bX2XWbjP0wA,1:00j0j_5naXGGGbBUK,1:00j0j_gbiQHGBLUjL,1:00k0k_fnTDHBeHrt5,1:00s0s_375GQT7ladO" href="https://newjersey.craigslist.orgparlin-chevrolet-colorado-call/7014860327.html">
<span class="result-price">$18000</span>
</a>
""")
print(str1)

reSearch1 = re.search(r'(https:).*(.html)', str1, flags=re.UNICODE)
print(reSearch1)
Output:
bhcgHf4AWry,1:00N0N_iBTHgJR0p0p_2hkovkPhFZk,1:00101_bX2XWbjP0wA,1:00j0j_5naXGGGbBUK,1:00j0j_gbiQHGBLUjL,1:00k0k_fnTDHBeHrt5,1:00s0s_375GQT7ladO" href="https://newjersey.craigslist.orgparlin-chevrolet-colorado-call/7014860327.html">
<span class="result-price">$18000</span>
</a>
<re.Match object; span=(153, 231), match='https://newjersey.craigslist.orgparlin-chevrolet->
[Finished in 0.2s]
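A quick check that may explain the cut-off: print the matched text itself instead of the Match object. As far as I can tell, CPython's `re.Match` repr truncates the displayed `match=` text to 50 characters, while `span=(153, 231)` in the output above says the match is 78 characters long, which is exactly the length of the full URL. The sketch below reuses the string from the post (the name `html_fragment` is my own), prints the match a few ways, and also tries a tighter pattern I picked for illustration, `https://[^"\s]*\.html`, which escapes the dot in `.html` and stops at the closing quote. That pattern is an assumption, not the only way to do it.

```python
import re

# The same fragment as in the post; "html_fragment" is just a new name for str1.
html_fragment = """
bhcgHf4AWry,1:00N0N_iBTHgJR0p0p_2hkovkPhFZk,1:00101_bX2XWbjP0wA,1:00j0j_5naXGGGbBUK,1:00j0j_gbiQHGBLUjL,1:00k0k_fnTDHBeHrt5,1:00s0s_375GQT7ladO" href="https://newjersey.craigslist.orgparlin-chevrolet-colorado-call/7014860327.html">
<span class="result-price">$18000</span>
</a>
"""

# The original pattern from the post.
match = re.search(r'(https:).*(.html)', html_fragment)

print(match)                          # repr: the match= text is cut to 50 characters
print(match.group(0))                 # the full matched text
print(match.end() - match.start())    # length of the match according to its span

# Hypothetical tighter pattern: "\." matches a literal dot, and [^"\s]*
# cannot run past the closing quote of the href attribute.
tight = re.search(r'https://[^"\s]*\.html', html_fragment)
print(tight.group(0))
```

If `group(0)` prints the whole URL, the regex was matching everything all along and only the printed representation was abbreviated.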
Thanks for any help, gents,
Jr