Python Forum

Hello,
I have some code like this:

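import re
import urllib.request

GRUBER_URLINTEXT_PAT = re.compile("(https?://)([^:^/]*)(:\\d*)?(.*)?")

# Python 3: urlopen() lives in urllib.request and yields bytes, so decode each line.
for line in urllib.request.urlopen("https://pastebin.com/raw/hvGXKp72"):
    text = line.decode("utf-8")
    # Index 1 of each findall() tuple is the host part; strip() drops the trailing line break.
    print([mgroups[1].strip() for mgroups in GRUBER_URLINTEXT_PAT.findall(text)])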

This code reads

example.com

only, without the http, https and www parts.
Now I have a question: how can I validate a subdomain? For example, is

subdomain.example.com

readable with this code?
Also, instead of reading the links from

("https://pastebin.com/raw/hvGXKp72")

how can I enter the website manually? For example, the program would show the prompt

Please Input Your Website :

and I would then type in the website myself.

Thank you in advance,
I hope somebody can help me.

Sorry for my bad English.

Well, re.compile("(https?://)?([^:^/]*)(:\\d*)?(.*)?") makes the http(s):// prefix optional, so it is captured only when it is there, and the pattern will still match example.com. It will also match subdomain.example.com, but the whole host name ends up in the second group. Is that what you wanted, or did you want the subdomain in a separate group?
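
To make the grouping concrete, here is a small sketch that runs that pattern over a few sample inputs and prints what lands in the second group. The names URL_PAT and host_of and the sample URLs are mine, chosen only for illustration.

```python
import re

# Same pattern as quoted above, with the scheme group made optional.
# URL_PAT and host_of are illustrative names, not part of the original code.
URL_PAT = re.compile("(https?://)?([^:^/]*)(:\\d*)?(.*)?")


def host_of(url):
    """Return whatever the second group captured (the host part)."""
    match = URL_PAT.match(url)
    return match.group(2)


samples = (
    "https://example.com",
    "subdomain.example.com",
    "http://subdomain.example.com:8080/page",
)
for url in samples:
    print(url, "->", host_of(url))
```

One thing to keep in mind: every part of this pattern is optional, so it matches any string at all, including an empty one. It is fine for pulling the host out, but on its own it cannot tell a valid address from an invalid one.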

As for asking for user input, that's easy:

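url = input('Please input your website: ')
# match() only tries the pattern at the start of the string and returns None if it does not fit there.
match = GRUBER_URLINTEXT_PAT.match(url)
if match is None:
    print('Invalid url.')
else:
    print('That url is valid.')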
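
If you do want the subdomain separated out and the host name actually checked, something along these lines might work. This is only a rough sketch: split_host, LABEL_PAT and SCHEME_PAT are names I made up, the label rule is the usual hostname convention (letters, digits and hyphens, no hyphen at either end, at most 63 characters), and it assumes the registered domain is simply the last two labels, which is wrong for suffixes like co.uk.

```python
import re

# Hypothetical helpers for illustration only.
# A label is 1-63 letters, digits or hyphens and may not start or end with a hyphen.
LABEL_PAT = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")
SCHEME_PAT = re.compile(r"^https?://", re.IGNORECASE)


def split_host(url):
    """Return (subdomain, domain) for a URL or bare host, or None if it looks invalid."""
    # Drop an optional http:// or https:// prefix.
    host = SCHEME_PAT.sub("", url.strip())
    # Keep only the part before the first ':' (port) or '/' (path).
    host = re.split(r"[:/]", host, maxsplit=1)[0]
    labels = host.split(".")
    if len(labels) < 2 or not all(LABEL_PAT.fullmatch(label) for label in labels):
        return None
    # Assumption: the registered domain is the last two labels.
    return ".".join(labels[:-2]), ".".join(labels[-2:])


print(split_host("subdomain.example.com"))
print(split_host("https://example.com/path"))
print(split_host("not a url"))
```

To use it with typed input, pass the result of input('Please input your website: ') to split_host and treat a None result as an invalid address.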

rtzki

ichabod801