Python Forum

Regex Subdomain Validation & Manual Website Input
Hello,
I have some code like this:

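import re
import urllib.request

# Groups: (scheme)(host)(optional :port)(rest of the URL)
GRUBER_URLINTEXT_PAT = re.compile("(https?://)([^:^/]*)(:\\d*)?(.*)?")

# Read the page line by line and print the host part of every URL found
for line in urllib.request.urlopen("https://pastebin.com/raw/hvGXKp72").read().decode().splitlines():
    print([mgroups[1] for mgroups in GRUBER_URLINTEXT_PAT.findall(line)])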
This code is meant to read only
example.com
without the http://, https://, or www prefix.
Now I have a question: how do I validate a subdomain? For example, is
subdomain.example.com
readable with this code?
Also, how can I enter the website manually instead of reading it from
("https://pastebin.com/raw/hvGXKp72")
? For example, the script would prompt
Please Input Your Website :
and then I would type the website in manually.

Thank you in advance,
I hope somebody can help me.

Sorry for my bad English.
Well, re.compile("(https?://)?([^:^/]*)(:\\d*)?(.*)?") makes the http(s):// prefix optional, so it only captures the scheme if it is there, and it will still match example.com. It will also match subdomain.example.com, but the whole host name ends up in the second group. Is that what you wanted, or did you want the subdomain in a separate group?
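
To see which group the host name lands in, here is a small sketch using the optional-scheme pattern. The helper name host_of and the sample URLs are made up for illustration; they are not from your script.

```python
import re

# Same pattern as above, with the scheme group made optional.
GRUBER_URLINTEXT_PAT = re.compile("(https?://)?([^:^/]*)(:\\d*)?(.*)?")

def host_of(url):
    """Return group 2 (the host part) of the match (hypothetical helper)."""
    return GRUBER_URLINTEXT_PAT.match(url).group(2)

# Illustrative inputs: with scheme, without scheme, and with a port.
samples = (
    "https://example.com/page",
    "subdomain.example.com",
    "http://subdomain.example.com:8080/x",
)
for sample in samples:
    print(sample, "->", host_of(sample))
```

In each case the subdomain is not separated out; it stays inside group 2 together with the rest of the host name.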

As for asking for user input, that's easy:

url = input('Please input your website: ')
match = GRUBER_URLINTEXT_PAT.match(url)
if match is None:
    print('Invalid url.')
else:
    print('That url is valid.')
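
One caveat: if the optional-scheme pattern is the one being used, every part of it can match an empty string, so match() would succeed for any input and the "Invalid url." branch would never run. If you want real validation and the subdomain on its own, something along these lines might work. This is only a rough sketch: STRICT_URL_PAT, parse_host and ask_user are names I made up, and treating the last two labels as the domain is a naive assumption that breaks for suffixes like co.uk.

```python
import re

# Hypothetical stricter pattern (an assumption, not from the original code).
# A label is letters, digits and hyphens, and may not start or end with a hyphen.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
STRICT_URL_PAT = re.compile(
    r"^(?:https?://)?"                      # optional scheme
    r"((?:" + LABEL + r"\.)+[A-Za-z]{2,})"  # host: labels, then an alphabetic TLD
    r"(?::\d+)?"                            # optional port
    r"(?:/.*)?$"                            # optional path
)

def parse_host(url):
    """Return (subdomain, domain), or None if the input does not look valid."""
    m = STRICT_URL_PAT.match(url.strip())
    if m is None:
        return None
    labels = m.group(1).split(".")
    # Naive assumption: the last two labels form the registered domain.
    domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])  # empty string when there is no subdomain
    return subdomain, domain

def ask_user():
    """Prompt for a website and report what was found."""
    result = parse_host(input("Please input your website: "))
    if result is None:
        print("Invalid url.")
    else:
        subdomain, domain = result
        print("Domain:", domain, "| Subdomain:", subdomain or "(none)")
```

Calling ask_user() shows the prompt. parse_host can also be applied to each line read from the pastebin page instead of typed input.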