Posts: 20
Threads: 11
Joined: Jan 2019
I need to turn a list of IPs into a regular expression, then copy and paste the result elsewhere.
The starting list
Quote:10.10.10.10 host1
10.10.10.11 host2
10.10.10.12 host3
10.10.10.13 host4
10.10.10.14 host5
The desired output
Quote:^10.10.10.10$|^10.10.10.11$|^10.10.10.12$|^10.10.10.13$|^10.10.10.14$
The current output. Notice the trailing "|", which I want removed.
Quote:^10.10.10.10$|^10.10.10.11$|^10.10.10.12$|^10.10.10.13$|^10.10.10.14$|
My cheap code
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys, os, re

def cls():
    os.system('clear')

def main():
    cls()
    try:
        #olist = []
        for line in open (sys.argv[1], 'r' ):
            word_list = line.split()
            word_list[0] = re.sub("^", "^", word_list[0], flags=re.M)
            word_list[0] = re.sub("$", "$|", word_list[0], flags=re.M)
            print(word_list[0],end='')
        print('\n\n')
    except IOError as e :
        print("File Open Error")
        print("Error :", str(e))
    except IndexError as i :
        print("Usage: argv[0] <file having ip as the first field, hostname as the second>\nExample : 10.10.10.10 host1\n 10.10.10.10 host2\n 10.10.10.12 host3")

main()

Working on a Linux vm
[localhost etc]$ cat system-release
CentOS Linux release 7.6.1810 (Core)
[localhost etc]$ python -V
Python 2.7.5
I know the Python version is old and crusty, considering 3.8 is in beta, but they are still using 2.7 at work.
My Question
The only way I can think of to get rid of the trailing pipe is to count the lines in the file, increment a separate counter as I run through the file, compare the two, and when they are equal do something like print word_list[0][:-1].
Is there a better way to do this? As a side question, is there a way to combine the two re.sub calls into a single line?
Thanks for any help provided!
Regards
Sum
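On the side question about combining the two substitutions: a minimal sketch, assuming the goal is only to wrap one field in ^ and $. The helper name anchor is made up for illustration. It captures the whole field once and refers back to it in the replacement, where ^ and $ are literal characters.

```python
from __future__ import print_function
import re


def anchor(ip):
    # Hypothetical helper: a single re.sub that captures the whole field
    # and wraps it. In the replacement string, ^ and $ are plain text.
    return re.sub(r'^(.*)$', r'^\1$', ip)


print(anchor('10.10.10.10'))
```

Plain concatenation, '^' + ip + '$', gives the same result without the re module, and keeping the pipe out of this step leaves the separator to be added between items later.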
Posts: 2,127
Threads: 11
Joined: May 2017
Sep-06-2019, 03:17 PM
(This post was last modified: Sep-06-2019, 03:17 PM by DeaD_EyE.)
Use str.join
My output:
Output: deadeye@nexus ~ $ python2.7 parse_ips.py
Without piping to program, you have to use --input-file
deadeye@nexus ~ $ python2.7 parse_ips.py --input-file
usage: parse_ips.py [-h] [--input-file INPUT_FILE]
parse_ips.py: error: argument --input-file: expected one argument
deadeye@nexus ~ $ python2.7 parse_ips.py --input-file hosts.txt
^10\.10\.10\.10$|^10\.10\.10\.11$|^10\.10\.10\.12$|^10\.10\.10\.13$|^10\.10\.10\.14$
deadeye@nexus ~ $ cat hosts.txt | python2.7 parse_ips.py
^10\.10\.10\.10$|^10\.10\.10\.11$|^10\.10\.10\.12$|^10\.10\.10\.13$|^10\.10\.10\.14$
Code:
#!/usr/bin/env python2.7
from __future__ import print_function
import sys
import argparse


def ip2regex(text):
    ips = []
    for row in text.splitlines():
        try:
            ip, hostname = row.split()
        except ValueError:
            # skip errors
            continue
        ip = '^' + ip.replace('.', r'\.') + '$'
        ips.append(ip)
    return '|'.join(ips)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input-file', required=False, help='Input file to generate regex output.')
    args = parser.parse_args()
    if args.input_file is None and not sys.stdin.isatty():
        print(ip2regex(sys.stdin.read()))
    elif args.input_file and sys.stdin.isatty():
        with open(args.input_file) as fd:
            print(ip2regex(fd.read()))
    else:
        print('Without piping to program, you have to use --input-file', file=sys.stderr)

The ip = ... and ips.append(ip) lines prepare the IP address. By the way, a dot is a metacharacter in regex: it stands for any character (except a newline, by default).
If you use the dot without escaping it, the regex ^10.10.10.10$ will also match 10510710310.
PS: split is the opposite of join.
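The point about the unescaped dot can be checked directly. This is a small sketch that uses re.escape from the standard library in place of the manual replace('.', r'\.') above; the variable names are only for illustration.

```python
from __future__ import print_function
import re

# Unescaped: every dot matches any single character.
unescaped = re.compile(r'^10.10.10.10$')
# Escaped: re.escape turns each dot into a literal dot.
escaped = re.compile('^' + re.escape('10.10.10.10') + '$')

for candidate in ('10.10.10.10', '10510710310'):
    print(candidate,
          bool(unescaped.match(candidate)),
          bool(escaped.match(candidate)))
```

Both patterns accept the real address, but only the unescaped one also accepts the digit string.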
Posts: 8,163
Threads: 160
Joined: Sep 2016
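Why complicate things that much? Simple string methods and formatting would do.
import sys

infile = sys.argv[1]
with open(infile, 'r') as f:
    # The starred target (*_) needs Python 3; wrapping line.split() in a list
    # makes ip_addr the first field of each line.
    ips = [ip_addr for line in f for ip_addr, *_ in [line.split()]]
print('|'.join('^{}$'.format(ip_addr) for ip_addr in ips))

This can be shortened further, to two lines.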
Posts: 20
Threads: 11
Joined: Jan 2019
Sep-06-2019, 04:59 PM
(This post was last modified: Sep-06-2019, 05:04 PM by sumncguy.)
(Sep-06-2019, 03:17 PM)DeaD_EyE Wrote: Use str.join
Yep, thanks. I understand that the "." means any char. The app I'm pasting into recognizes an IP as long as it's wrapped in ^$.
Thanks.
I've seen this construct in some example code, but not in any instruction, probably because I'm just starting out.
What is it called, and where can I learn about it?
ips = [ip_addr for line in f for ip_addr, *_ in [line.split()]]
Posts: 8,163
Threads: 160
Joined: Sep 2016
(Sep-06-2019, 04:59 PM)sumncguy Wrote: What is it called and where can I learn about it ..
This is a list comprehension. You can also have a generator expression, e.g. (ip_addr for line in f for ip_addr, *_ in [line.split()]), which does not create the full list in memory, or a dict comprehension.
Note that it can be expanded into a normal for loop:
import sys

infile = sys.argv[1]
ips = []
with open(infile, 'r') as f:
    for line in f:
        # line.split() is wrapped in a list so the whole row is unpacked at once
        for ip_addr, *_ in [line.split()]:
            ips.append(ip_addr)
print('|'.join('^{}$'.format(ip_addr) for ip_addr in ips))
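The three forms mentioned here (list comprehension, generator expression, dict comprehension) can be compared side by side. This is a sketch on a small in-memory sample instead of a file; the names lines, ip_list, ip_gen and host_by_ip are invented for the example, and it avoids the starred target so it also runs on 2.7.

```python
from __future__ import print_function

lines = ['10.10.10.10 host1', '10.10.10.11 host2']

# List comprehension: builds the whole list immediately.
ip_list = [line.split()[0] for line in lines]

# Generator expression: produces items one at a time, on demand.
ip_gen = (line.split()[0] for line in lines)

# Dict comprehension (Python 2.7 and later): maps each IP to its hostname.
host_by_ip = {line.split()[0]: line.split()[1] for line in lines}

print(ip_list)
print('|'.join(ip_gen))  # join consumes the generator
print(host_by_ip)
```

A generator can only be consumed once, so after the join above ip_gen is exhausted, while ip_list can be reused.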
Posts: 20
Threads: 11
Joined: Jan 2019
List comprehension. Thank you.
I don't like to just copy and paste solutions that I don't understand. Main reason: if I did, next month when I look at the code again I'd be thinking "What the heck does that do again?" So if I get even a high-level understanding and annotate my script, it will be easier to jar this crusty 54-year-old memory! :)
Thanks
Sum
Posts: 2,127
Threads: 11
Joined: May 2017
Sep-06-2019, 11:59 PM
(This post was last modified: Sep-06-2019, 11:59 PM by DeaD_EyE.)
The _ is a valid name in Python.
In interactive mode it holds the last result, if it was not assigned to a name.
>>> 5+5
10
>>> print(_)
10
And the effect of the star in front of one of the names in an assignment (Python 3 only):
start, *middle, end = 'start 1 2 3 4 5 end'.split()
print(start)
print(middle)
print(end)
Output:
start
['1', '2', '3', '4', '5']
end
ips = [ip_addr for line in f for ip_addr, *_ in [line.split()]]
is the same as:
ips = []
for line in f:
    for ip_addr, *_ in [line.split()]:
        ips.append(ip_addr)
The name f should point to an open file. Iterating over a file object yields it line by line.
But I think this example is overcomplicated. You can write this as:
ips = []
for line in f:
    ip_addr, *rest = line.split()
    ips.append(ip_addr)
Then you get rid of the nested loop.
Turning this into a list comprehension:
ips = [line.split()[0] for line in f]
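Since the starred assignment is Python 3 syntax and the original question is about 2.7, here is a rough equivalent using indexing and slicing, which works on both. The variable names mirror the example above; words, fields and rest are only illustrative.

```python
from __future__ import print_function

words = 'start 1 2 3 4 5 end'.split()

# Same result as: start, *middle, end = words
start, middle, end = words[0], words[1:-1], words[-1]
print(start)
print(middle)
print(end)

# Same idea for one hosts line: first field is the IP, the rest is ignored.
fields = '10.10.10.10 host1'.split()
ip_addr, rest = fields[0], fields[1:]
print(ip_addr)
```

Slicing never raises on a short list, but indexing does, so an empty line would still need to be skipped before words[0] or fields[0] is read.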
Posts: 20
Threads: 11
Joined: Jan 2019
Sep-16-2019, 03:40 PM
(This post was last modified: Sep-16-2019, 03:40 PM by sumncguy.)
I found that a few VMs are using 2.6.6.
It seems that format wasn't introduced until 2.7, so the print solution doesn't work in some cases.
Can anyone point me to a place where I can find out how to truncate the last "|" in 2.6.6?
I wish they would:
1. standardize the Linux and Python versions they are using.
2. upgrade at least to Python 3.x, especially given that 3.8 is in beta 2!
I work for a big company (can't say which), but I find it incredible that they aren't really doing any admin on their VMs.
Thanks for the help
Sum
Posts: 8,163
Threads: 160
Joined: Sep 2016
Sep-16-2019, 03:50 PM
(This post was last modified: Sep-16-2019, 03:50 PM by buran.)
It works; in 2.6 you just need to number the placeholder(s), i.e. explicitly specify the order in which values are placed in the placeholders.
print('|'.join('^{0}$'.format(ip_addr) for ip_addr in ips))
This will also work in 2.7 and 3.x.
Or, to say it the other way around: in 2.7 and 3.x you can skip the number.
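Putting the pieces of the thread together, here is a minimal sketch that should run unchanged on 2.6, 2.7 and 3.x. The function name ips_to_regex and the sample data are assumptions for illustration. Dots are left unescaped, matching the output the original question asked for.

```python
from __future__ import print_function


def ips_to_regex(lines):
    """Build '^ip$|^ip$|...' from lines whose first field is an IP."""
    # Skip blank lines so line.split()[0] cannot raise IndexError.
    ips = [line.split()[0] for line in lines if line.strip()]
    # str.join puts '|' only between items, so there is no trailing pipe.
    # The numbered placeholder {0} is what Python 2.6 requires.
    return '|'.join('^{0}$'.format(ip) for ip in ips)


sample = ['10.10.10.10 host1\n', '\n', '10.10.10.11 host2\n']
print(ips_to_regex(sample))
```

Because an open file iterates line by line, the same function can be called as ips_to_regex(open(path)) with path being whatever hosts file is used.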