Posts: 20
Threads: 11
Joined: Jan 2019
I need to turn a list of IPs into a regular expression, then copy and paste the result elsewhere.
The starting list
Quote:10.10.10.10 host1
10.10.10.11 host2
10.10.10.12 host3
10.10.10.13 host4
10.10.10.14 host5
The desired output
Quote:^10.10.10.10$|^10.10.10.11$|^10.10.10.12$|^10.10.10.13$|^10.10.10.14$
The current output .. notice the last "|", I want that removed.
Quote:^10.10.10.10$|^10.10.10.11$|^10.10.10.12$|^10.10.10.13$|^10.10.10.14$|
My cheap code

#!/usr/bin/python
from __future__ import print_function
import sys, os, re


def cls():
    os.system('clear')


def main():
    cls()
    try:
        for line in open(sys.argv[1], 'r'):
            word_list = line.split()
            word_list[0] = re.sub("^", "^", word_list[0], flags=re.M)
            word_list[0] = re.sub("$", "$|", word_list[0], flags=re.M)
            print(word_list[0], end='')
        print('\n\n')
    except IOError as e:
        print("File Open Error")
        print("Error :", str(e))
    except IndexError as i:
        print("Usage: argv[0] <file having ip as the first field, hostname as the second>\nExample : 10.10.10.10 host1\n 10.10.10.10 host2\n 10.10.10.12 host3")


main()

Working on a Linux vm
[localhost etc]$ cat system-release
CentOS Linux release 7.6.1810 (Core)
[localhost etc]$ python -V
Python 2.7.5
I know .. the Python version is old and crusty considering 3.8 is in beta ... but they are still using 2.7 at work.
My Question
The only way I can think of to get rid of the trailing pipe is to count the lines in the file, keep a separate counter while looping through it, and compare the two; on the last line, print something like word_list[0][:-1] instead.
Is there a better way to do this? As a side question, is there a way to combine the two re.sub calls into a single line?
Thanks for any help provided !!!
Regards
Sum
Posts: 2,129
Threads: 11
Joined: May 2017
Sep-06-2019, 03:17 PM
(This post was last modified: Sep-06-2019, 03:17 PM by DeaD_EyE.)
Use str.join
My output:
Output: deadeye@nexus ~ $ python2.7 parse_ips.py
Without piping to program, you have to use --input-file
deadeye@nexus ~ $ python2.7 parse_ips.py --input-file
usage: parse_ips.py [-h] [--input-file INPUT_FILE]
parse_ips.py: error: argument --input-file: expected one argument
deadeye@nexus ~ $ python2.7 parse_ips.py --input-file hosts.txt
^10\.10\.10\.10$|^10\.10\.10\.11$|^10\.10\.10\.12$|^10\.10\.10\.13$|^10\.10\.10\.14$
deadeye@nexus ~ $ cat hosts.txt | python2.7 parse_ips.py
^10\.10\.10\.10$|^10\.10\.10\.11$|^10\.10\.10\.12$|^10\.10\.10\.13$|^10\.10\.10\.14$
Code:

from __future__ import print_function
import sys
import argparse


def ip2regex(text):
    ips = []
    for row in text.splitlines():
        try:
            ip, hostname = row.split()
        except ValueError:
            continue
        ip = '^' + ip.replace('.', r'\.') + '$'
        ips.append(ip)
    return '|'.join(ips)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input-file', required=False, help='Input file to generate regex output.')
    args = parser.parse_args()
    if args.input_file is None and not sys.stdin.isatty():
        print(ip2regex(sys.stdin.read()))
    elif args.input_file and sys.stdin.isatty():
        with open(args.input_file) as fd:
            print(ip2regex(fd.read()))
    else:
        print('Without piping to program, you have to use --input-file', file=sys.stderr)

The lines in ip2regex that build ip prepare the IP address by escaping the dots and wrapping it in ^ and $. By the way, a dot is a metacharacter in regex: it stands for any character.
If you use the dot without escaping it, the regex ^10.10.10.10$ will also match 10510710310.
PS: split is the opposite of join .
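To make the dot issue concrete, here is a minimal sketch using only the standard re module. The helper name build_pattern and the sample strings are illustrative and not part of the script above; re.escape is swapped in for the manual replace('.', r'\.') call and should have the same effect for dotted IPv4 addresses.

```python
import re


def build_pattern(ips):
    """Join IPs into one alternation, escaping regex metacharacters.

    build_pattern is an illustrative helper, not part of the original script.
    """
    return '|'.join('^' + re.escape(ip) + '$' for ip in ips)


unescaped = '^10.10.10.10$'
escaped = build_pattern(['10.10.10.10'])

# An unescaped dot matches any character, so this unrelated string matches.
print(bool(re.match(unescaped, '10510710310')))   # True
# With the dots escaped, only the literal address matches.
print(bool(re.match(escaped, '10510710310')))     # False
print(bool(re.match(escaped, '10.10.10.10')))     # True
```

Whether the unescaped form is acceptable depends on how strictly the consuming application needs to match.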
Posts: 8,169
Threads: 160
Joined: Sep 2016
Why complicate things that much? Simple string methods and formatting would do.

import sys

infile = sys.argv[1]
with open(infile, 'r') as f:
    # Python 3 only: the starred target keeps the first field and discards the rest
    ips = [ip_addr for line in f for ip_addr, *_ in [line.split()]]
print('|'.join('^{}$'.format(ip_addr) for ip_addr in ips))

These 4 lines can be shortened to 2.
Posts: 20
Threads: 11
Joined: Jan 2019
Sep-06-2019, 04:59 PM
(This post was last modified: Sep-06-2019, 05:04 PM by sumncguy.)
(Sep-06-2019, 03:17 PM)DeaD_EyE Wrote: Use str.join
Yep, thanks. I understand that the "." means any char. The app I'm pasting into recognizes an IP just so long as it's wrapped in ^$.
Thanks.
I've seen this construct in some example code, but not in any instruction, probably because I'm just starting out.
What is it called, and where can I learn about it?

ips = [ip_addr for line in f for ip_addr, *_ in [line.split()]]

Posts: 8,169
Threads: 160
Joined: Sep 2016
(Sep-06-2019, 04:59 PM)sumncguy Wrote: What is it called and where can I learn about it ..
This is a list comprehension. You can also write a generator expression, e.g. (ip_addr for line in f for ip_addr, *_ in [line.split()]), which does not create the full list in memory, or a dict comprehension.
Note that it can be expanded into a normal for loop:

import sys

infile = sys.argv[1]
ips = []
with open(infile, 'r') as f:
    for line in f:
        # wrapping the split result in a list makes the inner loop run once per line
        for ip_addr, *_ in [line.split()]:
            ips.append(ip_addr)
print('|'.join('^{}$'.format(ip_addr) for ip_addr in ips))

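As a rough illustration of the three forms named in this post, the sketch below uses an in-memory list of lines in place of an open file. The sample data and variable names are made up for the example, and dict comprehensions need Python 2.7 or later.

```python
# Sample data, made up for illustration; a real script would iterate over a file.
lines = ['10.10.10.10 host1', '10.10.10.11 host2', '10.10.10.12 host3']

# List comprehension: builds the whole list in memory at once.
ip_list = [line.split()[0] for line in lines]

# Generator expression: produces one item at a time, on demand.
ip_gen = (line.split()[0] for line in lines)

# Dict comprehension: maps each IP to its hostname.
host_by_ip = {line.split()[0]: line.split()[1] for line in lines}

print(ip_list)
print(next(ip_gen))
print(host_by_ip['10.10.10.12'])
```

The generator form is mostly of interest for large inputs; for a short hosts file the list form is just as good.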
Posts: 20
Threads: 11
Joined: Jan 2019
list comprehension .. thank you
I don't like to just copy and paste solutions that I don't understand. Main reason: if I did, next month when I look at the code again, I'd be thinking "What the heck does that do again?" So if I get even a high-level understanding and annotate my script, it will be easier to jar this crusty 54-year-old memory! :)
Thanks
Sum
Posts: 2,129
Threads: 11
Joined: May 2017
Sep-06-2019, 11:59 PM
(This post was last modified: Sep-06-2019, 11:59 PM by DeaD_EyE.)
The _ is a valid name in Python.
In interactive mode it holds the last result, if it was not assigned to a name.
And this is the effect of the star in front of one of the names in an assignment (Python 3 only):

start, *middle, end = 'start 1 2 3 4 5 end'.split()
print(start)
print(middle)
print(end)

Output: start
['1', '2', '3', '4', '5']
end

ips = [ip_addr for line in f for ip_addr, *_ in [line.split()]]

Is the same as:

ips = []
for line in f:
    for ip_addr, *_ in [line.split()]:
        ips.append(ip_addr)

The name f should point to an open file. Iterating over a file object yields its contents line by line.
But I think this example is overcomplicated. You can write this as:

ips = []
for line in f:
    ip_addr, *rest = line.split()
    ips.append(ip_addr)

Then you get rid of the nested loop.
Turning this into a list comprehension:

ips = [line.split()[0] for line in f]

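Since the thread targets Python 2.7, where a starred assignment target is a syntax error, the indexing form is the portable one. A minimal sketch follows; the helper name first_fields and the sample lines are hypothetical, and skipping blank lines is an added assumption about the input rather than something the posts above require.

```python
def first_fields(lines):
    """Return the first whitespace-separated field of each non-blank line.

    first_fields is a hypothetical helper; skipping blank lines is an assumption.
    """
    return [line.split()[0] for line in lines if line.strip()]


# Sample input standing in for an open file, including one blank line.
sample = ['10.10.10.10 host1\n', '\n', '10.10.10.11 host2\n']
ips = first_fields(sample)
print('|'.join('^' + ip + '$' for ip in ips))
```

Without the blank-line filter, line.split()[0] would raise IndexError on an empty line.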
Posts: 20
Threads: 11
Joined: Jan 2019
Sep-16-2019, 03:40 PM
(This post was last modified: Sep-16-2019, 03:40 PM by sumncguy.)
I found that a few VMs are using 2.6.6.
It seems that format wasn't introduced until 2.7, so the print solution doesn't work in some cases.
Can anyone point me to a place where I can find out how to truncate the last "|" in 2.6.6?
I wish they would
1. standardize the Linux and Python versions they are using.
2. upgrade at least to Python 3.x, especially given that 3.8 is in beta 2!
I work for a big company (can't say which), but I find it incredible that they aren't really doing any admin on their VMs.
Thanks for the help
Sum
Posts: 8,169
Threads: 160
Joined: Sep 2016
Sep-16-2019, 03:50 PM
(This post was last modified: Sep-16-2019, 03:50 PM by buran.)
It works in 2.6; you just need to number the placeholder(s), i.e. explicitly specify the order in which values are placed into the placeholders.

print('|'.join('^{0}$'.format(ip_addr) for ip_addr in ips))

This will also work in 2.7 and 3.x.
Or to say it the other way around - in 2.7 and 3.x you can skip the number
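For the mixed 2.6/2.7 environment described above, here is a hedged sketch of two spellings that avoid the bare {} placeholder: the numbered {0} form and the older % operator. The sample ips list is made up for illustration. Because str.join only places the separator between items, neither version should produce a trailing |, so nothing needs to be truncated afterwards.

```python
# Sample data, made up for illustration.
ips = ['10.10.10.10', '10.10.10.11', '10.10.10.12']

# Numbered placeholder: accepted by str.format in 2.6, 2.7 and 3.x.
with_format = '|'.join('^{0}$'.format(ip) for ip in ips)

# printf-style formatting: an alternative that also works in these versions.
with_percent = '|'.join('^%s$' % ip for ip in ips)

print(with_format)
print(with_percent)
```

Both lines print the same pattern, so the choice between them is a matter of taste.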