Python Forum

using vars from one file to match lines in another
Hello all,

I have two very long lists, and I need to use the variables (IP addresses) in one to match lines in the other.
fileseen is the list of all IP addresses observed at a certain location; filemaybe is a list of the hosts in the subnets available at various locations.

I am hoping to use the IPs in fileseen to get the SUBLOC location code from file two whenever an actual (seen) IP matches a potential one.

My code so far looks like this. It does the regexes fine, but after the first IP match it does not loop back to the next line of the first file to extract the next set of variables:

cat file1:
(column $1 is unique; columns $2 and $3 are not)
10.0.1.x LO1 192.168.1.11
10.0.2.x LO2 192.168.1.11

cat file2:
(all potential IPs for a given LOC site; column $1 is unique)
10.0.1.x SUBLOC

#!/usr/bin/python3

import re

def extract_ips(data):
    # host IP: the dotted quad at the start of the line
    regex = re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")
    return regex.findall(data)

def extract_proxy(data):
    # location code between spaces, e.g. LO1 (the group drops the spaces)
    regex = re.compile(r" ([A-Z0-9]{1,5}) ")
    return regex.findall(data)

def extract_proxyip(data):
    # proxy IP: the dotted quad at the end of the line
    regex = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")
    return regex.findall(data)


# Read file2 once. A file object can only be iterated once, so looping over
# filemaybe inside the outer loop finds matches for the first IP only.
with open('file2', 'r', encoding='utf-8') as filemaybe:
    maybe_lines = filemaybe.readlines()

seen_hosts = []
with open('file1', 'r', encoding='utf-8') as fileseen:
    for line in fileseen:
        for ip in extract_ips(line):
            if ip not in seen_hosts:
                seen_hosts.append(ip)
                for proxy in extract_proxy(line):
                    for proxyip in extract_proxyip(line):
                        print('Seen:', ip, ' BEHIND Proxy:', proxy, ' BO Proxy IP:', proxyip)
                        # use a new name so the outer 'line' is not overwritten
                        for maybe_line in maybe_lines:
                            fields = maybe_line.split()
                            # compare whole fields: re.search(ip, ...) treats '.' as "any character"
                            # and would also match 10.0.1.1 inside 10.0.1.11
                            if len(fields) > 1 and fields[0] == ip:
                                print('Matched actual:', ip, 'Sublocation:', fields[1])
Thank you very much.

Greetings - G

Sorry, I cleaned this up a bit.
It works for the first file. Now how can I use this to get what I want from file two?
I got some help with this in Perl some years ago via PerlMonks; they used hashes, but I am obviously not a coder.

#!/usr/bin/python3

import re

def extract_ips(data):
    # host IP at the start of the line
    regex = re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")
    return regex.findall(data)

def extract_bo(data):
    # breakout code between spaces, e.g. LO1 (the group drops the spaces)
    regex = re.compile(r" ([A-Z0-9]{1,5}) ")
    return regex.findall(data)

def extract_proxyip(data):
    # proxy IP at the end of the line
    regex = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")
    return regex.findall(data)


# all_pot_hosts.txt (filemaybe, file two) is not read here yet
seen_hosts = []
with open('seen_hosts_by_proxy.txt', 'r', encoding='utf-8') as fileseen:
    for line in fileseen:
        for ip in extract_ips(line):
            if ip not in seen_hosts:
                seen_hosts.append(ip)  # remember the IP so repeats are skipped
                for boproxy in extract_bo(line):
                    for proxyip in extract_proxyip(line):
                        print('Seen:', ip, ' BEHIND Proxy:', boproxy, ' BO Proxy IP:', proxyip)
It's not totally clear to me what's going on here. In particular, it's not clear which file is file1 and which is file2. However, I think you need to read the second file into a dictionary, with the key being the element it has in common with the first file (the IP address), and the value being whatever information you want to get out of the second file. Do that before looping through the first file, then use the dictionary to look up the information you need as you loop through the first file.
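A minimal sketch of that idea, assuming file2 has the IP in its first column and the SUBLOC code in its second, and file1 has host IP, location code and proxy IP as whitespace-separated columns, as in the samples at the top of the thread. The helper names load_sublocations and match_seen and the sample addresses are made up for illustration; io.StringIO stands in for the real files.

```python
import io

def load_sublocations(lines):
    """Build a dict mapping each potential IP (column 1) to its SUBLOC (column 2)."""
    sublocs = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sublocs[fields[0]] = fields[1]
    return sublocs

def match_seen(lines, sublocs):
    """Yield (ip, location, proxy_ip, subloc) for every seen IP found in sublocs."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ip, loc, proxy_ip = fields[0], fields[1], fields[2]
        subloc = sublocs.get(ip)  # one dict lookup instead of rescanning file2
        if subloc is not None:
            yield ip, loc, proxy_ip, subloc

# Hypothetical in-memory stand-ins for the two files.
file2 = io.StringIO("10.0.1.5 SUBLOC\n10.0.2.7 OTHER\n")
file1 = io.StringIO("10.0.1.5 LO1 192.168.1.11\n10.0.3.9 LO3 192.168.1.11\n")

sublocs = load_sublocations(file2)  # read file2 once, before looping over file1
for ip, loc, proxy_ip, subloc in match_seen(file1, sublocs):
    print('Seen:', ip, 'Location:', loc, 'Proxy IP:', proxy_ip, 'Sublocation:', subloc)
```

Because file2 is read only once, each seen IP costs a single dictionary lookup rather than another pass over file2. With the real files, pass the open file objects (opened with open(..., encoding='utf-8') inside a with block) instead of the StringIO samples.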
Hello again,

I am trying to build a "large" dict whose key is an IPv4 address converted to a decimal number, so that it can later be checked for a match against the keys of another dict (built from the second file).
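For reference, a small sketch of the IPv4-to-decimal round trip using the standard ipaddress module (already imported in the script below). The helper names ip_to_int and int_to_ip are hypothetical. Note that ip_address raises ValueError for strings such as 999.1.1.1, which the [0-9]{1,3} regexes would still accept. If only exact matching is needed, the dotted string itself also works as a dict key; the integer form mainly helps with range or subnet comparisons.

```python
from ipaddress import ip_address

def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its integer (decimal) form."""
    return int(ip_address(ip))

def int_to_ip(value):
    """Convert the integer form back to a dotted-quad string."""
    return str(ip_address(value))

print(ip_to_int('192.168.1.11'))  # 3232235787
print(int_to_ip(3232235787))      # 192.168.1.11
```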

I cannot seem to add to the dict; I only get a single updated "row" from the last match.
Instead, I want all of the keys and values from the file to show up and "stay in memory" until I can check the IPv4-decimal keys against the second dict.

Any help would be greatly appreciated. Thank you - GT

#!/usr/bin/python3

import sys
import re
from ipaddress import ip_address

def extract_ips(data):
	regex=re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")
	return (regex.findall(data))

def extract_bo(data):
	regex=re.compile(r"[A-Z]{1,4}[0-9]{1}")
	return (regex.findall(data))

def extract_proxy(data):
	regex=re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")
	return (regex.findall(data))

def extract_dict_vars(data):
	global ip
	global dec_ip
	global bo
	global dec_proxy
	for ip in extract_ips(data):
		dec_ip = int(ip_address(ip))
		for bo in extract_bo(data):
			for proxy in extract_proxy(data):
				dec_proxy = int(ip_address(proxy))
				return (dec_ip,ip,bo,dec_proxy)

seendict = {}
with open('seen_hosts_by_proxy.txt', 'r', encoding='utf-8') as fileseen:
	for line in fileseen:
		extract_dict_vars(line)
		seendict = {'ip_decimal': dec_ip, 'ipv4': ip, 'breakout': bo, 'proxy:': dec_proxy}
print (seendict)
Avoid global. You don't really need it here, because you are already returning all of the values you declared global. You just need to assign those values when you call the function on line 34 (extract_dict_vars(line)). See the function tutorial for how to do that.
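A stripped-down illustration of returning values and unpacking them at the call site instead of using global. split_record is a hypothetical stand-in for extract_dict_vars that splits on whitespace rather than using the regexes.

```python
def split_record(data):
    """Hypothetical stand-in for extract_dict_vars: return the fields as a tuple."""
    ip, bo, proxy = data.split()  # expects exactly three whitespace-separated fields
    return ip, bo, proxy

# The caller unpacks the returned tuple into its own names; no globals needed.
ip, bo, proxy = split_record("10.0.1.5 LO1 192.168.1.11\n")
print(ip, bo, proxy)  # 10.0.1.5 LO1 192.168.1.11
```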

You need to unindent the return statement in extract_dict_vars. The first time through the loops it returns the values, which stops the loops from processing anything further. Unindent it so it is level with the first for loop, taking it out of all the for loops.
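A toy illustration of the difference, using hypothetical functions unrelated to the IP data: a return inside a loop hands back the first item and ends the function, while a return placed after the loop runs only once every item has been processed.

```python
def first_only(values):
    """Return inside the loop: the function exits on the first item."""
    for v in values:
        return v

def after_loop(values):
    """Return after the loop: every item is processed first."""
    found = []
    for v in values:
        found.append(v.upper())
    return found

print(first_only(['lo1', 'lo2', 'lo3']))  # lo1
print(after_loop(['lo1', 'lo2', 'lo3']))  # ['LO1', 'LO2', 'LO3']
```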

On line 35 (the seendict = {...} assignment) you are replacing the entire dictionary every time through the loop. Instead, add to the dictionary under a key: seendict[dec_ip] = {'ipv4': ip, ...}.
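A minimal sketch that puts these three suggestions together, assuming each line of seen_hosts_by_proxy.txt holds host IP, breakout code and proxy IP as in the samples earlier in the thread. It swaps the findall loops for single re.search calls (assuming at most one of each field per line), and returns None for lines that don't parse, since without globals the values would otherwise not exist when nothing matched. The build_seendict helper, the sample lines and the maybedict contents are made up for illustration.

```python
import io
import re
from ipaddress import ip_address

IP_PATTERN = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
HOST_RE = re.compile("^" + IP_PATTERN)   # host IP at the start of the line
BO_RE = re.compile(r"[A-Z]{1,4}[0-9]")   # breakout code such as LO1
PROXY_RE = re.compile(IP_PATTERN + "$")  # proxy IP at the end of the line

def extract_dict_vars(data):
    """Return (dec_ip, ip, bo, dec_proxy) for one line, or None if it doesn't parse."""
    host = HOST_RE.search(data)
    bo = BO_RE.search(data)
    proxy = PROXY_RE.search(data)
    if not (host and bo and proxy):
        return None
    try:
        dec_ip = int(ip_address(host.group()))
        dec_proxy = int(ip_address(proxy.group()))
    except ValueError:  # e.g. 999.1.1.1 passes the regex but is not valid IPv4
        return None
    return dec_ip, host.group(), bo.group(), dec_proxy  # single return, after all checks

def build_seendict(lines):
    """Add one entry per host, keyed by its decimal IP, instead of replacing the dict."""
    seendict = {}
    for line in lines:
        record = extract_dict_vars(line)
        if record is None:
            continue
        dec_ip, ip, bo, dec_proxy = record  # unpack the returned tuple
        seendict[dec_ip] = {'ipv4': ip, 'breakout': bo, 'proxy': dec_proxy}
    return seendict

# In-memory sample lines; with the real file use:
#   with open('seen_hosts_by_proxy.txt', 'r', encoding='utf-8') as fileseen:
#       seendict = build_seendict(fileseen)
seendict = build_seendict(io.StringIO(
    "10.0.1.5 LO1 192.168.1.11\n"
    "10.0.2.7 LO2 192.168.1.11\n"
))
print(seendict)

# Hypothetical second dict built the same way from the other file (decimal IP -> SUBLOC).
maybedict = {int(ip_address('10.0.1.5')): 'SUBLOC'}
for dec_ip in seendict.keys() & maybedict.keys():  # keys present in both dicts
    print(seendict[dec_ip]['ipv4'], 'is in sublocation', maybedict[dec_ip])
```

Since both dicts share the same decimal-IP keys, the final check is a set intersection of their keys rather than another loop over a file.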