Posts: 170
Threads: 43
Joined: May 2019
ok, so i have a script that pings all our registers in the company to check if they are online or offline.. this works great. that script works off a list of just ip's. Once the script runs, it generates a output file in csv with the ip, status, datetime. Currently this is being pulled into excel to create a dashboard. NOW that VP's and management is aware of this dashboard, they want access, but providing access to so many people and one excel file, the file is going to get manipulated and eventually break. So im working on a new script that i can do everything im already doing in excel, but within the script to then generate a static report that can be sent out to everyone and no worry about pivot tables being changed or excel file being updated or changed and causing issues.
So i have a dataframe that returns a list of the ip's with store # and register number (source data) in the new script, this will be used as the source for IP's to ping. So in excel im using vlookup, how can i accomplish the same thing and add the store # and Register # to my csv of results?
Here is my source data that will be used to pull the ip, but also contains the store and register that i need to include in my results.csv
# SOURCE DATAFRAME
Output: IP Store Register
0 121.11.7.203 7 203
1 121.11.7.204 7 204
2 121.11.7.205 7 205
3 121.11.7.206 7 206
4 121.11.7.207 7 207
5 121.11.7.208 7 208
6 121.11.7.221 7 221
7 121.11.7.222 7 222
8 121.11.9.201 9 201
9 121.11.9.202 9 202
10 121.11.9.203 9 203
11 121.11.9.204 9 204
# RESULTS DATAFRAME THAT IS WRITTEN TO CSV
Output: IP Status Datetime
0 121.11.7.203 0 10/18/2023,16:12:01
1 121.11.7.204 0 10/18/2023,16:12:01
2 121.11.7.205 0 10/18/2023,16:12:01
3 121.11.7.206 0 10/18/2023,16:12:01
4 121.11.7.207 0 10/18/2023,16:12:01
5 121.11.7.208 0 10/18/2023,16:12:01
6 121.11.7.221 1 10/18/2023,16:12:01
7 121.11.7.222 0 10/18/2023,16:12:01
8 121.11.9.201 0 10/18/2023,16:12:01
9 121.11.9.202 0 10/18/2023,16:12:01
10 121.11.9.203 0 10/18/2023,16:12:01
11 121.11.9.204 0 10/18/2023,16:12:01
Since the results are captured into a df as they are gathered, then written to csv, i would like to add the store and register into the results before writing the csv. How can i search the source for the ip and return the 2 additional values i want and add to the results df?
Posts: 6,776
Threads: 20
Joined: Feb 2020
It is easy to have pandas do the merging, but it is always easier to do things correctly from the start. Does the ping script have access to this?
Output: IP Store Register
0 121.11.7.203 7 203
1 121.11.7.204 7 204
2 121.11.7.205 7 205
If so, why doesn't it put the store and register information in the output CSV? If it doesn't have access to store and register information, why not?
Here is how you can merge your source and results dataframes.
import pandas as pd
from random import randint
from datetime import datetime
source = pd.read_csv("test.csv")
print(source)
result = pd.DataFrame(
(
{"IP": ip, "Status": randint(0, 1), "Datetime": datetime.now()}
for ip in source["IP"].values[::-1]
)
)
print(result)
merged_results = pd.concat(
(source.set_index("IP"), result.set_index("IP")), axis=1
).reset_index()
print(merged_results) Output: IP Store Register
0 121.11.7.203 7 203
1 121.11.7.204 7 204
2 121.11.7.205 7 205
3 121.11.7.206 7 206
4 121.11.7.207 7 207
5 121.11.7.208 7 208
6 121.11.7.221 7 221
7 121.11.7.222 7 222
IP Status Datetime
0 121.11.7.222 0 2023-10-20 15:13:29.259966
1 121.11.7.221 0 2023-10-20 15:13:29.259966
2 121.11.7.208 1 2023-10-20 15:13:29.259966
3 121.11.7.207 0 2023-10-20 15:13:29.259966
4 121.11.7.206 1 2023-10-20 15:13:29.259966
5 121.11.7.205 0 2023-10-20 15:13:29.259966
6 121.11.7.204 0 2023-10-20 15:13:29.259966
7 121.11.7.203 0 2023-10-20 15:13:29.259966
IP Store Register Status Datetime
0 121.11.7.203 7 203 0 2023-10-20 15:13:29.259966
1 121.11.7.204 7 204 0 2023-10-20 15:13:29.259966
2 121.11.7.205 7 205 0 2023-10-20 15:13:29.259966
3 121.11.7.206 7 206 1 2023-10-20 15:13:29.259966
4 121.11.7.207 7 207 0 2023-10-20 15:13:29.259966
5 121.11.7.208 7 208 1 2023-10-20 15:13:29.259966
6 121.11.7.221 7 221 0 2023-10-20 15:13:29.259966
7 121.11.7.222 7 222 0 2023-10-20 15:13:29.259966
Posts: 170
Threads: 43
Joined: May 2019
 Great question, wasnt looking to make such a huge improvement / upgrade to the overall script, now that more and more people are seeing the dashboard, my manager/vp is asking for improvements, so im looking to revamp the whole thing..
So to answer your question, yes with the new script, i would like to use just 1 source file to use to ping, which like you said, contains the store and register number so the end result is just a csv with all the data.
So i would for sure like to provide the source file like this:
Output: IP Store Register
0 121.11.7.203 7 203
1 121.11.7.204 7 204
2 121.11.7.205 7 205
3 121.11.7.206 7 206
4 121.11.7.207 7 207
5 121.11.7.208 7 208
6 121.11.7.221 7 221
7 121.11.7.222 7 222
And get the results like your example:
Output: IP Store Register Status Datetime
0 121.11.7.203 7 203 0 2023-10-20 15:13:29.259966
1 121.11.7.204 7 204 0 2023-10-20 15:13:29.259966
2 121.11.7.205 7 205 0 2023-10-20 15:13:29.259966
3 121.11.7.206 7 206 1 2023-10-20 15:13:29.259966
4 121.11.7.207 7 207 0 2023-10-20 15:13:29.259966
5 121.11.7.208 7 208 1 2023-10-20 15:13:29.259966
6 121.11.7.221 7 221 0 2023-10-20 15:13:29.259966
7 121.11.7.222 7 222 0 2023-10-20 15:13:29.259966
So im going to play around with your example to see if i can make that happen. I BELIEVE the reason it was done the current way, was because it was easier to loop thru a text file of ip's (only 1 column) than it was to figure out how to from 3 columns..
Im still learning alot on python, so nothing like jumping right in and making it happen.
I appreciate the example and will test with my current files and script.
Posts: 170
Threads: 43
Joined: May 2019
So the example works great, i even used my actual lists and returned as expected (in my test file)
But in my actual working script, im having issues trying to replace this function so that it produces the same output as before in order to allow the ping function to run against the ip list.
# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
global fn
global directory
global parts
for entry in os.scandir(directory):
if entry.is_file() and entry.name.endswith('.txt'):
if pType in entry.name:
pt = directory + '/' + entry.name
fn = pt
# THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
parts = entry.name.split('_')
with open(pt) as file:
for ip in file:
yield ip.strip() the entire currently working script is this:
import os
import sys
import time
from datetime import datetime
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import mmap
import math
#pType = sys.argv[1]
pType = 'Register'
# ROUTINE TO PING LIST OF IP'S USING THREADING
def ping(ip):
return (
ip,
subprocess.run(
f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
).returncode,
)
# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
global fn
global directory
global parts
for entry in os.scandir(directory):
if entry.is_file() and entry.name.endswith('.txt'):
if pType in entry.name:
pt = directory + '/' + entry.name
fn = pt
# THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
parts = entry.name.split('_')
with open(pt) as file:
for ip in file:
yield ip.strip()
# USED AS A GLOBAL VARIABLE TO GRAB THE FILE NAME AND USE IN FUNCTION TO RETURN ROW COUNT.
fn = ""
# DIRECTORY WHERE ALL IP LISTS RESIDE
directory = 'Lists'
# SETS TODAYS DATE TO CAPTURE AS THE "RUNDATE" AND SETS THE START TIME USED FOR CALCULATING THE RUN DURATION
now = datetime.now()
dt = now.strftime("%m/%d/%Y,%H:%M:%S")
start = time.time()
# THIS WILL EXECUTE THE PING FUNCTION IN THE NUMBER OF THREADS SPECIFIED
executor = ThreadPoolExecutor(125)
# WRITES THE COMPLETE LIST OF RESULTS TO THE DATAFRAME FOR SAVING
df = pd.DataFrame(executor.map(ping, list(scan_files())))
# ADDS A THIRD COLUMN WITH THE RUNDATE AND TIME TO THE DATAFRAME BEFORE SAVING
df[2] = dt
# SAVES THE RESULTS TO A CSV FILE FOR THE EXCEL FILE DASHBOARD
# IP, 0 = UP 1 = DOWN,RUN DATETIME
df.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)
# CAPTURES THE END TIME OF THE OVERALL RUN FOR THE PURPOSE OF CALCULATING THE DURATION
end = time.time()
# PRINTS TO SCREEN THE RUNTIME OF THE PROCESS FOR PERFORMANCE REPORTING
# print(end - start)
# print(fn)
# USED TO RETURN ROW COUNT
def mapcount(filename):
with open(filename, "r+") as f:
buf = mmap.mmap(f.fileno(), 0)
lines = 0
readline = buf.readline
while readline():
lines += 1
return lines
# PASSES VALUES TO RESULTS FILE FOR REPORTING
ti = end - start
df = pd.DataFrame([mapcount(fn)])
df[1] = math.ceil(ti)
df[2] = dt
if len(pType) > 0:
df[3] = pType
df.to_csv('scriptresults.csv', mode='a',index=False,header=False)
Posts: 170
Threads: 43
Joined: May 2019
Figured it out and seems to be working..
import os
import sys
import time
from datetime import datetime
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import mmap
import math
#pType = sys.argv[1]
pType = 'Register'
# ROUTINE TO PING LIST OF IP'S USING THREADING
def ping(ip):
return (
ip,
subprocess.run(
f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
).returncode,
)
# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
global fn
global directory
global parts
global source
for entry in os.scandir(directory):
if entry.is_file() and entry.name.endswith('.txt'):
if pType in entry.name:
pt = directory + '/' + entry.name
fn = pt
# THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
parts = entry.name.split('_')
source = pd.read_csv(pt)
for ind in source.index:
yield source['IP'][ind]
# USED AS A GLOBAL VARIABLE TO GRAB THE FILE NAME AND USE IN FUNCTION TO RETURN ROW COUNT.
fn = ""
# DIRECTORY WHERE ALL IP LISTS RESIDE
directory = 'Lists'
# SETS TODAYS DATE TO CAPTURE AS THE "RUNDATE" AND SETS THE START TIME USED FOR CALCULATING THE RUN DURATION
now = datetime.now()
dt = now.strftime("%m/%d/%Y,%H:%M:%S")
start = time.time()
# THIS WILL EXECUTE THE PING FUNCTION IN THE NUMBER OF THREADS SPECIFIED
executor = ThreadPoolExecutor(125)
# WRITES THE COMPLETE LIST OF RESULTS TO THE DATAFRAME FOR SAVING
df = pd.DataFrame(executor.map(ping, list(scan_files())))
# ADDS A THIRD COLUMN WITH THE RUNDATE AND TIME TO THE DATAFRAME BEFORE SAVING
df[2] = dt
# MERGE SOURCE WITH RESULTS TO HAVE A RESULT FILE WITH STORE AND REGISTER INFORMATION
merged_results = pd.concat(
(source.set_index("IP"), df.set_index(0)), axis=1
).reset_index()
# SAVES THE RESULTS TO A CSV FILE FOR THE EXCEL FILE DASHBOARD
# IP, 0 = UP 1 = DOWN,RUN DATETIME
#df.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)
merged_results.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)
# CAPTURES THE END TIME OF THE OVERALL RUN FOR THE PURPOSE OF CALCULATING THE DURATION
end = time.time()
# PRINTS TO SCREEN THE RUNTIME OF THE PROCESS FOR PERFORMANCE REPORTING
# print(end - start)
# print(fn)
# USED TO RETURN ROW COUNT
def mapcount(filename):
with open(filename, "r+") as f:
buf = mmap.mmap(f.fileno(), 0)
lines = 0
readline = buf.readline
while readline():
lines += 1
return lines
# PASSES VALUES TO RESULTS FILE FOR REPORTING
ti = end - start
df = pd.DataFrame([mapcount(fn)])
df[1] = math.ceil(ti)
df[2] = dt
if len(pType) > 0:
df[3] = pType
df.to_csv('scriptresults.csv', mode='a',index=False,header=False)
Posts: 6,776
Threads: 20
Joined: Feb 2020
Oct-21-2023, 02:45 PM
(This post was last modified: Oct-21-2023, 02:46 PM by deanhystad.)
I think you can trim things down a bit. Instead of trying to stitch things together at the end, pass the store and register information through the ping function.
import random
import pandas as pd
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
pType = "Register"
def ping(register_info):
"""Check if register is online.
register_info contains ip. store, register.
Returns dictionary {ip, store, register, register online status, datetime}.
"""
ip, store, register = register_info
return {
"IP": ip,
"Store": store,
"Register": register,
"Status": random.randint(0,1), # Replace this with ping status
"Datetime": datetime.now().strftime("%m/%d/%Y,%H:%M:%S")
}
def scan_files(directory="."):
"""Return rows from register files.
Each row contains IP, Store, Register.
"""
for source in Path(directory).glob(f"*{pType}*.csv"):
for _, row in pd.read_csv(source).iterrows():
yield row
with ThreadPoolExecutor(125) as executor:
results = pd.DataFrame(executor.map(ping, scan_files()))
print(results) This doesn't do any pinging, but other than that I think the results dataframe contains what you want.
registers.csv
Output: ip,store,register
121.11.7.203,7,203
121.11.7.204,7,204
121.11.7.205,7,205
121.11.7.206,7,206
121.11.7.207,7,207
121.11.7.208,7,208
121.11.7.209,7,209
121.11.7.210,7,210
Output: IP Store Register Status Datetime
0 121.11.7.203 7 203 1 10/21/2023,09:44:17
1 121.11.7.204 7 204 1 10/21/2023,09:44:17
2 121.11.7.205 7 205 1 10/21/2023,09:44:17
3 121.11.7.206 7 206 0 10/21/2023,09:44:17
4 121.11.7.207 7 207 1 10/21/2023,09:44:17
5 121.11.7.208 7 208 1 10/21/2023,09:44:17
6 121.11.7.209 7 209 1 10/21/2023,09:44:17
7 121.11.7.210 7 210 0 10/21/2023,09:44:17
|