Need to return 2 values from 1 DF that equals another DF

cubangt · Oct-20-2023, 02:54 PM

ok, so i have a script that pings all our registers in the company to check if they are online or offline.. this works great. that script works off a list of just ip's. Once the script runs, it generates a output file in csv with the ip, status, datetime. Currently this is being pulled into excel to create a dashboard. NOW that VP's and management is aware of this dashboard, they want access, but providing access to so many people and one excel file, the file is going to get manipulated and eventually break. So im working on a new script that i can do everything im already doing in excel, but within the script to then generate a static report that can be sent out to everyone and no worry about pivot tables being changed or excel file being updated or changed and causing issues.

So i have a dataframe that returns a list of the ip's with store # and register number (source data) in the new script, this will be used as the source for IP's to ping. So in excel im using vlookup, how can i accomplish the same thing and add the store # and Register # to my csv of results?

Here is my source data that will be used to pull the ip, but also contains the store and register that i need to include in my results.csv

# SOURCE DATAFRAME

Output:	IP	Store	Register
0	121.11.7.203	7	203
1	121.11.7.204	7	204
2	121.11.7.205	7	205
3	121.11.7.206	7	206
4	121.11.7.207	7	207
5	121.11.7.208	7	208
6	121.11.7.221	7	221
7	121.11.7.222	7	222
8	121.11.9.201	9	201
9	121.11.9.202	9	202
10	121.11.9.203	9	203
11	121.11.9.204	9	204

# RESULTS DATAFRAME THAT IS WRITTEN TO CSV

Output:	IP	Status	Datetime
0	121.11.7.203	0	10/18/2023,16:12:01
1	121.11.7.204	0	10/18/2023,16:12:01
2	121.11.7.205	0	10/18/2023,16:12:01
3	121.11.7.206	0	10/18/2023,16:12:01
4	121.11.7.207	0	10/18/2023,16:12:01
5	121.11.7.208	0	10/18/2023,16:12:01
6	121.11.7.221	1	10/18/2023,16:12:01
7	121.11.7.222	0	10/18/2023,16:12:01
8	121.11.9.201	0	10/18/2023,16:12:01
9	121.11.9.202	0	10/18/2023,16:12:01
10	121.11.9.203	0	10/18/2023,16:12:01
11	121.11.9.204	0	10/18/2023,16:12:01

Since the results are captured into a df as they are gathered, then written to csv, i would like to add the store and register into the results before writing the csv. How can i search the source for the ip and return the 2 additional values i want and add to the results df?

**deanhystad** · Oct-20-2023, 08:15 PM

It is easy to have pandas do the merging, but it is always easier to do things correctly from the start. Does the ping script have access to this?

Output:IP	Store	Register
0	121.11.7.203	7	203
1	121.11.7.204	7	204
2	121.11.7.205	7	205

If so, why doesn't it put the store and register information in the output CSV? If it doesn't have access to store and register information, why not?

Here is how you can merge your source and results dataframes.

import pandas as pd
from random import randint
from datetime import datetime


source = pd.read_csv("test.csv")
print(source)

result = pd.DataFrame(
    (
        {"IP": ip, "Status": randint(0, 1), "Datetime": datetime.now()}
        for ip in source["IP"].values[::-1]
    )
)

print(result)

merged_results = pd.concat(
    (source.set_index("IP"), result.set_index("IP")), axis=1
).reset_index()
print(merged_results)

Output:             IP  Store  Register
0  121.11.7.203      7       203
1  121.11.7.204      7       204
2  121.11.7.205      7       205
3  121.11.7.206      7       206
4  121.11.7.207      7       207
5  121.11.7.208      7       208
6  121.11.7.221      7       221
7  121.11.7.222      7       222
             IP  Status                   Datetime
0  121.11.7.222       0 2023-10-20 15:13:29.259966
1  121.11.7.221       0 2023-10-20 15:13:29.259966
2  121.11.7.208       1 2023-10-20 15:13:29.259966
3  121.11.7.207       0 2023-10-20 15:13:29.259966
4  121.11.7.206       1 2023-10-20 15:13:29.259966
5  121.11.7.205       0 2023-10-20 15:13:29.259966
6  121.11.7.204       0 2023-10-20 15:13:29.259966
7  121.11.7.203       0 2023-10-20 15:13:29.259966
             IP  Store  Register  Status                   Datetime
0  121.11.7.203      7       203       0 2023-10-20 15:13:29.259966
1  121.11.7.204      7       204       0 2023-10-20 15:13:29.259966
2  121.11.7.205      7       205       0 2023-10-20 15:13:29.259966
3  121.11.7.206      7       206       1 2023-10-20 15:13:29.259966
4  121.11.7.207      7       207       0 2023-10-20 15:13:29.259966
5  121.11.7.208      7       208       1 2023-10-20 15:13:29.259966
6  121.11.7.221      7       221       0 2023-10-20 15:13:29.259966
7  121.11.7.222      7       222       0 2023-10-20 15:13:29.259966

cubangt · Oct-20-2023, 08:25 PM

Great question, wasnt looking to make such a huge improvement / upgrade to the overall script, now that more and more people are seeing the dashboard, my manager/vp is asking for improvements, so im looking to revamp the whole thing..

So to answer your question, yes with the new script, i would like to use just 1 source file to use to ping, which like you said, contains the store and register number so the end result is just a csv with all the data.

So i would for sure like to provide the source file like this:

Output: IP  Store  Register
0  121.11.7.203      7       203
1  121.11.7.204      7       204
2  121.11.7.205      7       205
3  121.11.7.206      7       206
4  121.11.7.207      7       207
5  121.11.7.208      7       208
6  121.11.7.221      7       221
7  121.11.7.222      7       222

And get the results like your example:

Output:             IP  Store  Register  Status                   Datetime
0  121.11.7.203      7       203       0 2023-10-20 15:13:29.259966
1  121.11.7.204      7       204       0 2023-10-20 15:13:29.259966
2  121.11.7.205      7       205       0 2023-10-20 15:13:29.259966
3  121.11.7.206      7       206       1 2023-10-20 15:13:29.259966
4  121.11.7.207      7       207       0 2023-10-20 15:13:29.259966
5  121.11.7.208      7       208       1 2023-10-20 15:13:29.259966
6  121.11.7.221      7       221       0 2023-10-20 15:13:29.259966
7  121.11.7.222      7       222       0 2023-10-20 15:13:29.259966

So im going to play around with your example to see if i can make that happen. I BELIEVE the reason it was done the current way, was because it was easier to loop thru a text file of ip's (only 1 column) than it was to figure out how to from 3 columns..

Im still learning alot on python, so nothing like jumping right in and making it happen.

I appreciate the example and will test with my current files and script.

cubangt · Oct-21-2023, 05:33 AM

So the example works great, i even used my actual lists and returned as expected (in my test file)

But in my actual working script, im having issues trying to replace this function so that it produces the same output as before in order to allow the ping function to run against the ip list.

# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
    global fn
    global directory
    global parts
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if pType in entry.name:
                pt = directory + '/' + entry.name
                fn = pt
                # THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
                parts = entry.name.split('_')
                with open(pt) as file:
                    for ip in file:
                        yield ip.strip()

the entire currently working script is this:

import os
import sys
import time
from datetime import datetime
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import mmap
import math

#pType = sys.argv[1]
pType = 'Register'

# ROUTINE TO PING LIST OF IP'S USING THREADING
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
    global fn
    global directory
    global parts
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if pType in entry.name:
                pt = directory + '/' + entry.name
                fn = pt
                # THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
                parts = entry.name.split('_')
                with open(pt) as file:
                    for ip in file:
                        yield ip.strip()
# USED AS A GLOBAL VARIABLE TO GRAB THE FILE NAME AND USE IN FUNCTION TO RETURN ROW COUNT.
fn = ""

# DIRECTORY WHERE ALL IP LISTS RESIDE
directory = 'Lists'

# SETS TODAYS DATE TO CAPTURE AS THE "RUNDATE" AND SETS THE START TIME USED FOR CALCULATING THE RUN DURATION
now = datetime.now()
dt = now.strftime("%m/%d/%Y,%H:%M:%S")
start = time.time()

# THIS WILL EXECUTE THE PING FUNCTION IN THE NUMBER OF THREADS SPECIFIED
executor = ThreadPoolExecutor(125)

# WRITES THE COMPLETE LIST OF RESULTS TO THE DATAFRAME FOR SAVING
df = pd.DataFrame(executor.map(ping, list(scan_files())))

# ADDS A THIRD COLUMN WITH THE RUNDATE AND TIME TO THE DATAFRAME BEFORE SAVING
df[2] = dt

# SAVES THE RESULTS TO A CSV FILE FOR THE EXCEL FILE DASHBOARD
# IP, 0 = UP 1 = DOWN,RUN DATETIME
df.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)

# CAPTURES THE END TIME OF THE OVERALL RUN FOR THE PURPOSE OF CALCULATING THE DURATION
end = time.time()

# PRINTS TO SCREEN THE RUNTIME OF THE PROCESS FOR PERFORMANCE REPORTING
# print(end - start)
# print(fn)

# USED TO RETURN ROW COUNT
def mapcount(filename):
    with open(filename, "r+") as f:
        buf = mmap.mmap(f.fileno(), 0)
        lines = 0
        readline = buf.readline
        while readline():
            lines += 1
        return lines
    
# PASSES VALUES TO RESULTS FILE FOR REPORTING    
ti = end - start
df = pd.DataFrame([mapcount(fn)])
df[1] = math.ceil(ti)
df[2] = dt
if len(pType) > 0:
    df[3] = pType
df.to_csv('scriptresults.csv', mode='a',index=False,header=False)

cubangt · Oct-21-2023, 06:06 AM

Figured it out and seems to be working..

import os
import sys
import time
from datetime import datetime
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import mmap
import math

#pType = sys.argv[1]
pType = 'Register'

# ROUTINE TO PING LIST OF IP'S USING THREADING
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
    global fn
    global directory
    global parts
    global source
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if pType in entry.name:
                pt = directory + '/' + entry.name
                fn = pt
                # THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
                parts = entry.name.split('_')
                source = pd.read_csv(pt)
                for ind in source.index:
                    yield source['IP'][ind]                

# USED AS A GLOBAL VARIABLE TO GRAB THE FILE NAME AND USE IN FUNCTION TO RETURN ROW COUNT.
fn = ""

# DIRECTORY WHERE ALL IP LISTS RESIDE
directory = 'Lists'

# SETS TODAYS DATE TO CAPTURE AS THE "RUNDATE" AND SETS THE START TIME USED FOR CALCULATING THE RUN DURATION
now = datetime.now()
dt = now.strftime("%m/%d/%Y,%H:%M:%S")
start = time.time()

# THIS WILL EXECUTE THE PING FUNCTION IN THE NUMBER OF THREADS SPECIFIED
executor = ThreadPoolExecutor(125)

# WRITES THE COMPLETE LIST OF RESULTS TO THE DATAFRAME FOR SAVING
df = pd.DataFrame(executor.map(ping, list(scan_files())))

# ADDS A THIRD COLUMN WITH THE RUNDATE AND TIME TO THE DATAFRAME BEFORE SAVING
df[2] = dt

# MERGE SOURCE WITH RESULTS TO HAVE A RESULT FILE WITH STORE AND REGISTER INFORMATION
merged_results = pd.concat(
    (source.set_index("IP"), df.set_index(0)), axis=1
).reset_index()

# SAVES THE RESULTS TO A CSV FILE FOR THE EXCEL FILE DASHBOARD
# IP, 0 = UP 1 = DOWN,RUN DATETIME
#df.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)
merged_results.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)

# CAPTURES THE END TIME OF THE OVERALL RUN FOR THE PURPOSE OF CALCULATING THE DURATION
end = time.time()

# PRINTS TO SCREEN THE RUNTIME OF THE PROCESS FOR PERFORMANCE REPORTING
# print(end - start)
# print(fn)

# USED TO RETURN ROW COUNT
def mapcount(filename):
    with open(filename, "r+") as f:
        buf = mmap.mmap(f.fileno(), 0)
        lines = 0
        readline = buf.readline
        while readline():
            lines += 1
        return lines
    
# PASSES VALUES TO RESULTS FILE FOR REPORTING    
ti = end - start
df = pd.DataFrame([mapcount(fn)])
df[1] = math.ceil(ti)
df[2] = dt
if len(pType) > 0:
    df[3] = pType
df.to_csv('scriptresults.csv', mode='a',index=False,header=False)

**deanhystad** · (This post was last modified: Oct-21-2023, 02:46 PM by deanhystad.)

I think you can trim things down a bit. Instead of trying to stitch things together at the end, pass the store and register information through the ping function.

import random
import pandas as pd
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


pType = "Register"


def ping(register_info):
    """Check if register is online.
    register_info contains ip. store, register.
    Returns dictionary {ip, store, register, register online status, datetime}.
    """
    ip, store, register = register_info
    return {
        "IP": ip,
        "Store": store,
        "Register": register,
        "Status": random.randint(0,1),  # Replace this with ping status
        "Datetime": datetime.now().strftime("%m/%d/%Y,%H:%M:%S")
    }


def scan_files(directory="."):
    """Return rows from register files.
    Each row contains IP, Store, Register.
    """
    for source in Path(directory).glob(f"*{pType}*.csv"):
        for _, row in pd.read_csv(source).iterrows():
            yield row


with ThreadPoolExecutor(125) as executor:
    results = pd.DataFrame(executor.map(ping, scan_files()))
print(results)

This doesn't do any pinging, but other than that I think the results dataframe contains what you want.

registers.csv

Output:ip,store,register
121.11.7.203,7,203
121.11.7.204,7,204
121.11.7.205,7,205
121.11.7.206,7,206
121.11.7.207,7,207
121.11.7.208,7,208
121.11.7.209,7,209
121.11.7.210,7,210

Output:             IP  Store  Register  Status             Datetime
0  121.11.7.203      7       203       1  10/21/2023,09:44:17
1  121.11.7.204      7       204       1  10/21/2023,09:44:17
2  121.11.7.205      7       205       1  10/21/2023,09:44:17
3  121.11.7.206      7       206       0  10/21/2023,09:44:17
4  121.11.7.207      7       207       1  10/21/2023,09:44:17
5  121.11.7.208      7       208       1  10/21/2023,09:44:17
6  121.11.7.209      7       209       1  10/21/2023,09:44:17
7  121.11.7.210      7       210       0  10/21/2023,09:44:17

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	[Solved]Return values from npyscreen	Extra	2	1,177	Oct-09-2022, 07:19 PM Last Post: Extra
	Parallelism with return values	Plexian	7	1,514	Aug-14-2022, 09:33 AM Last Post: Plexian
	please help with classes and return values	jamie_01	5	1,809	Jan-17-2022, 02:11 AM Last Post: menator01
	Need to parse a list of boolean columns inside a list and return true values	Python84	4	2,125	Jan-09-2022, 02:39 AM Last Post: Python84
	Function - Return multiple values	tester_V	10	4,472	Jun-02-2021, 05:34 AM Last Post: tester_V
	Function to return list of all the INDEX values of a defined ndarray?	pjfarley3	2	1,978	Jul-10-2020, 04:51 AM Last Post: pjfarley3
	What is the best way to return these 4 integer values?	Pedroski55	4	2,561	Apr-13-2020, 09:54 PM Last Post: Pedroski55
	Return values for use outside of function	willowman	1	1,688	Apr-13-2020, 07:00 AM Last Post: buran
	Return all Values which can divided by 9	lastyle	2	1,849	Mar-16-2020, 09:22 PM Last Post: lastyle
	Custom Function to Return Database Values	rm78	0	1,786	Sep-05-2019, 01:01 PM Last Post: rm78

Need to return 2 values from 1 DF that equals another DF

User Panel Messages

Announcements