Python Forum
Need to return 2 values from 1 DF that equals another DF
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Need to return 2 values from 1 DF that equals another DF
#1
ok, so i have a script that pings all our registers in the company to check if they are online or offline.. this works great. that script works off a list of just ip's. Once the script runs, it generates a output file in csv with the ip, status, datetime. Currently this is being pulled into excel to create a dashboard. NOW that VP's and management is aware of this dashboard, they want access, but providing access to so many people and one excel file, the file is going to get manipulated and eventually break. So im working on a new script that i can do everything im already doing in excel, but within the script to then generate a static report that can be sent out to everyone and no worry about pivot tables being changed or excel file being updated or changed and causing issues.

So i have a dataframe that returns a list of the ip's with store # and register number (source data) in the new script, this will be used as the source for IP's to ping. So in excel im using vlookup, how can i accomplish the same thing and add the store # and Register # to my csv of results?

Here is my source data that will be used to pull the ip, but also contains the store and register that i need to include in my results.csv

# SOURCE DATAFRAME
Output:
IP Store Register 0 121.11.7.203 7 203 1 121.11.7.204 7 204 2 121.11.7.205 7 205 3 121.11.7.206 7 206 4 121.11.7.207 7 207 5 121.11.7.208 7 208 6 121.11.7.221 7 221 7 121.11.7.222 7 222 8 121.11.9.201 9 201 9 121.11.9.202 9 202 10 121.11.9.203 9 203 11 121.11.9.204 9 204
# RESULTS DATAFRAME THAT IS WRITTEN TO CSV
Output:
IP Status Datetime 0 121.11.7.203 0 10/18/2023,16:12:01 1 121.11.7.204 0 10/18/2023,16:12:01 2 121.11.7.205 0 10/18/2023,16:12:01 3 121.11.7.206 0 10/18/2023,16:12:01 4 121.11.7.207 0 10/18/2023,16:12:01 5 121.11.7.208 0 10/18/2023,16:12:01 6 121.11.7.221 1 10/18/2023,16:12:01 7 121.11.7.222 0 10/18/2023,16:12:01 8 121.11.9.201 0 10/18/2023,16:12:01 9 121.11.9.202 0 10/18/2023,16:12:01 10 121.11.9.203 0 10/18/2023,16:12:01 11 121.11.9.204 0 10/18/2023,16:12:01
Since the results are captured into a df as they are gathered, then written to csv, i would like to add the store and register into the results before writing the csv. How can i search the source for the ip and return the 2 additional values i want and add to the results df?
Reply
#2
It is easy to have pandas do the merging, but it is always easier to do things correctly from the start. Does the ping script have access to this?
Output:
IP Store Register 0 121.11.7.203 7 203 1 121.11.7.204 7 204 2 121.11.7.205 7 205
If so, why doesn't it put the store and register information in the output CSV? If it doesn't have access to store and register information, why not?

Here is how you can merge your source and results dataframes.
import pandas as pd
from random import randint
from datetime import datetime


source = pd.read_csv("test.csv")
print(source)

result = pd.DataFrame(
    (
        {"IP": ip, "Status": randint(0, 1), "Datetime": datetime.now()}
        for ip in source["IP"].values[::-1]
    )
)

print(result)

merged_results = pd.concat(
    (source.set_index("IP"), result.set_index("IP")), axis=1
).reset_index()
print(merged_results)
Output:
IP Store Register 0 121.11.7.203 7 203 1 121.11.7.204 7 204 2 121.11.7.205 7 205 3 121.11.7.206 7 206 4 121.11.7.207 7 207 5 121.11.7.208 7 208 6 121.11.7.221 7 221 7 121.11.7.222 7 222 IP Status Datetime 0 121.11.7.222 0 2023-10-20 15:13:29.259966 1 121.11.7.221 0 2023-10-20 15:13:29.259966 2 121.11.7.208 1 2023-10-20 15:13:29.259966 3 121.11.7.207 0 2023-10-20 15:13:29.259966 4 121.11.7.206 1 2023-10-20 15:13:29.259966 5 121.11.7.205 0 2023-10-20 15:13:29.259966 6 121.11.7.204 0 2023-10-20 15:13:29.259966 7 121.11.7.203 0 2023-10-20 15:13:29.259966 IP Store Register Status Datetime 0 121.11.7.203 7 203 0 2023-10-20 15:13:29.259966 1 121.11.7.204 7 204 0 2023-10-20 15:13:29.259966 2 121.11.7.205 7 205 0 2023-10-20 15:13:29.259966 3 121.11.7.206 7 206 1 2023-10-20 15:13:29.259966 4 121.11.7.207 7 207 0 2023-10-20 15:13:29.259966 5 121.11.7.208 7 208 1 2023-10-20 15:13:29.259966 6 121.11.7.221 7 221 0 2023-10-20 15:13:29.259966 7 121.11.7.222 7 222 0 2023-10-20 15:13:29.259966
Reply
#3
Big Grin Great question, wasnt looking to make such a huge improvement / upgrade to the overall script, now that more and more people are seeing the dashboard, my manager/vp is asking for improvements, so im looking to revamp the whole thing..

So to answer your question, yes with the new script, i would like to use just 1 source file to use to ping, which like you said, contains the store and register number so the end result is just a csv with all the data.

So i would for sure like to provide the source file like this:

Output:
IP Store Register 0 121.11.7.203 7 203 1 121.11.7.204 7 204 2 121.11.7.205 7 205 3 121.11.7.206 7 206 4 121.11.7.207 7 207 5 121.11.7.208 7 208 6 121.11.7.221 7 221 7 121.11.7.222 7 222
And get the results like your example:

Output:
IP Store Register Status Datetime 0 121.11.7.203 7 203 0 2023-10-20 15:13:29.259966 1 121.11.7.204 7 204 0 2023-10-20 15:13:29.259966 2 121.11.7.205 7 205 0 2023-10-20 15:13:29.259966 3 121.11.7.206 7 206 1 2023-10-20 15:13:29.259966 4 121.11.7.207 7 207 0 2023-10-20 15:13:29.259966 5 121.11.7.208 7 208 1 2023-10-20 15:13:29.259966 6 121.11.7.221 7 221 0 2023-10-20 15:13:29.259966 7 121.11.7.222 7 222 0 2023-10-20 15:13:29.259966
So im going to play around with your example to see if i can make that happen. I BELIEVE the reason it was done the current way, was because it was easier to loop thru a text file of ip's (only 1 column) than it was to figure out how to from 3 columns..

Im still learning alot on python, so nothing like jumping right in and making it happen.

I appreciate the example and will test with my current files and script.
Reply
#4
So the example works great, i even used my actual lists and returned as expected (in my test file)

But in my actual working script, im having issues trying to replace this function so that it produces the same output as before in order to allow the ping function to run against the ip list.

# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
    global fn
    global directory
    global parts
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if pType in entry.name:
                pt = directory + '/' + entry.name
                fn = pt
                # THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
                parts = entry.name.split('_')
                with open(pt) as file:
                    for ip in file:
                        yield ip.strip()
the entire currently working script is this:

import os
import sys
import time
from datetime import datetime
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import mmap
import math

#pType = sys.argv[1]
pType = 'Register'

# ROUTINE TO PING LIST OF IP'S USING THREADING
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
    global fn
    global directory
    global parts
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if pType in entry.name:
                pt = directory + '/' + entry.name
                fn = pt
                # THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
                parts = entry.name.split('_')
                with open(pt) as file:
                    for ip in file:
                        yield ip.strip()
# USED AS A GLOBAL VARIABLE TO GRAB THE FILE NAME AND USE IN FUNCTION TO RETURN ROW COUNT.
fn = ""

# DIRECTORY WHERE ALL IP LISTS RESIDE
directory = 'Lists'

# SETS TODAYS DATE TO CAPTURE AS THE "RUNDATE" AND SETS THE START TIME USED FOR CALCULATING THE RUN DURATION
now = datetime.now()
dt = now.strftime("%m/%d/%Y,%H:%M:%S")
start = time.time()

# THIS WILL EXECUTE THE PING FUNCTION IN THE NUMBER OF THREADS SPECIFIED
executor = ThreadPoolExecutor(125)

# WRITES THE COMPLETE LIST OF RESULTS TO THE DATAFRAME FOR SAVING
df = pd.DataFrame(executor.map(ping, list(scan_files())))

# ADDS A THIRD COLUMN WITH THE RUNDATE AND TIME TO THE DATAFRAME BEFORE SAVING
df[2] = dt

# SAVES THE RESULTS TO A CSV FILE FOR THE EXCEL FILE DASHBOARD
# IP, 0 = UP 1 = DOWN,RUN DATETIME
df.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)

# CAPTURES THE END TIME OF THE OVERALL RUN FOR THE PURPOSE OF CALCULATING THE DURATION
end = time.time()

# PRINTS TO SCREEN THE RUNTIME OF THE PROCESS FOR PERFORMANCE REPORTING
# print(end - start)
# print(fn)

# USED TO RETURN ROW COUNT
def mapcount(filename):
    with open(filename, "r+") as f:
        buf = mmap.mmap(f.fileno(), 0)
        lines = 0
        readline = buf.readline
        while readline():
            lines += 1
        return lines
    
# PASSES VALUES TO RESULTS FILE FOR REPORTING    
ti = end - start
df = pd.DataFrame([mapcount(fn)])
df[1] = math.ceil(ti)
df[2] = dt
if len(pType) > 0:
    df[3] = pType
df.to_csv('scriptresults.csv', mode='a',index=False,header=False)  
Reply
#5
Figured it out and seems to be working..
import os
import sys
import time
from datetime import datetime
import subprocess
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import mmap
import math

#pType = sys.argv[1]
pType = 'Register'

# ROUTINE TO PING LIST OF IP'S USING THREADING
def ping(ip):
    return (
        ip,
        subprocess.run(
            f"ping {ip} -n 1", stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode,
    )

# ROUTINE TO OPEN THE APPROPRIATE TEXT FILES TO PROCESS THE IP LIST
def scan_files():
    global fn
    global directory
    global parts
    global source
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith('.txt'):
            if pType in entry.name:
                pt = directory + '/' + entry.name
                fn = pt
                # THIS WILL PARSE OUT THE VALUE BEFORE THE FIRST '_' TO SET THE VARIABLE AND USE LATER FOR SAVING THE RESULTS.
                parts = entry.name.split('_')
                source = pd.read_csv(pt)
                for ind in source.index:
                    yield source['IP'][ind]                

# USED AS A GLOBAL VARIABLE TO GRAB THE FILE NAME AND USE IN FUNCTION TO RETURN ROW COUNT.
fn = ""

# DIRECTORY WHERE ALL IP LISTS RESIDE
directory = 'Lists'

# SETS TODAYS DATE TO CAPTURE AS THE "RUNDATE" AND SETS THE START TIME USED FOR CALCULATING THE RUN DURATION
now = datetime.now()
dt = now.strftime("%m/%d/%Y,%H:%M:%S")
start = time.time()

# THIS WILL EXECUTE THE PING FUNCTION IN THE NUMBER OF THREADS SPECIFIED
executor = ThreadPoolExecutor(125)

# WRITES THE COMPLETE LIST OF RESULTS TO THE DATAFRAME FOR SAVING
df = pd.DataFrame(executor.map(ping, list(scan_files())))

# ADDS A THIRD COLUMN WITH THE RUNDATE AND TIME TO THE DATAFRAME BEFORE SAVING
df[2] = dt

# MERGE SOURCE WITH RESULTS TO HAVE A RESULT FILE WITH STORE AND REGISTER INFORMATION
merged_results = pd.concat(
    (source.set_index("IP"), df.set_index(0)), axis=1
).reset_index()

# SAVES THE RESULTS TO A CSV FILE FOR THE EXCEL FILE DASHBOARD
# IP, 0 = UP 1 = DOWN,RUN DATETIME
#df.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)
merged_results.to_csv(parts[0] + '_ip_output.csv',header=False, index=False, quoting=None)

# CAPTURES THE END TIME OF THE OVERALL RUN FOR THE PURPOSE OF CALCULATING THE DURATION
end = time.time()

# PRINTS TO SCREEN THE RUNTIME OF THE PROCESS FOR PERFORMANCE REPORTING
# print(end - start)
# print(fn)

# USED TO RETURN ROW COUNT
def mapcount(filename):
    with open(filename, "r+") as f:
        buf = mmap.mmap(f.fileno(), 0)
        lines = 0
        readline = buf.readline
        while readline():
            lines += 1
        return lines
    
# PASSES VALUES TO RESULTS FILE FOR REPORTING    
ti = end - start
df = pd.DataFrame([mapcount(fn)])
df[1] = math.ceil(ti)
df[2] = dt
if len(pType) > 0:
    df[3] = pType
df.to_csv('scriptresults.csv', mode='a',index=False,header=False)  
Reply
#6
I think you can trim things down a bit. Instead of trying to stitch things together at the end, pass the store and register information through the ping function.
import random
import pandas as pd
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


pType = "Register"


def ping(register_info):
    """Check if register is online.
    register_info contains ip. store, register.
    Returns dictionary {ip, store, register, register online status, datetime}.
    """
    ip, store, register = register_info
    return {
        "IP": ip,
        "Store": store,
        "Register": register,
        "Status": random.randint(0,1),  # Replace this with ping status
        "Datetime": datetime.now().strftime("%m/%d/%Y,%H:%M:%S")
    }


def scan_files(directory="."):
    """Return rows from register files.
    Each row contains IP, Store, Register.
    """
    for source in Path(directory).glob(f"*{pType}*.csv"):
        for _, row in pd.read_csv(source).iterrows():
            yield row


with ThreadPoolExecutor(125) as executor:
    results = pd.DataFrame(executor.map(ping, scan_files()))
print(results)
This doesn't do any pinging, but other than that I think the results dataframe contains what you want.

registers.csv
Output:
ip,store,register 121.11.7.203,7,203 121.11.7.204,7,204 121.11.7.205,7,205 121.11.7.206,7,206 121.11.7.207,7,207 121.11.7.208,7,208 121.11.7.209,7,209 121.11.7.210,7,210
Output:
IP Store Register Status Datetime 0 121.11.7.203 7 203 1 10/21/2023,09:44:17 1 121.11.7.204 7 204 1 10/21/2023,09:44:17 2 121.11.7.205 7 205 1 10/21/2023,09:44:17 3 121.11.7.206 7 206 0 10/21/2023,09:44:17 4 121.11.7.207 7 207 1 10/21/2023,09:44:17 5 121.11.7.208 7 208 1 10/21/2023,09:44:17 6 121.11.7.209 7 209 1 10/21/2023,09:44:17 7 121.11.7.210 7 210 0 10/21/2023,09:44:17
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  [Solved]Return values from npyscreen Extra 2 1,177 Oct-09-2022, 07:19 PM
Last Post: Extra
  Parallelism with return values Plexian 7 1,514 Aug-14-2022, 09:33 AM
Last Post: Plexian
  please help with classes and return values jamie_01 5 1,809 Jan-17-2022, 02:11 AM
Last Post: menator01
  Need to parse a list of boolean columns inside a list and return true values Python84 4 2,125 Jan-09-2022, 02:39 AM
Last Post: Python84
  Function - Return multiple values tester_V 10 4,472 Jun-02-2021, 05:34 AM
Last Post: tester_V
  Function to return list of all the INDEX values of a defined ndarray? pjfarley3 2 1,978 Jul-10-2020, 04:51 AM
Last Post: pjfarley3
  What is the best way to return these 4 integer values? Pedroski55 4 2,561 Apr-13-2020, 09:54 PM
Last Post: Pedroski55
  Return values for use outside of function willowman 1 1,688 Apr-13-2020, 07:00 AM
Last Post: buran
  Return all Values which can divided by 9 lastyle 2 1,849 Mar-16-2020, 09:22 PM
Last Post: lastyle
  Custom Function to Return Database Values rm78 0 1,786 Sep-05-2019, 01:01 PM
Last Post: rm78

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020