Posts: 170
Threads: 43
Joined: May 2019
Is this possible, i am populating a dataframe with a few fields and thousands of rows that will be saved into a CSV. For a separate reason, i need certain records to be inserted into a different dataframe, this will also be saved in a CSV, but for different audience and different purpose, so id like to see if during the "addition" of records into the main df, i can filter and add certain records that meet a criteria into a different df.
Or would it be easier to create the master df, then apply a filter to that and build the new df based on the filter. This 2nd df will require some calculations and other things before saving to CSV, so thats the reason for asking if its best to do it while df#1 is being populated or wait until df#1 is built, then build df#2
Posts: 6,780
Threads: 20
Joined: Feb 2020
Posts: 170
Threads: 43
Joined: May 2019
Oct-22-2023, 04:08 AM
(This post was last modified: Oct-22-2023, 04:08 AM by cubangt.)
These are just the tests i have been playing around with to get familiar with the methods of getting the data and calculations.
This is not in a script, just the things i have been trying so i can try to incorporate into my script.
This was done prior to the other help you provided.
# THIS IS COMPLETE STORE REGISTER BREAKDOWN LIST (NEED TO LOOK AT UPDATING AUTOMATICALLY WHEN NEW STORES ARE ADDED)
# SOURCE DATA TO BE UPDATED WITH EVERY NEW STORE AND LIST OF REGISTERS
registerBreakdown = pd.read_csv("RegisterList1.csv",names=['IP','Store','Register'])
# THIS RETURNS REGISTER COUNT BY STORE LOCATION, USED FOR CALCULATIONS FOR DASHBOARD
#registercntbystore = registerBreakdown.groupby(['Store'])['Register'].count()
registercntbystore = registerBreakdown.groupby(['Store'])['Register'].count()
# RETURNS THE TOTAL COUNTS FOR THE 2 STATUS (UP=0 AND DOWN=1)
#statuscnt = statusResults[["Status"]].value_counts()
statuscnt = statusResults[["Status"]].value_counts()
dfstatuscnt = pd.DataFrame(statusResults[["Status"]].value_counts(), columns=['Count'])
# RETURNS THE TOTAL COUNTS FOR THE DOWN STATUS
statuscntDown = statusResults[statusResults["Status"] == True]
# RETURNS THE TOTAL COUNTS FOR THE UP STATUS
statuscntUp = statusResults[statusResults["Status"] == False]
# RETURNS THE TOTAL COUNT FROM THE REGISTER LIST USED TO PING(ALL REGISTERS BEING PINGED/CHECKED)
loccnt = len(registerBreakdown.index)
# RETURNS THE COUNT AS NUMBER FOR DOWN
ttlDown = statuscnt[1] #len(statuscntDown.index)
# RETURNS THE COUNT AS NUMBER FOR UP
ttlUp = statuscnt[0] #len(statuscntUp.index)
Posts: 170
Threads: 43
Joined: May 2019
What im trying to accomplish is to get a few calculation values returned to me.
I need to get a count of registers down by store and if the "%" is greater than "x%" then save that off into a new df
example:
Store 7 has a total of 8 registers, only 1 is down, that is 13% Not required in the df
Store 13 has a total of 15 registers, only 3 are down, but that is 20% which is over the % threshold so this would be added to the df
and so on.
Since the master result set that is saved into a csv contained the IP, Store, Register, Status and Datetime, the master df has everything needed to calculate the % by store.
My test from above using this line:
registercntbystore = registerBreakdown.groupby(['Store'])['Register'].count() Gives me a df with all stores and the total count by store.
Cant get my head wrapped around how to use everything together in order to get a df with ONLY records that are over the % threshold. The purpose is that this will need to also be saved into a CSV for our support team to work on or contact the stores to find out what is wrong.
Posts: 170
Threads: 43
Joined: May 2019
Oct-22-2023, 05:48 AM
(This post was last modified: Oct-22-2023, 05:49 AM by cubangt.)
Id like a dataframe with this information
Output: Store Down Up Grand Total Percentage
7 1 7 8 13%
9 1 10 11 9%
10 0 11 11 0%
12 2 11 13 15%
Posts: 6,780
Threads: 20
Joined: Feb 2020
Oct-22-2023, 03:30 PM
(This post was last modified: Oct-22-2023, 03:30 PM by deanhystad.)
Your choice. This example does the secondary report processing in the same script, but it is all post processing after gathering the data. You could just as easily move the post processing to a different script that reads in the ping results file.
import random
import pandas as pd
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor
pType = "Register"
def ping(register_info):
"""Check if register is online.
register_info contains ip. store, register.
Returns dictionary {ip, store, register, register online status, datetime}.
"""
ip, store, register = register_info
return {
"IP": ip,
"Store": store,
"Register": register,
"Status": random.choice((0, 0, 0, 0, 0, 1)),
"Datetime": datetime.now().strftime("%m/%d/%Y,%H:%M:%S")
}
def scan_files(directory="."):
"""Make fake stores"""
for store in range(1, 10):
for register in range(1, random.randint(8, 15)):
yield (f"123.456.{store}.{register}", store, register)
# Get all the ping info
with ThreadPoolExecutor(125) as executor:
results = pd.DataFrame(executor.map(ping, scan_files()))
# Compute what % of registers are offline
by_store = results.groupby("Store")
status = by_store["Status"]
report = pd.concat([status.size(), status.sum()], axis=1).reset_index()
report.columns = ["Store", "Register Count", "Down"]
report["Percentage"] = report["Down"] / report["Register Count"] * 100
print("Register Status Report", report, sep="\n")
# Get list of stores with more than 20% of registers offline
bad_stores = report[report["Percentage"] > 20]["Store"].values
print("\nStores where Registers Offline > 20%", report[report["Store"].isin(bad_stores)], sep="\n")
for store in bad_stores:
print(results.loc[by_store.groups[store]].reset_index(drop=True))
Posts: 170
Threads: 43
Joined: May 2019
I want to better understand the example you provided.. without providing more examples.. Why am i not able to get the calculations of the registers that are up? What portion of the logic should i focus on within the example portion of this "# Compute what % of registers are offline"
I tried adding a "Up" column and it complained about "Length mismatch: Expected axis has 3 elements, new values have 4 elements"
I tried to create a new set of logic and just renamed the variables and status to Up, but no luck.
I appreciate all the help, but id like to see if i can figure it out, but need a little push on where to focus in this logic:
# Compute what % of registers are online
by_store = statusResults.groupby("Store")
statusUp = by_store["Status"]
report = pd.concat([statusUp.size(), statusUp.sum()], axis=1).reset_index()
report.columns = ["Store", "Register Count", "Up"]
report["Up"] = report["Up"] / report["Register Count"] * 100
print("Register Status Report Up", report, sep="\n")
Posts: 170
Threads: 43
Joined: May 2019
Disregard my above question, i was overthinking it and figured out it will work as provided. thank you again..
Trying to incorporate into my script now, to see what i can complete this weekend.
Posts: 170
Threads: 43
Joined: May 2019
Here is my test script updated and working as expected. Just need to replace the "statusResults" dataframe with the dynamically generated one in the main script, but i feel good about this working in production.
# THIS IS TEMPORARY UNTIL MOVED TO REAL SCRIPT(THIS WILL BE THE PING RESULTS BEFORE WRITING TO CSV)
statusResults = pd.read_csv("Register_ip_output_20231021_233512.csv",names=['IP', 'Store', 'Register','Status', 'Datetime'])
# Compute what % of registers are offline
by_store = statusResults.groupby("Store")
status = by_store["Status"]
report = pd.concat([status.size(), status.sum()], axis=1).reset_index()
report.columns = ["Store", "Register Count", "Percentage"]
report["Percentage"] = report["Percentage"] / report["Register Count"] * 100
statuscntUp = len(statusResults[statusResults["Status"] == False])
# USED TO CALCULATE NUMBER OF STORES WITH A % OF REGISTERS DOWN
percentDown = len(report[report["Percentage"] != 0].index)
# RETURNS THE TOTAL NUMBER OF REGISTERS CHECKED
RegisterCount = report["Register Count"].sum()
# TOTAL CHAIN % OF REGISTERS UP
TotalPercentUp = str(int(statuscntUp / RegisterCount * 100)) + "%"
# LIST OF STORES FOR DISPLAYING ON SCREEN(FORMATTED)
TwentyStores = ','.join(str(v) for v in report[(report["Percentage"] > 20) & (report["Percentage"] < 50)]["Store"].values)
# GETS COUNT OF STORES WITH MORE THAN 20%
TwentyPercent = len(report[(report["Percentage"] > 20) & (report["Percentage"] < 50)]["Store"].values)
# GETS COUNT OF STORES WITH MORE THAN 50%
FiftyPercent = len(report[report["Percentage"] >= 50]["Store"].values)
# LIST OF STORES FOR DISPLAYING ON SCREEN(FORMATTED)
FiftyStores = ','.join(str(v) for v in report[report["Percentage"] >= 50]["Store"].values)
|