Python Forum

Full Version: How to sort .csv file test log which item first fail and paint color
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3
Hi all,
I want to sort csv file find which item was failed(if this column have limit)and paint color for the cell, delete afterward item such as final data.xlsx format. I only know how to sort failed and don't know how to compared limit, kindly help me for the matter, thanks!!

import csv


array=[]
failed=[]
count=0

with open('Log.csv','r') as fp:
         csv_reader=csv.reader(fp)
         header = next(csv_reader)
         listreport=list(csv_reader)
         for line in listreport:
             if count<3:
                array.append(line)
             if line[5]=='Failed':
                 failed.append(line)
             count+=1

with open('new_name.csv','w',encoding='utf-8',newline='') as new_line:
        csv_writer=csv.writer(new_line)
        csv_writer.writerow(header)
        csv_writer.writerows(array)
        csv_writer.writerows(failed)
You are working with a large amount of data, which is creating huge lists (len = 385), much of the data is duplicated, within the lists, which is very confusing.

Also, I'm unsure why you are creating two objects: header and listreport. You could do the exact same thing with one object (the iterator in the for loop), which I've named row.

with open('Log.csv') as log:
   csv_reader = csv.reader(log)
   for row in csv_reader:
      if count<3:
         array.append(row)
      if row[5]=='Failed':
         failed.append(row)
      count+=1
I'm still unclear about your objective, as your English is a little hard for me to follow; I can see that you are doing your best, so the fail here is more on my side than yours.

I'll have a think about this and try to offer better help, but if you can find a way to help me better understand your objective, then please do.

edit to add:

I think that the major issue here is that you're trying to do too much at once: the task needs to be broken down into smaller steps (sub-tasks, if you will), solving one thing at a time, each step moving ever closer to the solution.
Thank you for your prompt reply, this is test log of product measured. For the file I want to two step.
1)Sort failed item.-->Already done.
2)Check column data whether in limit if assign limit such in Log.csv Column M(VDD Continuity) limit is -0.8 to 0.25 and then select that column compare value if in range and then check next(LR Continuity) have limit column does same check, if out range paint the cell color and skip this row, thanks!
You are very welcome.

I have worked with Excel for many years and (from my experience) I see no reason for using Python to solve this problem. All the tools you need (the functions) are already there, in Excel, such as Conditional Formatting, Logical, Search, etc, which can be used to reference any cell in any worksheet.

By trying to use Python, you now need to solve two problems, rather than just the one.
Thanks in advance!

Yes ,I have try use excel conditional formatting between function, many column need to compared by manually. It is a large workload, so want to try use python easy handle this matter.
It's not so easy because you need to know how csv.reader works and you need to convert str into float for example.
Otherwise you can't do the comparison.

Here an example with:
  • skipping first line before using csv.DictReader
  • DictReader gets automatically the first list (after skipping, it's the second line) for the column names, which are later the keys
  • reading two rows, where the first one should be the minimum limits and the second the maximum limits
  • converting this str to a float and putting them as attribute on the instance of Reader
  • iterate through the remaining data and using the methods reader.test_pass(row) and reader.in_valid_range(row)

Code:
import csv
from operator import itemgetter


class Reader:
    def __init__(self, csv_file, sort_fields):
        self.fd = open(csv_file, newline="")
        self.sort_fields = itemgetter(*sort_fields)
        # self.sort_fields(row) -> return a list
        # with the wanted data from the mapping

        # skipping first line
        next(self.fd)

        # DictReader reads automatically the first line
        # to get the column names
        self.csv = csv.DictReader(self.fd, delimiter=",")

        # getting minium and maximum values
        self.minimum_values = next(self.csv)
        self.maximum_values = next(self.csv)

        # getting the limit values from the dict and converting them to a float
        # that they could used for comparison
        self.vdd_continuity_min = float(self.minimum_values["VDD Continuity"])
        self.vdd_continuity_max = float(self.maximum_values["VDD Continuity"])
        self.lr_continuity_min = float(self.minimum_values["LR Continuity"])
        self.lr_continuity_max = float(self.maximum_values["LR Continuity"])

    @staticmethod
    def sort_by_float(row):
        # calling self.sort_fields with row -> data
        # map takes each element from data and call it with float()
        # tuple consumes the map
        # a tuple with floats is returned
        return tuple(map(float, self.sort_fields(row)))

    def __iter__(self):
        yield from self.csv

    def __enter__(self):
        return self

    def __exit__(self, exc_typ, exc_obj, exc_tb):
        # closes files automatically when leaving
        # the context manager
        self.fd.close()

    @staticmethod
    def test_pass(row):
        # no instance (self) is needed to do the test
        return row.get("Pass/Fail", "").lower() == "passed"

    def in_valid_range(self, row):
        # here self is needed, because attributes from this
        # instance are read
        vdd = float(row["VDD Continuity"])
        lr = float(row["LR Continuity"])
        return (self.vdd_continuity_min <= lr <= self.vdd_continuity_max) and (
            self.lr_continuity_min <= lr <= self.lr_continuity_max
        )


if __name__ == "__main__":
    with Reader("Log.csv", sort_fields=["VDD Continuity", "Temperature"]) as reader:
        # reader instance has now some attributes
        print(f"{reader.vdd_continuity_min=}")
        print(f"{reader.vdd_continuity_max=}")
        print(f"{reader.lr_continuity_min=}")
        print(f"{reader.lr_continuity_max=}")

        # calls implicity the __iter__ method of the instance of Reader
        for row in reader:
            if reader.test_pass(row) and reader.in_valid_range(row):
                part_id = row.get("Part ID")
                if part_id:
                    print(part_id)
The output could be made with: https://www.geeksforgeeks.org/working-wi...in-python/
But working directly with Excel will solve your task faster (currently). Later if you more experienced, this code is nothing difficult to write.

EDIT: Added the sorting. It's not possible to use this method, if the file is the half of your RAM.
FYI: you can convert to pandas dataframe and manipulate that.
This may ( or may not be useful in this instance )
import pandas as pd
import os


def display_log(filename):
    df = pd.read_csv(filename)
    print(df)

def main():
    # this code changes CWD to same directory as python script
    os.chdir(os.path.abspath(os.path.dirname(__file__)))

    display_log('Log.csv')

if __name__ == '__main__':
    main()
which produces (partial):
Output:
Unnamed: 0 Unnamed: 1 Unnamed: 2 ... 5045.1 5047 5047.1 0 TesterID Batch Number Site Number ... DUT_LM_relativeShock_PAT_Hi DUT_LM_Sens_Corr1000Hz_94SPL__PAT_Lo DUT_LM_Sens_Corr1000Hz_94SPL__PAT_Hi 1 NaN NaN NaN ... -0.053597136 -38.80975306 -38.80975306 2 NaN NaN NaN ... 0.112112568 -37.5058393 -37.5058393 3 Tester#2 1 3 ... 0.107265399 -38.83249426 -37.45496388 4 Tester#2 1 4 ... 0.112704539 -38.82323609 -37.47194499 5 Tester#2 1 5 ... 0.109072175 -38.82126228 -37.49127838 6 Tester#2 1 6 ... 0.111968008 -38.83483278 -37.47525095 7 Tester#2 1 7 ... NaN NaN NaN 8 Tester#2 1 8 ... 0.121047481 -38.85764922 -37.44677698 9 Tester#2 1 9 ... 0.107312512 -38.84481032 -37.48223468 10 Tester#2 1 10 ... 0.113937343 -38.81981198 -37.49873383 11 Tester#2 1 11 ... 0.105469581 -38.83446045 -37.49203972 12 Tester#2 1 13 ... 0.112050497 -38.83122185 -37.47174578 13 Tester#2 1 14 ... 0.113251309 -38.81597385 -37.46724701 14 Tester#2 1 15 ... 0.107133 -38.87445237 -37.44735208 15 Tester#2 1 17 ... 0.106066352 -38.82747034 -37.46790153 16 Tester#2 1 18 ... 0.111913245 -38.84638211 -37.47201892 17 Tester#2 1 19 ... 0.105938991 -38.83105884 -37.45517457 18 Tester#2 1 20 ... 0.11032486 -38.86610586 -37.46020799 19 Tester#2 1 22 ... 0.108430019 -38.83336922 -37.48581459 20 Tester#2 1 23 ... 0.111608632 -38.8631526 -37.47320664 21 Tester#2 1 24 ... 0.11198719 -38.85636541 -37.49164502 22 Tester#2 1 26 ... 0.118748873 -38.81289836 -37.51235239 23 Tester#2 1 27 ... 0.116329872 -38.83398578 -37.47402664 24 Tester#2 1 28 ... 0.111891007 -38.82510794 -37.46041127 25 Tester#2 1 29 ... 0.112180879 -38.83137426 -37.47103944 26 Tester#2 1 30 ... 0.115581321 -38.83673984 -37.49914098 27 Tester#2 1 31 ... 0.111984757 -38.84949218 -37.45391246 28 Tester#2 1 32 ... 0.114766975 -38.84627817 -37.46742165 29 Tester#2 1 33 ... 0.116238335 -38.82761274 -37.47276069 30 Tester#2 1 34 ... 0.11051097 -38.84194923 -37.44687194 31 Tester#2 2 0 ... 0.112107334 -38.80890703 -37.50574658 32 Tester#2 2 1 ... 0.114155945 -38.83613146 -37.49142007 33 Tester#2 2 2 ... NaN NaN NaN 34 Tester#2 2 3 ... NaN NaN NaN 35 Tester#2 2 4 ... 0.112702549 -38.8238681 -37.47156012 36 Tester#2 2 5 ... 0.109082816 -38.82111536 -37.49128727 37 Tester#2 2 6 ... 0.11193 -38.83687834 -37.47190235 38 Tester#2 2 7 ... 0.110769315 -38.81795084 -37.48783699 39 Tester#2 2 8 ... 0.120575644 -38.85765686 -37.44676501 40 Tester#2 2 9 ... 0.107202601 -38.84448086 -37.48138477 41 Tester#2 2 10 ... 0.113890241 -38.82515551 -37.49137495 42 Tester#2 2 11 ... 0.105438307 -38.83445528 -37.49161599 43 Tester#2 2 13 ... NaN NaN NaN 44 Tester#2 2 14 ... 0.113201626 -38.81667164 -37.46618239 45 Tester#2 2 15 ... 0.107045831 -38.87418522 -37.44798724 46 Tester#2 2 17 ... 0.105892395 -38.82975426 -37.46608974 47 Tester#2 2 18 ... 0.111842669 -38.84624438 -37.47263779 48 Tester#2 2 19 ... 0.106001641 -38.83279308 -37.45250412 49 Tester#2 2 20 ... 0.110353187 -38.86689691 -37.4586455 50 Tester#2 2 22 ... 0.108419992 -38.83444891 -37.4837277 51 Tester#2 2 23 ... 0.111471083 -38.86236253 -37.47474858 52 Tester#2 2 24 ... 0.112015322 -38.8547756 -37.49260473 53 Tester#2 3 3 ... NaN NaN NaN
Pandas feels for me like numpy + black magic.

Pandas is powerful, but what I see is, that people without knowledge about the Core Language Python itself trying to work with this framework.
It appreciated for your reply, I just a newbie in Python.
import csv

up_limit=[]
low_limit=[]
dt=[]
index_number=[12,13,14,15,22,26,32,41,42,44,45,47,48,49,50,51,52,53,54,55,56,57,58,60,62,64,66,74,80,85,86,91,92,97,98,102,
              104,108,110,114,116,120,122,126,128,132,134,138,140,144,146,152,173,177,204,205,206,207,208,210,212,214,215,216,
              218,239,240,241,242,243,247,248,263,264,270,271,275,277,281,283,292,299,301,304,306,309,310,311,312,313,314,315,
              316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,
              344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,
              372,373,374,375,376,377,378,379,380,381,382,383,384]

with open('Log.csv','r') as file:
   readerfile=csv.reader(file)
   for i, rows in enumerate(readerfile):
      if i==2 :
            for j in range(len(index_number)):
                col=rows[index_number[j]]
                up_limit.append(float(col))
      elif i==3:
            for j in range(len(index_number)):
              col = rows[index_number[j]]
              low_limit.append(float(col))
    file.close()


with open('Log.csv', 'r') as fp:
    readerfile = csv.reader(fp)
    header0=next(readerfile)
    header1=next(readerfile)
    header2=next(readerfile)
    header3=next(readerfile)
    data=list(readerfile)
    data.sort(key=lambda row:(int(row[2])))
    for row in data:
        if row[5]=='Failed':
            array.append(row)
    for i,rows in enumerate(array):
        for j in range(len(index_number)):
            col_value = rows[index_number[j]]
            if num<len(index_number):
                    if float(col_value)>=up_limit[num] and float(col_value)<=low_limit[num]:
                             dt.append(col_value)
                             num += 1
                    else:
                        num=0
                        break
with open('sorted_data.csv','w',encoding='utf-8',newline='') as new:
    csvwriter=csv.writer(new)
    csvwriter.writerow(header0)
    csvwriter.writerow(header1)
    csvwriter.writerow(header2)
    csvwriter.writerow(header3)
    csvwriter.writerows(dt)
I have updated the code which can read up&low limit compare by row.Why final generated file format is wrong?The sorted_data.csv is what I want.
Pages: 1 2 3