How can I solve this file handling issue?
#11
(Feb-12-2022, 08:26 AM)stevendaprano Wrote: ....but I am 95% sure that struct.unpack will be faster.
stevendaprano,

Actually, yes, that IS the case. With your modified code I reran the tests: the bit-shift version still takes ~20 seconds, while struct.unpack takes ~6.3 seconds. I am not sure what mistake I made in my earlier comparison of the two strategies, but you were right. Thank you!

However, there is some data loss. The file should contain 200,704 rows of data, but I receive only 200,313 rows (391 values are missing). This does not happen with the other strategy. Any idea why that might be the case? Here is the code, just in case:
# Untested.
import datetime
import os
import struct
 
import pigpio
import spidev
 
# We only have SPI bus 0 available to us on the Pi
bus = 0
# Device is the chip select pin. Set to 0 or 1, depending on the connections
device = 0
# Enable SPI
spi = spidev.SpiDev()
# Open a connection to a specific bus and device (chip select pin)
spi.open(bus, device)
# Set SPI speed and mode
spi.max_speed_hz = 4000000
spi.mode = 0
 
pi = pigpio.pi()
pi.set_mode(25, pigpio.INPUT)
 
def output_file_path():
    return os.path.join(os.path.dirname(__file__),
               datetime.datetime.now().strftime("%dT%H.%M.%S") + ".csv")
 
input("Press Enter to start the process ")
print("SM1 Process started...")
spi.xfer2([0x01])
while True:
    if pi.wait_for_edge(25, pigpio.RISING_EDGE, 5.0):
        print("Detected")
        data = [0]*2048
         
        with open(output_file_path(), 'w') as f:
            t1 = datetime.datetime.now()
            for x in range(392):
                spi.xfer2(data)
                values = struct.unpack(">" + "I" * 512, bytes(data))
                f.write('\n'.join([str(x) for x in values]))
            t2 = datetime.datetime.now()
            print(t2 - t1)
        break
#12
(Feb-12-2022, 08:51 AM)GiggsB Wrote: bit shifting operation was faster than struct.unpack()

Showing us the results without showing us the code that generated the results is pointless.

Let's do this the right way, using the timeit module:

import random
from timeit import Timer
data = [random.randint(0, 255) for i in range(2048)]

t1 = Timer('x = struct.unpack(">" +"I"*512, bytes(data))', setup="import struct; from __main__ import data")
t2 = Timer('''
x = []
for y in range(0, 2048, 4):
    x.append(data[y]<<24 | data[y+1]<<16 | data[y+2]<<8 | data[y+3])
''', setup="from __main__ import data")

print("struct.unpack", min(t1.repeat(number=10000, repeat=7)))
print("manual loop with bitshift", min(t2.repeat(number=10000, repeat=7)))
On my computer, the result is not even close: struct.unpack is about seven times faster than the manual loop. On a Raspberry Pi, your results might be different -- but I would be shocked if the manual loop was faster. The unpack version does most of the work at the speed of C, while the manual loop is doing everything in Python.

In any case, it's not really important. Even the slow manual-loop version should be fast enough, less than a millisecond. If your code is taking 20 seconds to collect data from the Pi and write it to a file, the bottleneck is not the part where you convert the list of 8-bit ints to 32-bit ints.
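If you want to see where those 20 seconds actually go, put perf_counter timers around each stage of the loop and total them separately. A minimal, self-contained sketch (fake_transfer and buf are stand-ins for spi.xfer2 and the output file, since I cannot run your SPI hardware):

import io
import struct
import time

# Stand-ins so the sketch runs anywhere: swap fake_transfer for the real
# spi.xfer2 call and buf for the actual output file.
def fake_transfer(buffer):
    return buffer  # pretend the SPI bus filled the buffer

data = [0] * 2048
buf = io.StringIO()
transfer_time = convert_time = write_time = 0.0

for _ in range(392):
    t0 = time.perf_counter()
    fake_transfer(data)                                    # SPI transfer
    t1 = time.perf_counter()
    values = struct.unpack(">" + "I" * 512, bytes(data))   # 8-bit ints -> 32-bit ints
    t2 = time.perf_counter()
    buf.write('\n'.join(str(v) for v in values))           # file output
    t3 = time.perf_counter()
    transfer_time += t1 - t0
    convert_time += t2 - t1
    write_time += t3 - t2

print(f"transfer {transfer_time:.4f}s, convert {convert_time:.4f}s, write {write_time:.4f}s")

Whichever of the three totals dominates is your bottleneck; on a Pi I would expect it to be the SPI transfer, not the conversion.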
#13
(Feb-12-2022, 09:43 AM)GiggsB Wrote: The file should contain data in 200,704 rows, but I receive 200,313 rows only (It is missing 391 values).

I've reached the end of help I can give, without being able to read the data from a Raspberry Pi. Sorry.

But reading the code, I cannot see how it is possible to have lost data.
#14
(Feb-12-2022, 10:07 AM)stevendaprano Wrote: Let's do this the right way, using the timeit module

Thank you, I used your code and the result is that struct.unpack() is 14 times faster than the manual loop.

I had to change repeat=7 to repeat=1 for this test, since the manual loop was taking too long to generate output. By the way, I was using this code to do the comparison:
import struct
import datetime

data = [1, 2, 3, 4]

print("Using bit shifting:")
t1=datetime.datetime.now()
value1=data[0]<<24 | data[1]<<16 | data[2]<<8 | data[3]
t2=datetime.datetime.now()
print(t2-t1)

print("Using struct.unpack:")
t3=datetime.datetime.now()
value2=struct.unpack("<I", bytearray(data))[0]
t4=datetime.datetime.now()
print(t4-t3)
Also, thank you so much for helping me all along. I will try to figure out the data-loss problem myself and post the solution once I find it.
#15
Unpacking data with minimal effort:

from struct import Struct
from random import randint


# little endian, 512 unsigned integers (4 bytes per int; 32 bit)
DataFormat = Struct("<512I")  # no fancy multiplication of "I"


# simulated incoming random data
random_data = bytes(randint(0, 255) for _ in range(2048))

# decoded data
decoded = DataFormat.unpack(random_data)
You should read the format specification in the struct documentation.
Bit shifting is nice for learning about low-level details, but later you will want higher-level constructs, with which you make fewer errors. Premature optimization is the root of all evil: if you think your code runs too slowly, measure it, find the spot where it is slow, and only then optimize. In most cases of processing input from a device, Python is fast enough to handle it with high-level constructs like the Struct class. Calling the module-level functions directly, instead of methods on a Struct instance, shows no measurable speed difference with Python 3.10 on my desktop; older Python versions may show a bigger difference between the two.
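A small illustration that the repeat count and the multiplied format character describe exactly the same layout:

import struct

# "<512I": little endian, 512 unsigned 32-bit integers.
# The repeat count replaces multiplying the format character.
compact = struct.Struct("<512I")
verbose = struct.Struct("<" + "I" * 512)

assert compact.size == verbose.size == 512 * 4  # both describe 2048 bytes

payload = bytes(range(256)) * 8  # 2048 bytes of sample data
assert compact.unpack(payload) == verbose.unpack(payload)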


Time measurement can be done with time.perf_counter(), because this timer has a higher resolution and is guaranteed never to go backwards. With a context manager, it's like magic:

import time
from contextlib import contextmanager


@contextmanager
def measure_time(results: list):
    """
    This context manager measures the time in the with-block
    and adds the time to the given list `results`
    """
    start = time.perf_counter()
    yield
    stop = time.perf_counter()
    results.append(stop - start)


# code which uses measure_time

delays = []

for _ in range(10):
    with measure_time(delays):
        time.sleep(1)

print(delays)
The benefit is that the timing logic is separated from the user code; this implementation is the simplest approach. Not using the datetime module also saves converting back and forth.

A class could also attach the delays to the instance, but that requires an understanding of OOP and the Python object model.

import time


class MeasureTime:
    def __init__(self):
        self._delays = []
        self._last_start = 0.0

    def __enter__(self):
        """
        This context manager measures the time in the with-block
        and adds the time to delays
        """
        self._last_start = time.perf_counter()

    def __exit__(self, exception_type, exception_value, exception_traceback):
        self._delays.append(time.perf_counter() - self._last_start)

    @property
    def delays(self):
        """
        Write protected delays of last usages as a tuple.
        """
        return tuple(self._delays)

    def clear(self):
        """
        Delete all measured delays
        """
        self._delays.clear()


# later in code

measure_time = MeasureTime()

with measure_time:
    time.sleep(1)

with measure_time:
    time.sleep(0.001) # 1 ms

print(measure_time.delays)
print("Clearing delays")
measure_time.clear()

print("One single measurement")
with measure_time:
    time.sleep(0.005)

print(measure_time.delays)
PS: Please do not compare this with timeit, because that module repeats a function x times to get an average time and the standard deviation. The code I posted is meant to measure how long something takes at individual points in the code.
#16
Hi DeaD_EyE,

Thanks for the detailed explanation of time.perf_counter(). I also tried the
(Feb-12-2022, 12:48 PM)DeaD_EyE Wrote: Unpacking data with minimal effort:
approach.
I had to wrap the unpack argument in bytes(), i.e. decoded = DataFormat.unpack(bytes(data)), since my SPI buffer is a list of ints rather than a bytes object. Please check my code below. This strategy also took ~6 seconds.
However, the problem of data loss also remained. 391 values were still missing.

import datetime
import os
from struct import Struct
 
import pigpio
import spidev
 
# We only have SPI bus 0 available to us on the Pi
bus = 0
# Device is the chip select pin. Set to 0 or 1, depending on the connections
device = 0
# Enable SPI
spi = spidev.SpiDev()
# Open a connection to a specific bus and device (chip select pin)
spi.open(bus, device)
# Set SPI speed and mode
spi.max_speed_hz = 4000000
spi.mode = 0
 
pi = pigpio.pi()
pi.set_mode(25, pigpio.INPUT)
 
def output_file_path():
    return os.path.join(os.path.dirname(__file__),
               datetime.datetime.now().strftime("%dT%H.%M.%S") + ".csv")
 
input("Press Enter to start the process ")
print("SM1 Process started...")
spi.xfer2([0x01])

DataFormat = Struct(">512I")

while True:
    if pi.wait_for_edge(25, pigpio.RISING_EDGE, 5.0):
        print("Detected")
        data = [0]*2048
         
        with open(output_file_path(), 'w') as f:
            t1 = datetime.datetime.now()
            for x in range(392):
                spi.xfer2(data)
                values = DataFormat.unpack(bytes(data))
                f.write('\n'.join([str(x) for x in values]))
            t2 = datetime.datetime.now()
            print(t2 - t1)
        break
Thanks.
#17
(Feb-12-2022, 12:48 PM)DeaD_EyE Wrote: Unpacking data with minimal effort:
DataFormat = Struct("<512I") # no fancy multiplication of "I"

Ah good, I thought there had to be another way of doing that, thanks!

Quote:Premature optimization is the root of all evil.

Seconded!

Quote:The measurement of time could be done with time.perf_counter(), because this timer has a higher resolution and there is a guarantee, that this timer will never go backwards.
...
PS: Please do not compare with timeit because this module repeats functions x times to get an average time and the standard deviation. The code I posted is used to measure how long something takes at individual points in the code.

I disagree strongly with this, especially for short snippets of code that run very quickly.

Taking a single measurement of a very brief operation is one of the "common traps for measuring execution times" that the documentation talks about. See here, where Tim Peters (one of the Python demigods) details some of the problems with timing small code snippets.

That was written a long time ago, and fortunately the question of time versus clock has been solved by perf_counter, but the other issues are still relevant. If anything, things are even worse: your OS and computer are probably even busier now than they were in 2002 when Tim wrote his introduction, and the chance of running your code while the computer and CPU are quiet is even smaller. The results probably show more variability today than ever before.

If you want to know how long an operation actually took this time you ran it, then your context manager is a great solution. A decade ago I wrote a context manager that does something very similar.

But if you want to know which implementation of an operation is faster in general, then you cannot rely on a single measurement, or even a bunch of measurements of a single run of the code.

You are also mistaken about timeit calculating averages and standard deviations. You can do that yourself with the statistics module but you probably shouldn't.
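If you really want those statistics, it is a couple of lines over timeit's raw results (a minimal sketch):

import statistics
from timeit import Timer

timings = Timer("sum(range(100))").repeat(repeat=7, number=10_000)

# min() is usually the number worth reporting; the mean and standard
# deviation are dominated by system noise from the slower runs.
print("min:  ", min(timings))
print("mean: ", statistics.mean(timings))
print("stdev:", statistics.stdev(timings))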
#18
Hi,

Just want to give an update that I was able to solve the problem. I rechecked the .csv file and realized that I was receiving all the data, but at each chunk boundary the last value of one 512-value chunk and the first value of the next were landing in the same row instead of on separate lines. With 392 chunks there are 391 such boundaries, which is exactly the number of "missing" rows.
I solved this problem by simply writing a newline to the file, f.write('\n'), before the f.write('\n'.join([str(x) for x in values])).
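In case it helps anyone else, here is a minimal sketch of the merging behaviour and the fix (the sketch writes the newline after each chunk, which has the same effect):

import io

chunks = [[1, 2, 3], [4, 5, 6]]

# Broken: '\n'.join() puts no newline after the last item, so the
# next write continues on the same row.
broken = io.StringIO()
for chunk in chunks:
    broken.write('\n'.join(str(x) for x in chunk))
print(broken.getvalue().splitlines())  # ['1', '2', '34', '5', '6']

# Fixed: terminate each chunk with its own newline.
fixed = io.StringIO()
for chunk in chunks:
    fixed.write('\n'.join(str(x) for x in chunk))
    fixed.write('\n')
print(fixed.getvalue().splitlines())   # ['1', '2', '3', '4', '5', '6']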

Thank you to all those who helped!

