How can I solve this file handling issue?
#11
(Feb-12-2022, 08:26 AM)stevendaprano Wrote: ....but I am 95% sure that struct.unpack will be faster.
stevendaprano,

Actually, yes, that IS the case. With your modified code I reran the tests: the bit-shift version still takes ~20 seconds, while struct.unpack takes ~6.3 seconds. I am not sure what mistake I made in my earlier comparison of the two strategies, but you were right. Thank you!

However, there is some data loss. The file should contain 200,704 rows of data, but I receive only 200,313 rows (391 values are missing). This does not happen with the other strategy. Any idea why that might be the case? Here is the code, just in case:
# Untested.
import datetime
import os
import struct
 
import pigpio
import spidev
 
# We only have SPI bus 0 available to us on the Pi
bus = 0
# Device is the chip select pin. Set to 0 or 1, depending on the connections
device = 0
# Enable SPI
spi = spidev.SpiDev()
# Open a connection to a specific bus and device (chip select pin)
spi.open(bus, device)
# Set SPI speed and mode
spi.max_speed_hz = 4000000
spi.mode = 0
 
pi = pigpio.pi()
pi.set_mode(25, pigpio.INPUT)
 
def output_file_path():
    return os.path.join(os.path.dirname(__file__),
               datetime.datetime.now().strftime("%dT%H.%M.%S") + ".csv")
 
input("Press Enter to start the process ")
print("SM1 Process started...")
spi.xfer2([0x01])
while True:
    if pi.wait_for_edge(25, pigpio.RISING_EDGE, 5.0):
        print("Detected")
        data = [0]*2048
         
        with open(output_file_path(), 'w') as f:
            t1 = datetime.datetime.now()
            for x in range(392):
                spi.xfer2(data)
                values = struct.unpack(">" + "I" * 512, bytes(data))
                f.write('\n'.join([str(x) for x in values]))
            t2 = datetime.datetime.now()
            print(t2 - t1)
        break
#12
(Feb-12-2022, 08:51 AM)GiggsB Wrote: bit shifting operation was faster than struct.unpack()

Showing us the results without showing us the code that generated the results is pointless.

Let's do this the right way, using the timeit module:

import random
from timeit import Timer
data = [random.randint(0, 255) for i in range(2048)]

t1 = Timer('x = struct.unpack(">" +"I"*512, bytes(data))', setup="import struct; from __main__ import data")
t2 = Timer('''
x = []
for y in range(0, 2048, 4):
    x.append(data[y]<<24 | data[y+1]<<16 | data[y+2]<<8 | data[y+3])
''', setup="from __main__ import data")

print("struct.unpack", min(t1.repeat(number=10000, repeat=7)))
print("manual loop with bitshift", min(t2.repeat(number=10000, repeat=7)))
On my computer, the result is not even close: struct.unpack is about seven times faster than the manual loop. On a Raspberry Pi, your results might be different -- but I would be shocked if the manual loop was faster. The unpack version does most of the work at the speed of C, while the manual loop is doing everything in Python.

In any case, it's not really important. Even the slow manual-loop version should be fast enough, less than a millisecond. If your code is taking 20 seconds to collect data from the Pi and write it to a file, the bottleneck is not the part where you convert the list of 8-bit ints to 32-bit ints.
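If you want to see where those 20 seconds actually go, put perf_counter timers around each stage of the loop and total them separately. A minimal, self-contained sketch (fake_transfer and buf are stand-ins for spi.xfer2 and the output file, since I cannot run your SPI hardware):

import io
import struct
import time

# Stand-ins so the sketch runs anywhere: swap fake_transfer for the real
# spi.xfer2 call and buf for the actual output file.
def fake_transfer(buffer):
    return buffer  # pretend the SPI bus filled the buffer

data = [0] * 2048
buf = io.StringIO()
transfer_time = convert_time = write_time = 0.0

for _ in range(392):
    t0 = time.perf_counter()
    fake_transfer(data)                                    # SPI transfer
    t1 = time.perf_counter()
    values = struct.unpack(">" + "I" * 512, bytes(data))   # 8-bit ints -> 32-bit ints
    t2 = time.perf_counter()
    buf.write('\n'.join(str(v) for v in values))           # file output
    t3 = time.perf_counter()
    transfer_time += t1 - t0
    convert_time += t2 - t1
    write_time += t3 - t2

print(f"transfer {transfer_time:.4f}s, convert {convert_time:.4f}s, write {write_time:.4f}s")

Whichever of the three totals dominates is your bottleneck; on a Pi I would expect it to be the SPI transfer, not the conversion.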
#13
(Feb-12-2022, 09:43 AM)GiggsB Wrote: The file should contain data in 200,704 rows, but I receive 200,313 rows only (It is missing 391 values).

I've reached the end of help I can give, without being able to read the data from a Raspberry Pi. Sorry.

But reading the code, I cannot see how it is possible to have lost data.
#14
(Feb-12-2022, 10:07 AM)stevendaprano Wrote: Let's do this the right way, using the timeit module

Thank you, I used your code and the result is that struct.unpack() is 14 times faster than the manual loop.

I had to change repeat=7 to repeat=1 for this test, since the manual loop was taking too long to generate output. By the way, I was using this code to do the comparison:
import struct
import datetime

data = [1, 2, 3, 4]

print("Using bit shifting:")
t1=datetime.datetime.now()
value1=data[0]<<24 | data[1]<<16 | data[2]<<8 | data[3]
t2=datetime.datetime.now()
print(t2-t1)

print("Using struct.unpack:")
t3=datetime.datetime.now()
value2=struct.unpack("<I", bytearray(data))[0]
t4=datetime.datetime.now()
print(t4-t3)
Also, thank you so much for helping me all along. I will try to figure out the data-loss problem myself and post the solution once I find it.
#15
Unpacking data with minimal effort:

from struct import Struct
from random import randint


# little endian, 512 unsigned integers (4 bytes per int; 32 bit)
DataFormat = Struct("<512I")  # no fancy multiplication of "I"


# simulated incoming random data
random_data = bytes(randint(0, 255) for _ in range(2048))

# decoded data
decoded = DataFormat.unpack(random_data)
You should read the format specification in the struct documentation.
Bit shifting is nice for learning about low-level details, but later you will want higher-level constructs, with which you make fewer errors. Premature optimization is the root of all evil: if you think your code runs too slowly, measure it, find the spot where it is slow, and only then optimize. In most cases of processing input from a device, Python is fast enough to handle it with high-level constructs like the Struct class. Calling the module-level functions directly, instead of methods on a Struct instance, shows no measurable speed difference with Python 3.10 on my desktop; older Python versions may show a bigger difference between the two.
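A small illustration that the repeat count and the multiplied format character describe exactly the same layout:

import struct

# "<512I": little endian, 512 unsigned 32-bit integers.
# The repeat count replaces multiplying the format character.
compact = struct.Struct("<512I")
verbose = struct.Struct("<" + "I" * 512)

assert compact.size == verbose.size == 512 * 4  # both describe 2048 bytes

payload = bytes(range(256)) * 8  # 2048 bytes of sample data
assert compact.unpack(payload) == verbose.unpack(payload)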


Time measurement can be done with time.perf_counter(), because this timer has a higher resolution and is guaranteed never to go backwards. With a context manager, it's like magic:

import time
from contextlib import contextmanager


@contextmanager
def measure_time(results: list):
    """
    This context manager measures the time in the with-block
    and adds the time to the given list `results`
    """
    start = time.perf_counter()
    yield
    stop = time.perf_counter()
    results.append(stop - start)


# code which uses measure_time

delays = []

for _ in range(10):
    with measure_time(delays):
        time.sleep(1)

print(delays)
The benefit is that the timing logic is separated from the user code; this implementation is the simplest approach. Not using the datetime module also saves converting back and forth.

A class could also attach the delays to the instance, but that requires an understanding of OOP and the Python object model.

import time


class MeasureTime:
    def __init__(self):
        self._delays = []
        self._last_start = 0.0

    def __enter__(self):
        """
        This context manager measures the time in the with-block
        and adds the time to delays
        """
        self._last_start = time.perf_counter()

    def __exit__(self, exception_type, exception_value, exception_traceback):
        self._delays.append(time.perf_counter() - self._last_start)

    @property
    def delays(self):
        """
        Write protected delays of last usages as a tuple.
        """
        return tuple(self._delays)

    def clear(self):
        """
        Delete all measured delays
        """
        self._delays.clear()


# later in code

measure_time = MeasureTime()

with measure_time:
    time.sleep(1)

with measure_time:
    time.sleep(0.001) # 1 ms

print(measure_time.delays)
print("Clearing delays")
measure_time.clear()

print("One single measurement")
with measure_time:
    time.sleep(0.005)

print(measure_time.delays)
PS: Please do not compare this with timeit, because that module repeats a function x times to get an average time and the standard deviation. The code I posted is meant to measure how long something takes at individual points in the code.
#16
Hi DeaD_EyE,

Thanks for the detailed explanation of time.perf_counter(). I also tried the
(Feb-12-2022, 12:48 PM)DeaD_EyE Wrote: Unpacking data with minimal effort:
approach.
I had to wrap the unpack argument in bytes(), i.e. decoded = DataFormat.unpack(bytes(data)), since my SPI buffer is a list of ints rather than a bytes object. Please check my code below. This strategy also took ~6 seconds.
However, the problem of data loss also remained. 391 values were still missing.

import datetime
import os
from struct import Struct
 
import pigpio
import spidev
 
# We only have SPI bus 0 available to us on the Pi
bus = 0
# Device is the chip select pin. Set to 0 or 1, depending on the connections
device = 0
# Enable SPI
spi = spidev.SpiDev()
# Open a connection to a specific bus and device (chip select pin)
spi.open(bus, device)
# Set SPI speed and mode
spi.max_speed_hz = 4000000
spi.mode = 0
 
pi = pigpio.pi()
pi.set_mode(25, pigpio.INPUT)
 
def output_file_path():
    return os.path.join(os.path.dirname(__file__),
               datetime.datetime.now().strftime("%dT%H.%M.%S") + ".csv")
 
input("Press Enter to start the process ")
print("SM1 Process started...")
spi.xfer2([0x01])

DataFormat = Struct(">512I")

while True:
    if pi.wait_for_edge(25, pigpio.RISING_EDGE, 5.0):
        print("Detected")
        data = [0]*2048
         
        with open(output_file_path(), 'w') as f:
            t1 = datetime.datetime.now()
            for x in range(392):
                spi.xfer2(data)
                values = DataFormat.unpack(bytes(data))
                f.write('\n'.join([str(x) for x in values]))
            t2 = datetime.datetime.now()
            print(t2 - t1)
        break
Thanks.
#17
(Feb-12-2022, 12:48 PM)DeaD_EyE Wrote: Unpacking data with minimal effort:
DataFormat = Struct("<512I") # no fancy multiplication of "I"

Ah good, I thought there had to be another way of doing that, thanks!

Quote:Premature optimization is the root of all evil.

Seconded!

Quote:The measurement of time could be done with time.perf_counter(), because this timer has a higher resolution and there is a guarantee, that this timer will never go backwards.
...
PS: Please do not compare with timeit because this module repeats functions x times to get an average time and the standard deviation. The code I posted is used to measure how long something takes at individual points in the code.

I disagree strongly with this, especially for short snippets of code that run very quickly.

Taking a single measurement of a very brief operation is one of the "common traps for measuring execution times" that the documentation talks about. See here, where Tim Peters (one of the Python demigods) details some of the problems with timing small code snippets.

That was written a long time ago, and fortunately the question of time versus clock has been solved by perf_counter, but the other issues are still relevant. If anything, things are even worse: your OS and computer are probably even busier now than they were in 2002 when Tim wrote his introduction, and the chance of running your code while the computer and CPU are quiet is even smaller. The results probably show more variability today than ever before.

If you want to know how long an operation actually took this time you ran it, then your context manager is a great solution. A decade ago I wrote a context manager that does something very similar.

But if you want to know which implementation of an operation is faster in general, then you cannot rely on a single measurement, or even a bunch of measurements of a single run of the code.

You are also mistaken about timeit calculating averages and standard deviations. You can do that yourself with the statistics module but you probably shouldn't.
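If you really want those statistics, it is a couple of lines over timeit's raw results (a minimal sketch):

import statistics
from timeit import Timer

timings = Timer("sum(range(100))").repeat(repeat=7, number=10_000)

# min() is usually the number worth reporting; the mean and standard
# deviation are dominated by system noise from the slower runs.
print("min:  ", min(timings))
print("mean: ", statistics.mean(timings))
print("stdev:", statistics.stdev(timings))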
#18
Hi,

Just want to give an update that I was able to solve the problem. I rechecked the .csv file and realized that I was receiving all the data, but at each chunk boundary the last value of one 512-value chunk and the first value of the next were landing in the same row instead of on separate lines. With 392 chunks there are 391 such boundaries, which is exactly the number of "missing" rows.
I solved this problem by simply writing a newline to the file, f.write('\n'), before the f.write('\n'.join([str(x) for x in values])).
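In case it helps anyone else, here is a minimal sketch of the merging behaviour and the fix (the sketch writes the newline after each chunk, which has the same effect):

import io

chunks = [[1, 2, 3], [4, 5, 6]]

# Broken: '\n'.join() puts no newline after the last item, so the
# next write continues on the same row.
broken = io.StringIO()
for chunk in chunks:
    broken.write('\n'.join(str(x) for x in chunk))
print(broken.getvalue().splitlines())  # ['1', '2', '34', '5', '6']

# Fixed: terminate each chunk with its own newline.
fixed = io.StringIO()
for chunk in chunks:
    fixed.write('\n'.join(str(x) for x in chunk))
    fixed.write('\n')
print(fixed.getvalue().splitlines())   # ['1', '2', '3', '4', '5', '6']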

Thank you to all those who helped!

