Python Forum
Numpy Structure and Efficiency
#3
I played with your code. It's not exactly what you are expecting, but you may find some tricks here to help you.

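If you can estimate the size of the initial array, then you can drastically speed up the code by allocating it once and filling it in place, instead of appending or concatenating (each np.vstack call copies the whole array again).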
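Here is a minimal sketch of that idea for the case where you only know an upper bound on the number of lines. The names fill_preallocated and max_rows are mine, purely for illustration, and I assume the same line layout as in your file (8 numeric fields starting at the 4th token). Rows that are never filled are trimmed at the end.

```python
import numpy as np

def fill_preallocated(lines, max_rows, n_cols=8):
    """Fill a preallocated float array row by row, then trim the unused tail."""
    data = np.empty((max_rows, n_cols), dtype=float)  # allocated once, never regrown
    count = 0
    for line in lines:
        if "HYDRO" not in line.upper():
            continue  # skip lines that are not hydro output
        fields = line.replace("D", "E").split()  # Fortran 'D' exponent -> 'E'
        data[count, :] = [float(v) for v in fields[3:3 + n_cols]]
        count += 1
    return data[:count]  # keep only the rows that were actually written

sample = ("hydro output:     1    1.05200D+09    1.05200D+09    9.94376D+31    "
          "3.66754D+10    7.52265D+31    7.52265D+31    4.99722D-235   0.0499938")
result = fill_preallocated([sample, "some other line", sample], max_rows=10)
```

Note that this raises an IndexError if more than max_rows lines match, so the bound has to be a real upper bound.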



import numpy as np
import re, time

# Way 1: if you cannot estimate the size of the data array
def GetData(Data, Line):
    Line = Line.replace('D', 'E')
    Variables = re.split(r"\s+", Line)
    # time      = Variables[3]
    # delta_t   = Variables[4]
    # mass      = Variables[5]
    # radius    = Variables[6]
    # lum_core  = Variables[7]
    # lum_tot   = Variables[8]
    # flux      = Variables[9]
    # ratio     = Variables[10]
    
    # keep the 8 numeric fields as strings; they are converted to float later in one step
    Array = Variables[3:11]
    Data = np.vstack((Data, Array))
    return Data

# a single line is used here instead of a complete text file
Extract = "hydro output:     1    1.05200D+09    1.05200D+09    9.94376D+31    3.66754D+10    7.52265D+31    7.52265D+31    4.99722D-235   0.0499938"


t0=time.time()
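# Data array is initialized with one placeholder row (uninitialized values);
# it only gives vstack something to stack onto and is removed after the loop
Data=np.empty(8, dtype=float)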

# An n-line text file is simulated using a loop
n=10_000
for i in range(n):
    if "HYDRO" in Extract.upper(): Data=GetData(Data, Extract)
    
t1=time.time()    
# now the first (placeholder) row is removed
Data=np.delete(Data, 0, axis=0)

# the array holds strings so far; it is converted to float in a single step (faster than converting the numbers one by one)
Data=Data.astype(float)

# remember :
# time      = column 0
# delta_t   = column 1
# mass      = column 2
# radius    = column 3
# lum_core  = column 4
# lum_tot   = column 5
# flux      = column 6
# ratio     = column 7

# if you want all radius data for example:
Radius=Data[:, 3]
t2=time.time()
print(f"Duration reading lines={t1-t0}")
print(f"Duration converting data={t2-t1}")



## way 2: if you can estimate the size of the data array (can be the max number of lines?)
n=10_000
Data2 = np.empty((n,8))
for i in range(n):
    if "HYDRO" in Extract.upper():
        Line = Extract.replace('D', 'E')   # local copy, Extract itself stays unchanged
        Variables = re.split(r"\s+", Line)
        Data2[i, :]=[Variables[j] for j in range(3, 11)]   # strings are cast to float on assignment

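# Data2 was allocated as a float array, so this conversion changes nothing; kept for symmetry with way 1
Data2=Data2.astype(float)

flux=Data2[:, 6]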
t3=time.time() 
print(f"Duration way2={t3-t2}")

MaxDifference=np.max(np.absolute(Data-Data2))
print(f"Max difference={MaxDifference}")
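If you really cannot estimate the number of lines, a third option (not timed above, so take it as a sketch rather than a benchmark) is to collect the rows in a plain Python list and build the array once at the end. Appending to a list does not copy the existing rows the way np.vstack does. The function name parse_hydro_lines is hypothetical, and I assume the same line layout as in the sample line.

```python
import numpy as np

def parse_hydro_lines(lines):
    """Collect the 8 numeric fields of each hydro line, then convert once."""
    rows = []
    for line in lines:
        if "HYDRO" in line.upper():
            fields = line.replace("D", "E").split()  # Fortran 'D' exponent -> 'E'
            rows.append(fields[3:11])  # still strings at this point
    if not rows:
        return np.empty((0, 8), dtype=float)  # no matching line: empty table
    # one string array for the whole table, converted to float in a single step
    return np.array(rows).astype(float)

sample = ("hydro output:     1    1.05200D+09    1.05200D+09    9.94376D+31    "
          "3.66754D+10    7.52265D+31    7.52265D+31    4.99722D-235   0.0499938")
table = parse_hydro_lines([sample] * 3 + ["a line without the keyword"])
radius = table[:, 3]  # same column layout as above
```

With a real file you would pass the open file object as lines, since iterating over it yields one line at a time.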


Messages In This Thread
Numpy Structure and Efficiency - by garynewport - Oct-19-2022, 11:34 AM
RE: Numpy Structure and Efficiency - by garynewport - Oct-19-2022, 12:00 PM
RE: Numpy Structure and Efficiency - by paul18fr - Oct-19-2022, 10:11 PM
