Python Forum
Numpy Structure and Efficiency
#3
I played with your code. It is not exactly what you are expecting, but you may find some tricks here to help you.

If you can estimate the size of the array in advance, you can drastically speed up the code by preallocating it instead of appending or concatenating.
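To make the difference concrete, here is a minimal sketch with made-up row values (not the real data). The helper names `grow_by_stacking` and `fill_preallocated` are hypothetical. `np.vstack` builds a new array on every call, so the rows already stored are copied again each time, whereas a preallocated array is written in place.

```python
import numpy as np

def grow_by_stacking(n_rows, n_cols=8):
    # Start from a (0, n_cols) array so there is no dummy first row to delete later.
    data = np.empty((0, n_cols))
    for i in range(n_rows):
        # np.vstack returns a brand-new array: all earlier rows are copied each time.
        data = np.vstack((data, np.full(n_cols, float(i))))
    return data

def fill_preallocated(n_rows, n_cols=8):
    # Allocate once, then write each row in place.
    data = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        data[i, :] = float(i)
    return data

stacked = grow_by_stacking(1000)
filled = fill_preallocated(1000)
print(stacked.shape, filled.shape)
print(np.array_equal(stacked, filled))
```

Both functions produce the same array; only the amount of copying differs, and the gap widens as the number of rows grows.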



import numpy as np
import re, time

# Way 1: if you cannot estimate the size of the data array
def GetData(Data, Line):
    Line = Line.replace('D', 'E')
    Variables = re.split(r"\s+", Line)
    # time      = Variables[3]
    # delta_t   = Variables[4]
    # mass      = Variables[5]
    # radius    = Variables[6]
    # lum_core  = Variables[7]
    # lum_tot   = Variables[8]
    # flux      = Variables[9]
    # ratio     = Variables[10]
    
    # Keep the eight fields as strings here; they are converted to float in one step later
    Array = [Variables[i] for i in range(3, 11)]
    Data = np.vstack((Data, Array))
    return Data

# a single line is used here instead of a complete text file
Extract = "hydro output:     1    1.05200D+09    1.05200D+09    9.94376D+31    3.66754D+10    7.52265D+31    7.52265D+31    4.99722D-235   0.0499938"


t0=time.time()
# Data array is initialized
Data=np.empty(8, dtype=float)

# An n-line text file is simulated using a loop
n=10_000
for i in range(n):
    if "HYDRO" in Extract.upper(): Data=GetData(Data, Extract)
    
t1=time.time()    
# now the first row (the uninitialized placeholder created with np.empty) is removed
Data=np.delete(Data, 0, axis=0)

# the array holds strings so far; it is converted to float in a single step (faster than converting the numbers one by one)
Data=Data.astype(float)

# remember :
# time      = column 0
# delta_t   = column 1
# mass      = column 2
# radius    = column 3
# lum_core  = column 4
# lum_tot   = column 5
# flux      = column 6
# ratio     = column 7

# if you want all radius data for example:
Radius=Data[:, 3]
t2=time.time()
print(f"Duration reading lines={t1-t0}")
print(f"Duration converting data={t2-t1}")



## way 2: if you can estimate the size of the data array (can be the max number of lines?)
n=10_000
Data2 = np.empty((n,8))
for i in range(n):
    if "HYDRO" in Extract.upper():
        Extract = Extract.replace('D', 'E')
        Variables = re.split(r"\s+", Extract)
        Data2[i, :]=[Variables[j] for j in range(3, 11)]

Data2=Data2.astype(float)

flux=Data2[:, 6]
t3=time.time() 
print(f"Duration way2={t3-t2}")

MaxDifference=np.max(np.absolute(Data-Data2))
print(f"Max difference={MaxDifference}")
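If the number of lines cannot be estimated, a middle route between the two ways above is to collect the parsed rows in a plain Python list and build the array once at the end. This is only a sketch under the same assumptions as the script (fields 3 to 10 of each "hydro" line hold the eight values). The names `parse_hydro_line` and `read_hydro_lines` are hypothetical.

```python
import numpy as np

SAMPLE = ("hydro output:     1    1.05200D+09    1.05200D+09    9.94376D+31    "
          "3.66754D+10    7.52265D+31    7.52265D+31    4.99722D-235   0.0499938")

def parse_hydro_line(line):
    # Fortran-style exponents (1.0D+09) become 1.0E+09 so float() can read them.
    # Only tokens 3..10 are used, so the text tokens at the start do not matter.
    fields = line.replace("D", "E").split()
    return [float(value) for value in fields[3:11]]

def read_hydro_lines(lines):
    # Appending to a list is cheap; the NumPy array is created a single time.
    rows = [parse_hydro_line(line) for line in lines if "HYDRO" in line.upper()]
    # reshape keeps the result 2-D with 8 columns even when no line matched.
    return np.array(rows, dtype=float).reshape(-1, 8)

data3 = read_hydro_lines([SAMPLE] * 10_000)
print(data3.shape)
radius = data3[:, 3]  # same column layout as in the script above
```

With a real file you would pass the open file object (or any iterable of lines) to `read_hydro_lines` instead of the repeated sample line.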


Messages In This Thread
Numpy Structure and Efficiency - by garynewport - Oct-19-2022, 11:34 AM
RE: Numpy Structure and Efficiency - by garynewport - Oct-19-2022, 12:00 PM
RE: Numpy Structure and Efficiency - by paul18fr - Oct-19-2022, 10:11 PM
