Python Forum
fastest way to record values between quotes - Printable Version


fastest way to record values between quotes - paul18fr - Apr-14-2019

Dear All,

I haven't used Python for years (I'm more of a MATLAB user), but I now have the opportunity to work with it again.

The goal is to process ASCII files containing different types of data. I'm wondering how to improve the code below, which extracts specific numbers from between quotes; in particular, can I avoid the (ugly) loop?

Thanks for any advice

Paul
t0 = time.time()
line1 = '"tel 555-669", "duration 6", "number 2.", "Price 3.58"'
quote = re.finditer('"',line1)
position_quote = []
for match_quote in quote:
    position_quote.append(match_quote.span())

# Skip the opening quote and 'Price ' (7 characters); the slice ends at the closing quote.
Price = float(line1[position_quote[6][0]+7 : position_quote[7][0]])
t1 = time.time()
print("Duration : ", t1-t0)



RE: fastest way to record values between quotes - pythonduffer - Apr-14-2019

I ran the program as you posted it and got the same result.
Then I inserted a couple of lines to force a delay.

I inserted the following code immediately after line 6 of your program (the position_quote.append(...) line), inside the for loop:
while t0==time.time():
    t0=t0
I also imported time and re.
The program ran and produced the following output: Duration : 0.015587329864501953
Your computer will produce a different value, because it may be slower or faster than mine.


import time
import re
t0 = time.time()
line1 = '"tel 555-669", "duration 6", "number 2.", "Price 3.58"'
quote = re.finditer('"',line1)
position_quote = []
for match_quote in quote:
    position_quote.append(match_quote.span())

    while t0==time.time():
        t0=t0
    
# The slice ends at the closing quote, so the whole of '3.58' is kept.
Price = float(line1[position_quote[6][0]+7 : position_quote[7][0]])
t1 = time.time()
print("Duration : ", t1-t0)

I didn't explain why I inserted the two extra lines of code.
They force a delay, in case the routine runs so fast that the value returned by time.time() has not had time to change. The program simply waits until the clock value is updated. Hope this helps.
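
A side note on the clock itself: rather than waiting for time.time() to tick, one option is to ask Python what resolution that clock reports and to time the snippet with time.perf_counter(), which is intended for short intervals. This is only a minimal sketch, assuming Python 3.3 or later; the variable names clock_info, start and elapsed are mine, not from the posts above, and the printed values depend on the machine.

```python
import re
import time

# Resolution reported for the clock behind time.time() on this machine.
clock_info = time.get_clock_info("time")
print("time.time() resolution:", clock_info.resolution)

line1 = '"tel 555-669", "duration 6", "number 2.", "Price 3.58"'

# perf_counter() is a high-resolution clock meant for measuring short durations.
start = time.perf_counter()
position_quote = [m.start() for m in re.finditer('"', line1)]
elapsed = time.perf_counter() - start
print("Duration :", elapsed)
```

With a finer clock there should be no need for the busy-wait loop, although a single run of such a short snippet will still be noisy.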


RE: fastest way to record values between quotes - snippsat - Apr-14-2019

Can't you match more directly?
>>> import re
>>> 
>>> line1 = '"tel 555-669", "duration 6", "number 2.", "Price 3.58"'
>>> r = re.search(r"(Price)\s(\d.\d+)", line1)
>>> d = r.groups()
>>> d
('Price', '3.58')
>>> d = dict([d])
>>> d
{'Price': '3.58'}
>>> d['Price']
'3.58'
>>> float(d['Price'])
3.58
I also captured Price, so that a dictionary could be built from the match.
pythonduffer Wrote:Then I inserted a couple of lines to force a delay.
You can use timeit, which is made for measuring the execution time of small code snippets.
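
As a rough illustration of the timeit suggestion, the sketch below times both approaches from this thread over many calls. The function names price_by_quotes and price_by_search are hypothetical, chosen only for this example, and the search pattern escapes the dot, which is slightly stricter than the one in the session above. Timings will differ from machine to machine.

```python
import re
import timeit

line1 = '"tel 555-669", "duration 6", "number 2.", "Price 3.58"'

def price_by_quotes(line):
    """Locate every quote, then slice the number out of the last field."""
    positions = [m.start() for m in re.finditer('"', line)]
    # Skip the opening quote plus the 6 characters of 'Price '.
    return float(line[positions[6] + 7 : positions[7]])

def price_by_search(line):
    """Match 'Price', one whitespace character, then a decimal number."""
    match = re.search(r"Price\s(\d+\.\d+)", line)
    return float(match.group(1))

# timeit.timeit() calls the function `number` times and returns total seconds.
for func in (price_by_quotes, price_by_search):
    seconds = timeit.timeit(lambda: func(line1), number=10000)
    print(func.__name__, ":", seconds, "s for 10000 calls")
```

Repeating the call thousands of times averages out the clock granularity that makes a single time.time() difference unreliable.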


RE: fastest way to record values between quotes - paul18fr - Apr-14-2019

@snippsat: thanks for the help; much more "elegant" and efficient than using the loop

Well, I need to dig deeper into regex use.


RE: fastest way to record values between quotes - paul18fr - Apr-15-2019

Hi

An additional question regarding regex use (some things are still unclear to me at the moment). I'm OK with:
  • () to create a group
  • \s to match a space character
  • \d+ to read a number up to its end
  • \d. to allow the dot as a character

But here the pattern is fixed in advance, I mean the number of space characters is known. How can I test the number of space characters, and thus the number of groups, when it is not known beforehand?

line2 = '"555 102 4 21", "39 555 6", "555 102"'
reg7 = re.search(r"(\"555)\s(\d+)\s(\d+)\s(\d+)", line2)
d7 = reg7.groups()
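
On the tokens listed above, a small check may help, since two of them behave a little differently from the description: \d+ stops at the first non-digit, and an unescaped dot matches any character (except a newline), not only a literal dot; \. is the literal dot. The sketch below uses made-up test strings such as "3x58" purely for illustration.

```python
import re

# \d+ matches a run of digits and stops at the first non-digit.
digits = re.findall(r"\d+", "555 102 4 21")
print(digits)          # ['555', '102', '4', '21']

# An unescaped dot accepts any character, so 'x' is matched here.
loose = re.search(r"\d.\d+", "3x58")
print(loose.group())   # 3x58

# An escaped dot only accepts a literal '.', so this finds nothing.
strict = re.search(r"\d+\.\d+", "3x58")
print(strict)          # None

price = re.search(r"\d+\.\d+", "Price 3.58")
print(price.group())   # 3.58

# \s matches one whitespace character (space, tab, ...); \s+ one or more.
parts = re.split(r"\s+", "555   102\t4")
print(parts)           # ['555', '102', '4']
```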



RE: fastest way to record values between quotes - snippsat - Apr-15-2019

Since that string has extra double quotes, you can use a trick and match between the quotes.
For testing regexes, have a look at regex101.
There is also a Regular Expression HOWTO and the regular documentation.
>>> import re
>>> 
>>> line2 = '"555 102 4 21", "39 555 6", "555 102"'
>>> r = re.findall(r'"(.*?)"', line2)
>>> r
['555 102 4 21', '39 555 6', '555 102']

>>> r[0]
'555 102 4 21'
>>> [int(i) for i in r[0].split()]
[555, 102, 4, 21]
>>> # Or think about why this works; hint: a new PEP
>>> g = r[0].replace(' ', '_')
>>> g
'555_102_4_21'
>>> int(g)
555102421

# No error using underscore in integer
>>> 100_000_000_123
100000000123
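
To tie this back to the question about an unknown number of groups, here is a minimal sketch that applies the same findall/split idea to every quoted field, so the number of values per field never has to be known in advance. The helper name quoted_int_groups is hypothetical. The PEP hinted at is most likely PEP 515 (underscores in numeric literals, Python 3.6+), which also lets int() accept single underscores between digits; the last lines assume that version.

```python
import re

line2 = '"555 102 4 21", "39 555 6", "555 102"'

def quoted_int_groups(line):
    """Return one list of ints per quoted field, whatever its length."""
    fields = re.findall(r'"(.*?)"', line)
    return [[int(token) for token in field.split()] for field in fields]

groups = quoted_int_groups(line2)
print(groups)                    # [[555, 102, 4, 21], [39, 555, 6], [555, 102]]
print([len(g) for g in groups])  # [4, 3, 2]

# Single underscores between digits are accepted by int() (Python 3.6+) ...
joined = int("555_102_4_21")
print(joined)                    # 555102421

# ... but doubled underscores are rejected.
try:
    int("555__102")
    double_ok = True
except ValueError:
    double_ok = False
print(double_ok)                 # False
```

Note that the underscore trick merges the four numbers into one integer, so the split version is the one to use when the separate values are needed.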