Python Forum
Regex text file to store data in list - Printable Version


Regex text file to store data in list - TheSithSiggi - Dec-03-2020

Hi all!

I have the following sample text, which contains Finite Element Analysis results:

The first column holds the X, Y and Z normal stresses; XY, YZ and ZX are the shear stresses; and A, B and C are the principal stresses.

I would like to use regex to search a file for this pattern. The file is unstructured, with only parts of it in this format; to my knowledge, regex is the best option when the overall data is unstructured.

import re

text = '''
X-1.30779E+01    XY-1.26471E+01      A 3.00940E+01
Y 2.63890E+01    YZ 7.83649E-04      B 5.98331E-01      
Z 5.96212E-01    ZX 2.04834E-01      C-1.67851E+01    
 
X-1.53833E+01    XY-4.23500E+00      A 2.50320E+01    
Y 2.45882E+01    YZ-1.64653E-02      B-4.95968E-01
Z-4.96026E-01    ZX 3.55515E-02      C-1.58271E+01   
'''
I have written the following regex:

normalPattern = r'\s[XYZ].\d.\d{5}E.\d{2}\s'

shearPattern = r'(\s(XY|YZ|ZX).\d.\d{5}E.\d{2}\s)'

principalPattern = r'\s[ABC].\d.\d{5}E.\d{2}\s'

reg1 = re.findall(normalPattern,text)
 
reg2 = re.findall(shearPattern,text)

reg3 = re.findall(principalPattern,text)
This produces the following results:

Output:
reg1
Out[173]: ['\nX-1.30779E+01 ', '\nY 2.63890E+01 ', '\nZ 5.96212E-01 ', '\nX-1.53833E+01 ', '\nY 2.45882E+01 ', '\nZ-4.96026E-01 ']

reg2
Out[174]: [(' XY-1.26471E+01 ', 'XY'), (' YZ 7.83649E-04 ', 'YZ'), (' ZX 2.04834E-01 ', 'ZX'), (' XY-4.23500E+00 ', 'XY'), (' YZ-1.64653E-02 ', 'YZ'), (' ZX 3.55515E-02 ', 'ZX')]

reg3
Out[175]: [' A 3.00940E+01\n', ' B 5.98331E-01 ', ' C-1.67851E+01 ', ' A 2.50320E+01 ', ' B-4.95968E-01\n', ' C-1.58271E+01 ']
My questions:

1) For reg1 and reg3, some of the matches include a "\n". How can I exclude it?

2) For reg2, for some reason I get an extra "XY", "YZ" and "ZX". How can I exclude them?

Thank you!

Regards
Siggi


RE: Regex text file to store data in list - bowlofred - Dec-03-2020

1 happens because you've specified that whitespace should be the first part of the matched item. Even if you do want to require that whitespace, you probably don't want it in the actual data. So you could either put capturing parentheses around the data only, which leaves the whitespace out of the result, or use a lookbehind.

But here, I'd prefer to specify a word boundary. If the first string in the file were one of these but had no whitespace before it, your regex would fail to match it. So maybe something like:
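To make those two options concrete, here is a minimal sketch of each, using a shortened copy of the sample data. The names capture_pattern and lookaround_pattern are only illustrative, and the patterns are adapted from the original normalPattern rather than being the only way to write them.

```python
import re

text = """
X-1.30779E+01    XY-1.26471E+01      A 3.00940E+01
Y 2.63890E+01    YZ 7.83649E-04      B 5.98331E-01
Z 5.96212E-01    ZX 2.04834E-01      C-1.67851E+01
"""

# Option 1: put a single capturing group around the data only.
# With exactly one group, findall returns the group, not the whole match,
# so the surrounding whitespace is required but not returned.
capture_pattern = r'\s([XYZ].\d.\d{5}E.\d{2})\s'
captured = re.findall(capture_pattern, text)

# Option 2: lookbehind/lookahead assert that whitespace is present
# without making it part of the match.
lookaround_pattern = r'(?<=\s)[XYZ].\d.\d{5}E.\d{2}(?=\s)'
looked = re.findall(lookaround_pattern, text)

print(captured)  # ['X-1.30779E+01', 'Y 2.63890E+01', 'Z 5.96212E-01']
print(looked)    # same list
```

Both variants still require whitespace on each side of the value, so neither would match a value sitting at the very start or end of the text.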

normalPattern = r'\b([XYZ])(.\d.\d{5}E.\d{2})\b'
The \b matches at a word boundary without consuming any characters, so it succeeds whether the element has whitespace next to it or sits at the very start or end of the text. The two sets of parentheses give you easy access to the identifier and the number string, which can then be fed into float() to give you a number.
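As a rough sketch of how that pattern might be used, the snippet below runs it over a shortened sample that deliberately starts with no leading whitespace, then converts the number strings with float(). The variable names pairs, normal_stress and missed are made up for illustration.

```python
import re

# Note: no newline or space before the first X, and none after the last value.
text = """X-1.30779E+01    XY-1.26471E+01      A 3.00940E+01
Y 2.63890E+01    YZ 7.83649E-04      B 5.98331E-01
Z 5.96212E-01    ZX 2.04834E-01      C-1.67851E+01"""

normal_pattern = r'\b([XYZ])(.\d.\d{5}E.\d{2})\b'

# With two capturing groups, findall returns (identifier, number_string) tuples.
pairs = re.findall(normal_pattern, text)
print(pairs)
# [('X', '-1.30779E+01'), ('Y', ' 2.63890E+01'), ('Z', ' 5.96212E-01')]

# float() tolerates the leading space that stands in for a plus sign.
normal_stress = {name: float(value) for name, value in pairs}
print(normal_stress)
# {'X': -13.0779, 'Y': 26.389, 'Z': 0.596212}

# For comparison, the whitespace-delimited pattern misses the first X here.
missed = re.findall(r'\s[XYZ].\d.\d{5}E.\d{2}\s', text)
print(len(missed))  # 2
```

A dict only works for a single block of results, since later blocks would overwrite the same keys; for the full file, keeping the list of tuples is probably the safer choice.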

2 is not a case of getting extra stuff; you're getting a tuple back instead of a string. When a pattern contains more than one capturing group, re.findall returns one tuple per match with one element per group. shearPattern has two sets of capturing parentheses: one around the whole pattern (that's the first element), and one just around the identifier at the front (that's the second element).
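One possible way to get plain strings back, sketched below, is to drop the outer parentheses and make the alternation a non-capturing group with (?:...). This combines it with the \b idea from above, which is a choice made for this example rather than something the original pattern requires.

```python
import re

text = """
X-1.30779E+01    XY-1.26471E+01      A 3.00940E+01
Y 2.63890E+01    YZ 7.83649E-04      B 5.98331E-01
Z 5.96212E-01    ZX 2.04834E-01      C-1.67851E+01
"""

# Original pattern: two capturing groups, so each result is a 2-tuple.
shearPattern = r'(\s(XY|YZ|ZX).\d.\d{5}E.\d{2}\s)'
with_tuples = re.findall(shearPattern, text)
print(with_tuples[0])  # (' XY-1.26471E+01 ', 'XY')

# No capturing groups at all: findall returns the whole match as a string.
# (?:...) groups the alternation without capturing it.
shear_strings = re.findall(r'\b(?:XY|YZ|ZX).\d.\d{5}E.\d{2}\b', text)
print(shear_strings)
# ['XY-1.26471E+01', 'YZ 7.83649E-04', 'ZX 2.04834E-01']
```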