Python Forum

Hi,

I'm very new to Python and need to write a script to analyse some data (example below).
I've got thousands of these "chunks", and I need to pull out the chunks that have sequential pulse_ID numbers and the same train_ID number.

As a complete beginner who has been madly googling, my plan is to:
1. Combine all the lines in one chunk so that I can search and compare between lines.
2. Find the text "train_ID = " and "pulse_ID = ", read out the numbers next to them, and compare them with the numbers from other lines.
3. Separate the lines back into their original layout, as I then need to read the result with a program that expects the original format.

*One detail: the number of lines within a chunk changes depending on how many peaks the chunk has (lines 139-141, i.e. 3 peaks, for this chunk)!

Is this a good/efficient way to achieve my aim? Is there a simpler way I'm missing?

Any help is much appreciated,

Thanks from Brainstorm_for_a_dummy!

----- Begin chunk -----
128 Image filename: /home/h5_files_full/11112016_chip17_grid_2_scan_002_P000002_00001.cbf_0.h5
129 Image serial number: 2
130 indexed_by = none
131 photon_energy_eV = 12000.000000
132 beam_divergence = 0.00e+00 rad
133 beam_bandwidth = 1.00e-08 (fraction)
134 average_camera_length = 0.587800 m
135 train_ID = 12479848
136 pulse_ID = 142
137 Peaks from peak search
138 fs/px ss/px (1/d)/nm^-1 Intensity Panel
139 1380.87 565.17 1.90 43.14 0
140 1897.10 1733.77 2.20 33.21 0
141 1547.27 1832.08 1.82 25.95 0
142 End of peak list
143 ----- End chunk -----

I don't think it's worth your time to try to change what the stream looks like; just parse it line by line, as if it were a file. This isn't perfect, but it should be enough to get you going:

>>> stream = '''----- Begin chunk -----
... 128 Image filename: /home/h5_files_full/11112016_chip17_grid_2_scan_002_P000002_00001.cbf_0.h5
... 129 Image serial number: 2
... 130 indexed_by = none
... 131 photon_energy_eV = 12000.000000
... 132 beam_divergence = 0.00e+00 rad
... 133 beam_bandwidth = 1.00e-08 (fraction)
... 134 average_camera_length = 0.587800 m
... 135 train_ID = 12479848
... 136 pulse_ID = 142
... 137 Peaks from peak search
... 138 fs/px ss/px (1/d)/nm^-1 Intensity Panel
... 139 1380.87 565.17 1.90 43.14 0
... 140 1897.10 1733.77 2.20 33.21 0
... 141 1547.27 1832.08 1.82 25.95 0
... 142 End of peak list
... 143 ----- End chunk -----'''.split('\n')
>>> def parse_chunk(chunk):
...     # Return (train_ID, pulse_ID) as ints, or None if either is missing.
...     train = None
...     pulse = None
...     for line in chunk:
...         if "train_ID =" in line:
...             # Split on "=" so the value does not keep a leading "= ".
...             train = int(line.split("=", 1)[1])
...         elif "pulse_ID =" in line:
...             pulse = int(line.split("=", 1)[1])
...         if train is not None and pulse is not None:
...             return train, pulse
...     return None
...
>>> parse_chunk(stream)
(12479848, 142)
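
To go from a single chunk to the whole stream, something like the sketch below might work. It is only a rough outline under a few assumptions: that every chunk sits between a "Begin chunk" and an "End chunk" marker line, and that "sequential" means two neighbouring chunks share a train_ID and their pulse_ID values differ by exactly 1. The names iter_chunks, get_id, sequential_chunks and make_chunk are made up for this example, and the second and third chunks in the demo are invented test data. Each chunk is kept as a list of its original lines, so nothing has to be joined and split again.

```python
BEGIN = "----- Begin chunk -----"
END = "----- End chunk -----"


def iter_chunks(lines):
    """Yield each chunk as a list of its original lines, markers included."""
    chunk = None
    for line in lines:
        if BEGIN in line:
            chunk = [line]          # start collecting a new chunk
        elif chunk is not None:
            chunk.append(line)
            if END in line:
                yield chunk         # chunk is complete
                chunk = None


def get_id(chunk, key):
    """Return the integer after 'key =' in a chunk, or None if it is absent."""
    for line in chunk:
        if key + " =" in line:
            return int(line.split("=", 1)[1])
    return None


def sequential_chunks(lines):
    """Keep chunks whose neighbour has the same train_ID and the next pulse_ID.

    Assumption: "sequential" means adjacent chunks with pulse_ID differing by 1.
    """
    chunks = list(iter_chunks(lines))
    ids = [(get_id(c, "train_ID"), get_id(c, "pulse_ID")) for c in chunks]
    keep = set()
    for i in range(len(ids) - 1):
        (t1, p1), (t2, p2) = ids[i], ids[i + 1]
        if None not in (t1, p1, t2, p2) and t1 == t2 and p2 == p1 + 1:
            keep.update((i, i + 1))
    return [chunks[i] for i in sorted(keep)]


def make_chunk(train, pulse):
    """Build a tiny made-up chunk for the demo (not real data)."""
    return [BEGIN, f"train_ID = {train}", f"pulse_ID = {pulse}", END]


# Demo: the first two chunks are sequential, the third is unrelated.
lines = (make_chunk(12479848, 142)
         + make_chunk(12479848, 143)
         + make_chunk(12479849, 150))
selected = sequential_chunks(lines)
```

Because the selected chunks still hold their original lines, writing them out with "\n".join(line for chunk in selected for line in chunk) should give text in the same format the other program expects. For a real file, passing the open file object (or its lines with the trailing newlines stripped) in place of the demo list should work the same way.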

Python_for_dummies_needed

nilamo