Python Forum

Full Version: Challenge with my string
I am working on a script that checks a mailbox for an emailed attachment and opens it. So far I can read the attachment from the email as a text string, and it looks like this:

Content-Type: application/octet-stream;
	name="stockreport.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="stockreport.txt"
=EF=BB=BF73558
Lufthansa Technik
LTCS - H29&H30 - OSP Fiber
Location	Part#	Description	Curr Qty=09
M3B6	RIC-F-SA12-01	Fiber Bulkhead-SM/MM-6 Duplex-12 Adapters-ST-Black	25
M3B8	RIC-F-LCU24-01C	Fiber Bulkhead-SM-12 Duplex-24 =
Adapters-LC-Black/Blue	1
It can have any number of items, all following the same pattern: Location, Part#, Description, and Curr Qty. My goal from the example would be to end up with a list like this:

['73558', 'Lufthansa Technik', 'LTCS - H29&H30 - OSP Fiber', 'Location', 'Part#','Description', 'Curr Qty', 'M3B6', 'RIC-F-SA12-01', 'Fiber Bulkhead-SM/MM-6 Duplex-12 Adapters-ST-Black', '25', 'M3B8', 'RIC-F-LCU24-01C', 'Fiber Bulkhead-SM-12 Duplex-24 = Adapters-LC-Black/Blue', '1']
I'm thinking I can do my next part of processing easily once I have the list like this. The plan being to feed the data into a template for printing.

I have tried all sorts of splits but I feel stuck because none of them are quite right.
M3B6    RIC-F-SA12-01   Fiber Bulkhead-SM/MM-6 Duplex-12 Adapters-ST-Black  25
M3B8    RIC-F-LCU24-01C Fiber Bulkhead-SM-12 Duplex-24 =
Adapters-LC-Black/Blue  1
Is there an error in the second line? Should the = be followed directly by Adapters-LC-Black/Blue 1?
The columns seem to be aligned. If this is true for every file you will parse, and the number of character "slots" allocated for a single column is constant, you could get the data you want by slicing and stripping the strings.
Edit:
Ah, that = (the "potential error") might be there because the line would otherwise be too long, so the columns aren't aligned anymore in this case.
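For what it's worth, the slicing-and-stripping idea could look roughly like the sketch below. The column offsets in COLUMNS are made up for illustration and would have to be measured from a real report; it only works if every row really is padded to the same widths, which the wrapped second row suggests may not hold.

```python
# Hypothetical (start, end) character offsets for each column.
# These numbers are assumptions, not taken from the real report.
COLUMNS = [(0, 8), (8, 24), (24, 80), (80, None)]

def slice_fixed_width(line, columns=COLUMNS):
    """Cut one line at fixed character offsets and strip the padding."""
    return [line[start:end].strip() for start, end in columns]

# Build a sample row padded to the assumed widths.
row = (
    "M3B6".ljust(8)
    + "RIC-F-SA12-01".ljust(16)
    + "Fiber Bulkhead-SM/MM-6 Duplex-12 Adapters-ST-Black".ljust(56)
    + "25"
)
fields = slice_fixed_width(row)
print(fields)
```

If the file turns out to be tab-separated rather than space-padded, splitting on the tab character would be the simpler route.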
Ok I had to do some digging and there are a few things that get added in when I "grab" things from the email.

I assume this is a line length thing.

On line 9 the "=09" at the end of Qty is extra.

On line 11 the "= " as you pointed out is extra.

I'm going to try and pull some other emails to see what kind of other odd things happen.
It took me a while to figure out but I had to do a .decode() to get "clean" output.
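The post doesn't say which decode call was used, so the following is only one possible way to get the same "clean" result. The headers say Content-Transfer-Encoding: quoted-printable, which is where the "=09" (an encoded tab) and the trailing "=" (a soft line break) come from. A minimal sketch with the standard library quopri module, using a shortened copy of the sample as input:

```python
import quopri

# Shortened copy of the attachment body, as raw quoted-printable bytes.
raw = (
    b"=EF=BB=BF73558\n"
    b"Location\tPart#\tDescription\tCurr Qty=09\n"
    b"M3B8\tRIC-F-LCU24-01C\tFiber Bulkhead-SM-12 Duplex-24 =\n"
    b"Adapters-LC-Black/Blue\t1\n"
)

# Undo the quoted-printable encoding: "=09" becomes a tab and the
# trailing "=" soft line break rejoins the wrapped description.
decoded_bytes = quopri.decodestring(raw)

# Turn the bytes into a str; plain "utf-8" keeps the leading BOM.
text = decoded_bytes.decode("utf-8")
print(repr(text))
```

If the attachment is pulled out with the email package, calling get_payload(decode=True) on the attachment part should undo the transfer encoding in the same way and also return bytes.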

I still have a challenge with the very first item: it carries the encoding information from the document, so it is always prefixed by '\ufeff'.
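The '\ufeff' is a UTF-8 byte order mark (the =EF=BB=BF at the start of the attachment). One way to drop it, assuming the data is still bytes at that point, is to decode with the "utf-8-sig" codec instead of "utf-8". The sketch below then builds the flat list from the first post by splitting on newlines and tabs; the sample bytes are a shortened, already transfer-decoded copy of the report.

```python
# Shortened sample, already decoded from quoted-printable (assumed input).
decoded_bytes = (
    b"\xef\xbb\xbf73558\n"
    b"Lufthansa Technik\n"
    b"Location\tPart#\tDescription\tCurr Qty\t\n"
    b"M3B6\tRIC-F-SA12-01\tFiber Bulkhead-SM/MM-6 Duplex-12 Adapters-ST-Black\t25\n"
)

# "utf-8-sig" strips a leading BOM if present and otherwise acts like "utf-8".
text = decoded_bytes.decode("utf-8-sig")

# Flatten every line into its tab-separated fields, skipping empty ones
# (such as the one left by the trailing tab after "Curr Qty").
items = [
    field.strip()
    for line in text.splitlines()
    for field in line.split("\t")
    if field.strip()
]
print(items)

# If the text is already a str, the BOM can be stripped directly instead.
first = "\ufeff73558".lstrip("\ufeff")
```

Keeping one list per row instead of a single flat list might be easier to feed into a print template, but that is a design choice rather than a requirement.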