Python Forum

Full Version: Regular Expression
I am learning Python 3 and regular expressions on my own, for work. Much of my day is spent on data, and I can no longer work in Excel because my files can be very large (100 MB or more). Could someone recommend a Python 3 regular expression that does three things? (1) Delete all characters between < and > (brackets included), as well as the comma (,) following the >. (2) Delete the word Done and all the commas that follow it. (3) Delete the word Skipped and all the commas that follow it.

Change FROM:
<UUT><H s='19' v='3.0'/>,<V t='s' s='2'/>Profile,Production,<V t='s' s='2'/>Cycle,Normal,<V t='s' s='2'/>PMVer,14.0.1.103,<V t='s' s='2'/>SeqFileVer,2.0.0.1,<V t='s' s='2'/>User,0000010011688,<V t='s' s='2'/>Station,TS-0421A,<V t='s' s='2'/>Socket,0,<V t='s' s='2'/>Date,09-27-2018,<V t='s' s='2'/>Time,06:52:00,<V t='n' s='2'/>CycleTime,1100.366,<V t='s' s='2'/>Status,Passed,<V t='s' s='2'/>WorkOrder,70524831,<V t='s' s='2'/>MRPConfigurationString,R3A1200SS132113100000000000000,<V t='s' s='2'/>BedModelNumber,P7900B000011,<V t='s' s='2'/>BedSerialNumber,01008877619851621118092621T269PF9594,<V t='s' s='2'/>TestControl_TestType,Production,<V t='s' s='2'/>TestControl_TestCycle,Initial,<V t='s' s='2'/>TestControl_RepairCode,,<V t='s' s='2'/>TestControl_RepairStr,,<R s='517'/>,<S t='a' s='3'/>SetRTEConfig,Done,,<S t='a' s='3'/>LogTestertName,Passed,,<S t='s' c='IgnoreCase' s='5'/>{LogTestertName}LogTesterName,"TS-0421A",Passed,"TS-0421A",,<S t='a' s='3'/>SetVoltage0,Done,,<S t='a' s='3'/>OutputOff,Done,,<S t='a' s='3'/>UsbExtensionStartUp,Done,,<S t='a' s='3'/>StoreScannedInforForReport,Passed,,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Order Number,Done,70524831,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Configuration String,Done,R3A1200SS132113100000000000000,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Model Number,Done,P7900B000011,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Serial Number,Done,01008877619851621118092621T269PF9594,<S t='a' s='3'/>{StoreScannedInforForReport}SetBedConfigAllInfo,Done,,<S t='a' s='3'/>InitializeSystemVariables,Done,,<S t='a' s='3'/>VerifyValidBedConfigurationString,Passed,,<S t='a' s='3'/>{VerifyValidBedConfigurationString}SetResult,Done,,<S t='n' c='EQ' s='7'/>{VerifyValidBedConfigurationString}ConfigLength,30,Passed,30,,,,<S t='a' s='3'/>{VerifyValidBedConfigurationString}InsertDefaultNodes,Done,,<S t='a' s='3'/>{VerifyValidBedConfigurationString}InsertDCB,Skipped,,<S t='a' 
s='3'/>{VerifyValidBedConfigurationString}InsertACB,Done,,

Change To:
Profile,Production,Cycle,Normal,PMVer,14.0.1.103,SeqFileVer,2.0.0.1,User,10011688,Station,TS-0421A,Socket,0,Date,9/27/2018,Time,6:52:00,CycleTime,1100.366,Status,Passed,WorkOrder,70524831,MRPConfigurationString,R3A1200SS132113100000000000000,BedModelNumber,P7900B000011,BedSerialNumber,01008877619851621118092621T269PF9594,TestControl_TestType,Production,TestControl_TestCycle,Initial,TestControl_RepairCode,,TestControl_RepairStr,,,SetRTEConfig,LogTestertName,Passed,,{LogTestertName}LogTesterName,TS-0421A,Passed,TS-0421A,,SetVoltage0,OutputOff,UsbExtensionStartUp,StoreScannedInforForReport,Passed,,{StoreScannedInforForReport}Record Bed Order Number,70524831,{StoreScannedInforForReport}Record Bed Configuration String,R3A1200SS132113100000000000000,{StoreScannedInforForReport}Record Bed Model Number,P7900B000011,{StoreScannedInforForReport}Record Bed Serial Number,01008877619851621118092621T269PF9594,{StoreScannedInforForReport}SetBedConfigAllInfo,InitializeSystemVariables,VerifyValidBedConfigurationString,Passed,,{VerifyValidBedConfigurationString}SetResult,{VerifyValidBedConfigurationString}ConfigLength,30,Passed,30,{VerifyValidBedConfigurationString}InsertDefaultNodes,{VerifyValidBedConfigurationString}InsertDCB,{VerifyValidBedConfigurationString}InsertACB,

Tony
What have you tried so far?
We won't just write the code for you; we're here to help people learn Python. So show us what you've got, and we'll help along the way.

For trying out regexes and seeing what works, I recommend https://www.regexpal.com/. You can dump some sample text, and get immediate feedback on what your regex would match.
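As a starting point to experiment with, here is a minimal sketch of the three rules from the first post using Python's re module. The names TAG, STATUS and clean_line are made up for illustration, and the sample string is a shortened fragment of the posted data, not the full record. It only covers the three stated rules; other differences in the "Change To" text (date format, leading zeros, removed quotes) are not handled.

```python
import re

# Rule 1: everything from "<" to the next ">", plus one comma directly after it.
TAG = re.compile(r"<[^>]*>,?")
# Rules 2 and 3: the word Done or Skipped, plus every comma directly after it.
STATUS = re.compile(r"\b(?:Done|Skipped)\b,*")


def clean_line(line):
    """Apply rule 1 first, then rules 2 and 3 (hypothetical helper)."""
    line = TAG.sub("", line)
    return STATUS.sub("", line)


# Shortened fragment assembled from the posted sample (assumption: real
# records follow the same pattern).
sample = (
    "<UUT><H s='19' v='3.0'/>,<V t='s' s='2'/>Profile,Production,"
    "<S t='a' s='3'/>SetRTEConfig,Done,,"
    "<S t='a' s='3'/>InsertDCB,Skipped,,"
    "<S t='a' s='3'/>LogTestertName,Passed,,"
)

print(clean_line(sample))
```

For large files, the same function could be called once per line while reading the file, so the whole file never has to sit in memory at once.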
Sorry to violate your rules. Thank you for the URL where I can experiment. I'll be back. Thanks.
You didn't violate anything :)
We'd just rather help you learn to fish than hand you a basket of fish.
Looks like you just want to strip some XML tags.
Maybe an XML parser such as lxml would be a better fit?
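To give a rough idea of the parser route: the sketch below uses the standard library's html.parser instead of lxml (a deliberate swap so it runs without installing anything), on the assumption that a lenient parser is needed because the posted fragment is not complete XML. TextOnly and strip_tags are made-up names. It only drops the tags; commas that directly follow a tag, and the Done/Skipped entries, would still need separate handling.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text between tags (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text that is not part of a tag.
        self.parts.append(data)


def strip_tags(text):
    parser = TextOnly()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Shortened fragment of the posted sample.
sample = "<V t='s' s='2'/>Profile,Production,<V t='s' s='2'/>Cycle,Normal,"
print(strip_tags(sample))
```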