Python Forum
Regular Expression
#1
I am learning Python 3 and regular expressions on my own, for work. Much of my day is spent on data, and I can no longer work in Excel because my files can be very large (100 MB or more). Could someone recommend a Python 3 regular expression that does three things? (1) Delete all characters between < and >, including the < and > themselves and the comma (,) that follows the >. (2) Delete the word Done and all the commas that follow it. (3) Delete the word Skipped and all the commas that follow it.

Change From:
<UUT><H s='19' v='3.0'/>,<V t='s' s='2'/>Profile,Production,<V t='s' s='2'/>Cycle,Normal,<V t='s' s='2'/>PMVer,14.0.1.103,<V t='s' s='2'/>SeqFileVer,2.0.0.1,<V t='s' s='2'/>User,0000010011688,<V t='s' s='2'/>Station,TS-0421A,<V t='s' s='2'/>Socket,0,<V t='s' s='2'/>Date,09-27-2018,<V t='s' s='2'/>Time,06:52:00,<V t='n' s='2'/>CycleTime,1100.366,<V t='s' s='2'/>Status,Passed,<V t='s' s='2'/>WorkOrder,70524831,<V t='s' s='2'/>MRPConfigurationString,R3A1200SS132113100000000000000,<V t='s' s='2'/>BedModelNumber,P7900B000011,<V t='s' s='2'/>BedSerialNumber,01008877619851621118092621T269PF9594,<V t='s' s='2'/>TestControl_TestType,Production,<V t='s' s='2'/>TestControl_TestCycle,Initial,<V t='s' s='2'/>TestControl_RepairCode,,<V t='s' s='2'/>TestControl_RepairStr,,<R s='517'/>,<S t='a' s='3'/>SetRTEConfig,Done,,<S t='a' s='3'/>LogTestertName,Passed,,<S t='s' c='IgnoreCase' s='5'/>{LogTestertName}LogTesterName,"TS-0421A",Passed,"TS-0421A",,<S t='a' s='3'/>SetVoltage0,Done,,<S t='a' s='3'/>OutputOff,Done,,<S t='a' s='3'/>UsbExtensionStartUp,Done,,<S t='a' s='3'/>StoreScannedInforForReport,Passed,,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Order Number,Done,70524831,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Configuration String,Done,R3A1200SS132113100000000000000,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Model Number,Done,P7900B000011,<S t='a' s='3'/>{StoreScannedInforForReport}Record Bed Serial Number,Done,01008877619851621118092621T269PF9594,<S t='a' s='3'/>{StoreScannedInforForReport}SetBedConfigAllInfo,Done,,<S t='a' s='3'/>InitializeSystemVariables,Done,,<S t='a' s='3'/>VerifyValidBedConfigurationString,Passed,,<S t='a' s='3'/>{VerifyValidBedConfigurationString}SetResult,Done,,<S t='n' c='EQ' s='7'/>{VerifyValidBedConfigurationString}ConfigLength,30,Passed,30,,,,<S t='a' s='3'/>{VerifyValidBedConfigurationString}InsertDefaultNodes,Done,,<S t='a' s='3'/>{VerifyValidBedConfigurationString}InsertDCB,Skipped,,<S t='a' 
s='3'/>{VerifyValidBedConfigurationString}InsertACB,Done,,

Change To:
Profile,Production,Cycle,Normal,PMVer,14.0.1.103,SeqFileVer,2.0.0.1,User,10011688,Station,TS-0421A,Socket,0,Date,9/27/2018,Time,6:52:00,CycleTime,1100.366,Status,Passed,WorkOrder,70524831,MRPConfigurationString,R3A1200SS132113100000000000000,BedModelNumber,P7900B000011,BedSerialNumber,01008877619851621118092621T269PF9594,TestControl_TestType,Production,TestControl_TestCycle,Initial,TestControl_RepairCode,,TestControl_RepairStr,,,SetRTEConfig,LogTestertName,Passed,,{LogTestertName}LogTesterName,TS-0421A,Passed,TS-0421A,,SetVoltage0,OutputOff,UsbExtensionStartUp,StoreScannedInforForReport,Passed,,{StoreScannedInforForReport}Record Bed Order Number,70524831,{StoreScannedInforForReport}Record Bed Configuration String,R3A1200SS132113100000000000000,{StoreScannedInforForReport}Record Bed Model Number,P7900B000011,{StoreScannedInforForReport}Record Bed Serial Number,01008877619851621118092621T269PF9594,{StoreScannedInforForReport}SetBedConfigAllInfo,InitializeSystemVariables,VerifyValidBedConfigurationString,Passed,,{VerifyValidBedConfigurationString}SetResult,{VerifyValidBedConfigurationString}ConfigLength,30,Passed,30,{VerifyValidBedConfigurationString}InsertDefaultNodes,{VerifyValidBedConfigurationString}InsertDCB,{VerifyValidBedConfigurationString}InsertACB,

Tony
#2
What have you tried so far?
We won't just write the code for you; we're here to help people learn Python. So show us what you've got, and we'll help along the way.

For trying out regexes and seeing what works, I recommend https://www.regexpal.com/. You can dump some sample text, and get immediate feedback on what your regex would match.
#3
Sorry to violate your rules. Thank you for the URL where I can experiment. I'll be back. Thanks.
#4
You didn't violate anything :)
We'd just rather help you learn to fish than hand you a basket of fish.
#5
Looks like you just want to strip some XML tags.
Maybe an XML parser such as lxml would be a better fit?
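If you stay with regular expressions instead, the three rules from the first post could be sketched roughly like this. It is only a starting point to experiment with, not a tested solution for the real files. The names `clean_line` and `clean_file` are made up for illustration, and the sketch assumes that each tag sits on a single line, that the files are UTF-8 text, and that Done and Skipped only need removing when they appear as whole words. It does not touch quotes, dates, or leading zeros, so it will not reproduce every detail of the "Change To" sample.

```python
import re

# Rule 1: a tag from "<" to the next ">", plus one comma directly after it, if any.
TAG = re.compile(r"<[^>]*>,?")
# Rules 2 and 3: the whole word Done or Skipped, plus every comma that follows it.
STATUS = re.compile(r"\b(?:Done|Skipped)\b,*")


def clean_line(line: str) -> str:
    """Apply the tag rule first, then the Done/Skipped rule."""
    line = TAG.sub("", line)
    return STATUS.sub("", line)


def clean_file(src_path: str, dst_path: str) -> None:
    """Stream a large file line by line so it never has to fit in memory.

    Hypothetical helper: the paths and the UTF-8 encoding are assumptions.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line))


# A short excerpt in the same shape as the data in the first post.
sample = (
    "<V t='s' s='2'/>Profile,Production,"
    "<S t='a' s='3'/>SetRTEConfig,Done,,"
    "<S t='a' s='3'/>InsertDCB,Skipped,,"
    "<S t='a' s='3'/>LogTestertName,Passed,,"
)
print(clean_line(sample))
# Profile,Production,SetRTEConfig,InsertDCB,LogTestertName,Passed,,
```

Running the two substitutions in order, rather than as one combined pattern, keeps each rule easy to try out on its own in a regex tester.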