Python Forum
Parsing text list to csv using delimiter discarding non-interesting data
#1
Great coders, I have a problem that I haven't been able to solve using similar threads.

I have no idea how to approach this; importing the csv module seems like a good starting point.
The problem is that the data is not "linear": each record is spread over many lines.
The word(s) to the left of the colon are the field name (header), and the text to the right is the data.
The input file is rather large.
I don't expect you to code this for me, but a little guidance to solve this problem would be greatly appreciated.

My Python version is rather old (2.6.8).

This is the output file that has to be parsed:

Output:
####################################
LOCATION: home   ----> This I need
####################################
some non-interesting text to be discarded.
================================================================================
item table (details)
================================================================================
--------------------------------------------------------------------------------
item
--------------------------------------------------------------------------------
                  color : silver
                   type : fridge
              mandatory : yes
              substance : metal
              viscosity : solid
             winter use : always
              chemicals : refrigerant
          safe for kids : yes
            accessories : cooler
               cleaning : cloth
================================================================================
####################################
LOCATION: home   ----> This I need
####################################
some non-interesting text to be discarded.
================================================================================
item table (details)
================================================================================
--------------------------------------------------------------------------------
item
--------------------------------------------------------------------------------
                  color : blue
                   type : pool
              mandatory : no
              substance : water
              viscosity : liquid
             winter use : never
              chemicals : chlorine
          safe for kids : depends
            accessories : pump
               cleaning : filter
I expect output similar to this:
Output:
LOCATION, color, type, mandatory, substance, viscosity, winter use, chemicals, safe for kids, accessories, cleaning
home, silver, fridge, yes, metal, solid, always, refrigerant, yes, cooler, cloth
home, blue, pool, no, water, liquid, never, chlorine, depends, pump, filter
#2
One approach:
read the file in chunks of 20 lines (assuming each separate chunk has the same structure as the two chunks in your example).
you need lines 2 and 12-19 of each chunk.
parse the respective lines (using split, strip, etc.)
write the result to your file.
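The chunk approach could look roughly like the sketch below. It assumes Python 3 and a file in which every record occupies exactly the same number of lines. The defaults (21-line records, LOCATION on the second line, the ten fields on lines 11 to 20) are only my own count of the sample in the first post with one separator, title or field per line, so they differ slightly from the figures above and must be checked against the real file. `chunks`, `value_after_colon` and `convert_by_chunks` are hypothetical helper names.

```python
import csv

# Column order taken from the expected output in the first post.
HEADER = ["LOCATION", "color", "type", "mandatory", "substance", "viscosity",
          "winter use", "chemicals", "safe for kids", "accessories", "cleaning"]


def chunks(lines, size):
    """Yield successive lists of `size` lines; the last list may be shorter."""
    chunk = []
    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def value_after_colon(line):
    """'          color : silver' -> 'silver'"""
    return line.split(":", 1)[1].strip()


def convert_by_chunks(lines, out, size=21, location_at=1, fields_at=(10, 20)):
    """Write one CSV row per fixed-size chunk of input lines.

    ASSUMPTION: size, location_at and fields_at (0-based, end-exclusive) are
    line positions counted from the sample; check them against the real file.
    """
    writer = csv.writer(out)
    writer.writerow(HEADER)
    start, stop = fields_at
    for chunk in chunks(lines, size):
        if len(chunk) < stop:
            continue  # incomplete trailing chunk: nothing usable in it
        # Keep only the first word after "LOCATION:" (a one-word location is
        # assumed), which also drops the "----> This I need" remark.
        location = value_after_colon(chunk[location_at]).split()[0]
        values = [value_after_colon(line) for line in chunk[start:stop]]
        writer.writerow([location] + values)


# Example (hypothetical file names):
# with open("input.txt") as src, open("output.csv", "w", newline="") as dst:
#     convert_by_chunks(src, dst)
```

Because `chunks` is a generator, only one record is held in memory at a time, which matters for a large input file. On Python 2.6 the output file would have to be opened in binary mode ('wb') for the csv module.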

Alternative:
read the file line by line
check if the line (stripped of leading spaces) starts with one of the words (LOCATION, color, type, mandatory, substance, viscosity, winter use, chemicals, safe for kids, accessories, cleaning)
parse the respective line accordingly. Note that you need to write a row to the output file when you find the next LOCATION line, and once more at the end of the file for the last record.
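A sketch of the line-by-line alternative, again assuming Python 3 (on 2.6, `csv.DictWriter.writeheader()` does not exist yet; it was added in 2.7). Instead of testing `startswith` for every field name, it compares the text before the colon with a list of known names, a slightly stricter version of the same check. A record is emitted when the next LOCATION line appears and once more at end of file. `parse_records` and `convert_line_by_line` are hypothetical names, and the field list is assumed to match the real file.

```python
import csv

# Field names as they appear in the sample (assumed to match the real file).
FIELD_NAMES = ["color", "type", "mandatory", "substance", "viscosity",
               "winter use", "chemicals", "safe for kids", "accessories",
               "cleaning"]


def parse_records(lines):
    """Read lines one at a time and yield one dict per LOCATION record."""
    record = None
    for raw in lines:
        line = raw.strip()  # drop the leading alignment spaces
        if line.startswith("LOCATION"):
            # The next LOCATION line means the previous record is complete.
            if record is not None:
                yield record
            # Assumes a one-word location such as "home".
            record = {"LOCATION": line.split(":", 1)[1].split()[0]}
        elif record is not None and ":" in line:
            name, _, value = line.partition(":")
            name = name.strip()
            if name in FIELD_NAMES:  # ignore colons in uninteresting text
                record[name] = value.strip()
    # The last record is not followed by a LOCATION line, so emit it here.
    if record is not None:
        yield record


def convert_line_by_line(lines, out):
    """Write the records as CSV; missing fields become empty cells."""
    writer = csv.DictWriter(out, fieldnames=["LOCATION"] + FIELD_NAMES)
    writer.writeheader()
    for record in parse_records(lines):
        writer.writerow(record)
```

Since `csv.DictWriter` fills absent keys with an empty string by default, a record that lacks a field still produces a complete row.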

Third approach - RegEx
This should be the first choice if you are familiar with RegEx.
A regex that returns all matches from your example: [^ \n][\w ]* ?: \w*. There might be a better one, but I'm not that experienced with RegEx.

Maybe combine RegEx with reading chunks of 20 lines.
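The suggested regex can be tried with `re.findall`, starting a new record whenever the matched name is LOCATION. This is a sketch under the same assumptions as the sample: Python 3, and single-word values, since `\w*` stops at the first space. `records_from_text` and `write_csv` are hypothetical names.

```python
import csv
import re

# The pattern suggested above: a non-space character, more word characters or
# spaces, an optional space, ": ", then a single-word value.
FIELD_RE = re.compile(r"[^ \n][\w ]* ?: \w*")

HEADER = ["LOCATION", "color", "type", "mandatory", "substance", "viscosity",
          "winter use", "chemicals", "safe for kids", "accessories", "cleaning"]


def records_from_text(text):
    """Turn every 'name : value' match into a field; LOCATION starts a record."""
    records = []
    for match in FIELD_RE.findall(text):
        name, value = [part.strip() for part in match.split(":", 1)]
        if name == "LOCATION":
            records.append({"LOCATION": value})
        elif records:
            records[-1][name] = value
    return records


def write_csv(records, out):
    # extrasaction="ignore" skips any stray "x : y" match that is not a column.
    writer = csv.DictWriter(out, fieldnames=HEADER, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
```

`records_from_text` works on any piece of text, so for a large file it can be called once per chunk (for example on `"".join(chunk)` for each chunk of lines read from the file) instead of on the whole file, which is one way to combine the regex with chunked reading.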
#3
Thank you, buran,

It took a while, but I finally got it sorted:

Open the file and loop over it line by line with 'for'.
Parse each line using slicing [:] and the find() method, and assign the results to variables.
Finally, join the variables by concatenation with commas between them, and strip all whitespace by replacing " " with "":

outFile = AoutFile.replace(" ", "")

I need to work on my regex skills now :-)
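The code from this post was not shared, so the following is only a hypothetical reconstruction of the described steps (find(), slicing, joining with commas), written for Python 3. One deliberate difference: it trims each value with `strip()` rather than calling `replace(" ", "")` on the final line, because replacing every space would also turn multi-word text such as "safe for kids" into "safeforkids". `value_of` and `csv_line` are made-up names.

```python
def value_of(line):
    """Return the text after the first colon, or None if there is no colon."""
    pos = line.find(":")
    if pos == -1:
        return None  # not a "name : value" line
    # Slice off everything up to and including the colon, then trim only the
    # ends, so spaces inside multi-word values survive.
    return line[pos + 1:].strip()


def csv_line(values):
    # Plain comma join: assumes no value contains a comma or a quote
    # (csv.writer would handle quoting if they might).
    return ",".join(values)


sample = ["                  color : silver",
          "          safe for kids : yes",
          "item table (details)"]
print(csv_line([v for v in map(value_of, sample) if v is not None]))
# prints: silver,yes
```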
#4
Show us your code. Maybe I can give you some advice. If I have time tonight, I may write the regex version.