Python Forum
Clean file with missing values
#1
Hi guys,

I need to clean a file with 13 million lines. Some lines don't have a value in the third column, so I need to exclude them.
Here's an input example:
Output:
-8.17641148055556 54.4510540761111 -178.44
-8.17642712805556 54.4511079527778 -178.29
-8.17644273722222 54.4511528463889
-8.176458385 54.4512067233333
-8.17648941 54.4512515941667
This is the desired output:
Output:
-8.17641148055556 54.4510540761111 -178.44
-8.17642712805556 54.4511079527778 -178.29
Here's my code, but it gives me a blank file:
#!/usr/bin/env python3

with open('cleancol.txt',"r") as fh, open('cleancol2.txt',"w") as sh:
    for row in fh:
        row_list = row.split(' ')
        if row_list[2].isnumeric():
            print(row, file=sh, end='')
Thanks for the help !!
Reply
#2
Hello,
I didn't run your code to see why you get a blank file, but the approach doesn't seem right: what is row_list[2] when a row only has 2 columns?
Instead, I would use the string split method and check whether the returned list has 2 or 3 elements. Good luck!
Reply
#3
row_list[2] will have \n at the end, so isnumeric() will return False. Strange that you don't get IndexError for rows with 2 numbers...
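A quick check of the point above, as a sketch: str.isnumeric() only accepts strings made up entirely of numeric characters, so it rejects the trailing newline, and it also rejects the minus sign and the decimal point. The helper is_float is a hypothetical name introduced here for illustration; it relies on float(), which tolerates surrounding whitespace.

```python
def is_float(text):
    """Return True if text parses as a float (hypothetical helper)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Compare isnumeric() with a float()-based check on typical third-column values.
samples = ["-178.44\n", "-178.44", "178", "", "\n"]
for s in samples:
    print(repr(s), s.isnumeric(), is_float(s))
```

So even with the newline stripped, isnumeric() would still reject a value such as -178.44, which would explain the blank output file.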
Reply
#4
That missing IndexError is exactly what bugs me. Is it possible that file isn't opened properly, or even not the right file?
Reply
#5
(Jan-23-2017, 03:23 PM)j.crater Wrote: That missing IndexError is exactly what bugs me. Is it possible that file isn't opened properly, or even not the right file?

Could it be that there are several spaces at the end, i.e. third item then I think would be '' (empty str)?
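To illustrate the trailing-space theory with a sketch (the sample rows reuse values from the first post, and the trailing space is an assumption): split(' ') splits on every single space, so a row that ends with a space before the newline still yields a third item, while split() with no argument splits on runs of whitespace and drops empty strings.

```python
complete = "-8.17642712805556 54.4511079527778 -178.29\n"
trailing = "-8.17644273722222 54.4511528463889 \n"  # assumed: one trailing space
no_trail = "-8.176458385 54.4512067233333\n"        # assumed: no trailing space

print(complete.split(' '))  # third item keeps the newline
print(trailing.split(' '))  # third item is just the newline, so no IndexError
print(no_trail.split(' '))  # only two items; row_list[2] would raise IndexError
print(trailing.split())     # whitespace-aware split: exactly two items
```

With two trailing spaces instead of one, the third item from split(' ') would be the empty string.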
Reply
#6
(Jan-23-2017, 03:31 PM)buran Wrote:
(Jan-23-2017, 03:23 PM)j.crater Wrote: That missing IndexError is exactly what bugs me. Is it possible that file isn't opened properly, or even not the right file?

Could it be that there are several spaces at the end, i.e. third item then I think would be '' (empty str)?

Exactly !!

The .txt file that I use as input was generated from a huge .csv file which has more than 13 million lines and about 30 columns. Some fields in the .csv file are empty, so my .txt file probably has an empty string in the third column on some lines.

Any ideas about how I can solve this question? Thanks.
Reply
#7
Strip the '\n' from row before splitting, or simply check whether row_list[2] == '\n'.
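One way to read this suggestion, as a minimal sketch: strip the whitespace from the ends of the row string before splitting it, then keep the row only when a non-empty third field remains. has_third_value is a hypothetical helper name, and the sketch assumes fields are separated by single spaces.

```python
def has_third_value(row):
    """Return True if the row has a non-empty third field (hypothetical helper)."""
    # strip() removes the newline and any trailing spaces before splitting
    fields = row.strip().split(' ')
    return len(fields) >= 3 and fields[2] != ''


rows = [
    "-8.17641148055556 54.4510540761111 -178.44\n",
    "-8.17644273722222 54.4511528463889 \n",  # assumed trailing space
    "-8.176458385 54.4512067233333\n",
]
kept = [r for r in rows if has_third_value(r)]
print(''.join(kept), end='')
```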
Reply
#8
#!/usr/bin/env python3

with open('cleancol.txt', "r") as fh, open('cleancol2.txt', "w") as sh:
    for row in fh:
        # split() with no argument splits on any whitespace and drops the newline
        row_list = row.split()
        # keep only rows that have a value in the third column
        if len(row_list) >= 3:
            print(row, file=sh, end='')
results:
Output:
-8.17641148055556 54.4510540761111 -178.44
-8.17642712805556 54.4511079527778 -178.29
The extra line in results created by forum markdown, not in file!
Reply
#9
The third column can be one of these whitespace characters: '\t\r\x0b\x0c'. Calling row.strip() before splitting should fix it.
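Putting the thread's suggestions together, a hedged end-to-end sketch: it reads the input line by line (so a 13-million-line file is never loaded into memory at once), splits on any whitespace, and keeps a row only if it has at least three fields. The function name clean_rows and the io.StringIO stand-ins for the real cleancol.txt and cleancol2.txt files are assumptions made for illustration.

```python
import io


def clean_rows(src, dst):
    """Copy rows with at least three whitespace-separated fields from src to dst.

    Returns the number of rows written (hypothetical helper).
    """
    kept = 0
    for row in src:
        # split() treats spaces, tabs, '\r', '\x0b' and '\x0c' as separators
        if len(row.split()) >= 3:
            dst.write(row.strip() + '\n')
            kept += 1
    return kept


sample = io.StringIO(
    "-8.17641148055556 54.4510540761111 -178.44\n"
    "-8.17642712805556 54.4511079527778 -178.29\n"
    "-8.17644273722222 54.4511528463889 \t\r\n"  # assumed stray whitespace
    "-8.176458385 54.4512067233333\n"
)
out = io.StringIO()
print(clean_rows(sample, out))
print(out.getvalue(), end='')
```

With real files, the same function could be called as clean_rows(fh, sh) inside the with open(...) block used earlier in the thread.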
Reply