Python Forum
need algorithm to strip non-ascii characters from LONG csv file
#1
The Canadian Census offers some interesting csv files that are legally free to download and can make for some fun data analysis. Sadly, these files are crammed with what appear to be purposeless Unicode characters, all of them used as padding in various cells without contributing to the data in any way. Is there a reasonably efficient algorithm to strip the non-ASCII characters from csv files that contain millions of characters?

EDIT: the longest of these files is more than 193,000 KB. Here's the link to all of them: http://www12.statcan.gc.ca/census-recens...cfm?Lang=E
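For a file of this size, one reasonably efficient approach is to stream it line by line and drop every character outside ASCII, so memory use stays flat regardless of file size. This is a minimal sketch, not a tested solution for these particular files. It assumes the source is UTF-8 (check the actual encoding first), and the script and file names in the usage comment are hypothetical. Because commas, quotes and line breaks are all ASCII, removing non-ASCII characters cannot disturb the CSV structure, so the file does not need to be parsed with the `csv` module.

```python
import sys
from typing import Iterable, Iterator


def strip_non_ascii_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with every non-ASCII character (code point > 127) removed."""
    for line in lines:
        # errors="ignore" silently drops anything the ASCII codec cannot represent
        yield line.encode("ascii", errors="ignore").decode("ascii")


def strip_non_ascii_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Copy src_path to dst_path one line at a time, without non-ASCII characters.

    The source encoding is an assumption; pass whatever the file really uses.
    """
    # newline="" leaves the original line endings (e.g. \r\n) untouched
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.writelines(strip_non_ascii_lines(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python strip_non_ascii.py input.csv output.csv
    strip_non_ascii_file(sys.argv[1], sys.argv[2])
```

Be aware that this also removes legitimate accented letters, such as the é in Québec, which turn up in Canadian place names. If those matter, it is safer to target only the specific padding characters instead.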
#2
Do you have an example? I downloaded one of these and didn't see any unusual characters.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#3
Same here. What are you using to read the files where you see the corruption?
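Whether the characters are really in the file, or are introduced by whatever program reads it, can be settled by scanning the raw bytes, which involves no editor and no text decoding. This is a rough sketch; the script name in the usage comment is hypothetical, and the limit of 10 reported hits is an arbitrary choice. Columns are 1-based byte offsets, so a single UTF-8 character such as é shows up as two bytes (0xc3, 0xa9).

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Any byte with the high bit set is outside 7-bit ASCII
NON_ASCII = re.compile(rb"[\x80-\xff]")


def find_non_ascii(lines: Iterable[bytes], limit: int = 10) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, byte_column, byte_value) for up to `limit` non-ASCII bytes."""
    found = 0
    for line_no, line in enumerate(lines, start=1):
        for match in NON_ASCII.finditer(line):
            pos = match.start()
            yield line_no, pos + 1, line[pos]  # indexing bytes gives an int
            found += 1
            if found >= limit:
                return


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python find_non_ascii.py data.csv
    with open(sys.argv[1], "rb") as f:  # binary mode: no decoding at all
        hits = list(find_non_ascii(f))
    if not hits:
        print("no non-ASCII bytes found")
    for line_no, col, value in hits:
        print(f"line {line_no}, byte {col}: 0x{value:02x}")
```

If this reports nothing but the characters still appear on screen, they are being introduced by the reading code or the viewer, not by the file.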
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
#4
Sorry, but it was a minor stupid mistake in my code. There are no unusual characters in those files. Maybe this post could be deleted or archived. I apologize for wasting everyone's time.


Possibly Related Threads…
Thread Author Replies Views Last Post
  Decoding lat/long in file name johnmcd 4 349 Mar-22-2024, 11:51 AM
Last Post: johnmcd
  extract only text strip byte array Pir8Radio 7 2,924 Nov-29-2022, 10:24 PM
Last Post: Pir8Radio
  please help me remove error for string.strip() jamie_01 3 1,194 Oct-14-2022, 07:48 AM
Last Post: Pedroski55
  Can't strip extra characters from Data Canflyguy 7 1,861 Jan-10-2022, 02:16 PM
Last Post: Canflyguy
  strip() pprod 8 3,451 Feb-16-2021, 01:11 PM
Last Post: buran
  Split Characters As Lines in File quest_ 3 2,509 Dec-28-2020, 09:31 AM
Last Post: quest_
  get two characters, count and print from a .txt file Pleiades 9 3,354 Oct-05-2020, 09:22 AM
Last Post: perfringo
  Reading integers from a file; the problem may be the newline characters JRWoodwardMSW 2 1,965 Jul-14-2020, 02:27 AM
Last Post: bowlofred
  Remove escape characters / Unicode characters from string DreamingInsanity 5 13,677 May-15-2020, 01:37 PM
Last Post: snippsat
  How to Remove Non-ASCII Characters But Leave Line Breaks In Place? bmccollum 4 4,293 Apr-09-2020, 07:59 PM
Last Post: DeaD_EyE
