Python Forum
need algorithm to strip non-ascii characters from LONG csv file
#1
The Canadian Census offers some interesting CSV files that are legally free to download and can be fun to use for data analysis. Sadly, these files appear to be crammed with purposeless non-ASCII Unicode characters, all of them used as padding in various cells without contributing to the data in any way. Is there a reasonably efficient algorithm to strip the non-ASCII characters from CSV files that contain millions of characters?

EDIT: The longest of these files is more than 193,000 KB. Here's the link to all of them: http://www12.statcan.gc.ca/census-recens...cfm?Lang=E
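
For reference, here is a minimal sketch of the kind of approach being asked about: stream the file one line at a time, so memory use stays flat regardless of file size, and drop anything outside the ASCII range. The function names `strip_non_ascii` and `clean_csv`, the file paths, and the UTF-8 default are all assumptions made for illustration; the actual encoding of the census files is not stated in this thread.

```python
def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range."""
    # Encoding to ASCII with errors="ignore" silently discards
    # any character that has no ASCII representation.
    return text.encode("ascii", errors="ignore").decode("ascii")


def clean_csv(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path line by line, removing non-ASCII characters.

    The encoding default is an assumption; pass the real one if it is known.
    """
    # newline="" leaves the original line endings untouched.
    # errors="replace" turns undecodable bytes into U+FFFD, which is
    # itself non-ASCII and is therefore stripped along with the rest.
    with open(src_path, "r", encoding=encoding, errors="replace", newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            dst.write(strip_non_ascii(line))
```

A call would look like `clean_csv("census.csv", "census_ascii.csv")`, with both paths hypothetical. Note that this removes accented letters too (for example in French place names), so it only makes sense when the non-ASCII characters really are padding.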
#2
An example? I have downloaded one of these and didn't see any unusual characters.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#3
Same here. What are you using to read the files where you see the corruption?
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
#4
Sorry, it was a minor, stupid mistake in my code. There are no unusual characters in those files. Maybe this post could be deleted or archived. I apologize for wasting everyone's time.