need algorithm to strip non-ascii characters from LONG csv file

hereathome · (This post was last modified: Jan-12-2018, 12:00 AM by hereathome.)

The Canadian Census offers some interesting CSV files that are legally free to download and can be fun to use for data analysis. Sadly, these files appear to be crammed with purposeless non-ASCII characters, all of them used as padding in various cells without contributing to the data in any way. Is there a reasonably efficient algorithm to strip the non-ASCII characters from CSV files that contain millions of characters?

EDIT: The longest of these files is more than 193,000 KB; here's the link to all of them: http://www12.statcan.gc.ca/census-recens...cfm?Lang=E
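
For reference, the question as asked can be handled by streaming the file one line at a time, so memory use stays flat however long the file is. The following is only a minimal sketch: it assumes the input is UTF-8 encoded, and the function name `strip_non_ascii` and both path arguments are hypothetical. Note that it also drops legitimate accented letters (such as the é in French place names), which may not be what you want for census data.

```python
import re

# Matches any run of characters outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, dropping every non-ASCII character.

    Assumes the source decodes cleanly with `encoding` (UTF-8 by default).
    Returns the number of characters removed.
    """
    removed = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:  # one line in memory at a time
            cleaned = NON_ASCII.sub("", line)
            removed += len(line) - len(cleaned)
            dst.write(cleaned)
    return removed
```

The work is linear in the size of the file, and because only one line is held at a time, a file of a few hundred megabytes should not be a problem.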

wavic · Jan-12-2018, 12:23 AM

An example? I have downloaded one of these and didn't see any unusual characters.

sparkz_alot · Jan-12-2018, 12:30 AM

Same here. What are you using to read the files where you see the corruption?
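
One way to check whether a file really contains non-ASCII characters, rather than artifacts of how it was opened, is to count them directly. This is a rough sketch, assuming the file is meant to be read as UTF-8; the function name `count_non_ascii` is hypothetical. Bytes that do not decode are replaced with U+FFFD, so a large count of that character suggests the file is in a different encoding.

```python
from collections import Counter


def count_non_ascii(path, encoding="utf-8"):
    """Return a Counter of every non-ASCII character found in the file.

    Undecodable bytes are replaced with U+FFFD (and counted) instead of
    raising, so a wrong `encoding` guess shows up in the result.
    """
    counts = Counter()
    with open(path, encoding=encoding, errors="replace", newline="") as f:
        for line in f:
            counts.update(ch for ch in line if ord(ch) > 127)
    return counts
```

An empty result means the file is pure ASCII, and any odd characters seen elsewhere come from the code or tool used to read it.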

hereathome · Jan-12-2018, 02:04 AM

Sorry, but it was a minor stupid mistake in my code. There are no unusual characters in those files. Maybe this post could be deleted or archived. I apologize for wasting everyone's time.
