Jan-12-2018, 12:00 AM
(This post was last modified: Jan-12-2018, 12:00 AM by hereathome.)
The Canadian Census offers some interesting CSV files that are legally free to download and make for some fun data analysis. Sadly, these files are crammed with what appear to be purposeless Unicode characters, all of them used as padding in various cells without contributing to the data in any way. Is there a reasonably efficient algorithm to strip these Unicode characters from CSV files that contain millions of characters?
EDIT: the longest of these files is more than 193,000K and here's the link to all of them: http://www12.statcan.gc.ca/census-recens...cfm?Lang=E
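To make the question concrete, here is a minimal sketch of the kind of thing I have in mind, not a tested solution for these particular files. It assumes the files are UTF-8 encoded and that every non-ASCII character is unwanted padding. That second assumption is a big one: it would also strip accented letters from French place names. The function names (`strip_non_ascii`, `clean_stream`, `clean_file`) and the chunk size are made up for illustration.

```python
import io


def strip_non_ascii(text: str) -> str:
    # Encoding to ASCII with errors="ignore" drops every character
    # outside the ASCII range; decoding turns the bytes back into str.
    return text.encode("ascii", errors="ignore").decode("ascii")


def clean_stream(src, dst, chunk_size=1 << 20):
    # Read the text in fixed-size chunks so memory use stays bounded,
    # no matter how large the input is.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(strip_non_ascii(chunk))


def clean_file(in_path, out_path, encoding="utf-8"):
    # ASSUMPTION: the input is UTF-8; pass another encoding if it is not.
    # newline="" keeps the original line endings untouched.
    with open(in_path, "r", encoding=encoding, newline="") as src, \
         open(out_path, "w", encoding="ascii", newline="") as dst:
        clean_stream(src, dst)
```

Because the file is read in text mode, a multi-byte character is never split across two chunks, and the work is a single linear pass over the data. Something like `clean_file("input.csv", "output.csv")` (hypothetical file names) would write a cleaned copy next to the original.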