Python Forum
Apply textual data cleaning to several CSV files
I need to run a textual analysis on several speeches. The speeches were transcribed with OCR from several PDF files into CSV files. Each CSV file contains a column titled speech, holding speeches from different speakers (one speaker per row). I wrote a function to clean up the most common OCR errors. Applied to a single file, it does the job, so I am now trying to apply it to all of the CSV files. However, I keep getting the error "TypeError: expected string or bytes-like object", even though the same code works on a single file, so I am stuck. Can someone help me? Any suggestion is appreciated.
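
Since the post does not include the code, the following is only a sketch of one likely cause and a possible fix, not a diagnosis. Python's re functions raise "TypeError: expected string or bytes-like object" when they receive something that is not a string. If the files are read with pandas (an assumption), an empty speech cell comes back as NaN, which is a float, so a file with one blank row would fail while a file without blanks works. The function clean_speech and its two regex rules are hypothetical stand-ins for the real cleaning function, and clean_csv_files and the folder argument are likewise made up for illustration.

```python
import re
from pathlib import Path

import pandas as pd


def clean_speech(text):
    """Hypothetical stand-in for the OCR cleaning function in the question."""
    # Guard: empty cells arrive as NaN (a float), which re.sub cannot handle.
    if not isinstance(text, str):
        return ""
    text = re.sub(r"-\s*\n\s*", "", text)  # rejoin words hyphenated across line breaks
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return text.strip()


def clean_csv_files(folder, column="speech"):
    """Apply clean_speech to `column` in every CSV file found in `folder`.

    Returns a dict mapping each file name to its cleaned DataFrame.
    """
    cleaned = {}
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        # Series.map calls clean_speech once per row, NaN rows included.
        df[column] = df[column].map(clean_speech)
        cleaned[path.name] = df
    return cleaned
```

Calling clean_csv_files("speeches") would process every CSV in a folder named speeches (a placeholder path). To check whether blank cells really are the cause, df["speech"].isna().sum() on each file shows how many rows are missing text.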
