Python Forum
Apply textual data cleaning to several CSV files
#1
I need to perform a textual analysis of several speeches. The speeches were transcribed (using OCR) from several PDF files into CSV files. Each CSV file contains a column titled speech, with several speeches from different speakers (one speaker per row). I wrote a function to clean up the most common OCR errors. I applied this function to a single file and it does the job, so I am now trying to apply it to all the CSV files. However, I keep getting the error "TypeError: expected string or bytes-like object". Since the same code works on a single file, I am stuck. Can someone help me? Any suggestion is appreciated.
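For anyone hitting the same error: the code was not posted, so this is only a guess, but `re.sub` and similar functions raise exactly this TypeError when they are handed something that is not a string. A common way that happens is an empty speech cell in one of the other files (pandas, for instance, loads empty cells as NaN, which is a float). Below is a minimal sketch, using only the standard library, of a cleaning function that guards against non-string values and a loop that applies it to every CSV matching a pattern. The names `clean_text` and `clean_csv_files`, the `_clean` suffix, and the specific clean-up rules are all hypothetical stand-ins, not the original function.

```python
import csv
import glob
import os
import re


def clean_text(value):
    """Hypothetical OCR clean-up; stands in for the poster's own function."""
    # Guard: re.sub raises TypeError on non-strings (None, float NaN, ...).
    if not isinstance(value, str):
        return ""
    text = value.replace("\u00ad", "")       # drop soft hyphens
    text = re.sub(r"-\s*\n\s*", "", text)    # rejoin words hyphenated across lines
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace
    return text.strip()


def clean_csv_files(pattern, column="speech", suffix="_clean"):
    """Clean one column in every CSV matching `pattern`; write new files."""
    written = []
    for path in sorted(glob.glob(pattern)):
        root, ext = os.path.splitext(path)
        if root.endswith(suffix):
            continue  # skip output files from a previous run
        with open(path, newline="", encoding="utf-8") as src:
            reader = csv.DictReader(src)
            fieldnames = reader.fieldnames or []
            rows = list(reader)
        if column not in fieldnames:
            print(f"skipping {path}: no '{column}' column")
            continue
        for row in rows:
            row[column] = clean_text(row.get(column))
        out_path = f"{root}{suffix}{ext}"
        with open(out_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames,
                                    extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        written.append(out_path)
    return written
```

Calling `clean_csv_files("speeches/*.csv")` (the folder name is made up) would write a `*_clean.csv` next to each input and leave the originals untouched. If the files are loaded with pandas instead, the same idea applies: make sure every value in the speech column is a string before cleaning, for example with `df["speech"].fillna("").astype(str)`.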