Thread: Sorting data with pandas (/thread-35545.html)
Python Forum (https://python-forum.io), Forum: Python Coding, Data Science
Sorting data with pandas - TheZaind - Nov-15-2021

I am working on inventory software that uses Optical Character Recognition (EasyOCR) to scan a document. Now I have the problem of cleaning/sorting the data (using pandas).

I want a csv file like this:
Quote: Article Number: Name:

The data I got looks like this:
Quote: 00605555

I tried:

    data = pd.DataFrame(data['1'].values.reshape(-1, 2), columns=['Artikel', 'AN'])

But then my output looks like this:

Any idea how I can clean/sort the data properly (only the numbers on one side and the name on the other side)? Thank you! :)

RE: Sorting data with pandas - jefsummers - Nov-15-2021

Looking at the data, the first 6 lines alternate between number and description. Then you get 2 lines of description in a row, which is going to be problematic. If you filter on "digits only means it is the code", one of the codes has a "C" in it, which again is a problem. You need to be able to describe how the program is to distinguish the columns; then you can try to program it.

RE: Sorting data with pandas - TheZaind - Nov-15-2021

(Nov-15-2021, 07:25 PM)jefsummers Wrote: Looking at the data, the first 6 lines alternate between number and description. Then you get 2 lines of description in a row, which is going to be problematic. If you filter on "digits only means it is the code", one of the codes has a "C" in it, which again is a problem.

Yeah, thank you, I fixed the problem with the 'C'. But how can I program it to put the numbers on one side and, on the other side, all the text under each number until a new number comes?

RE: Sorting data with pandas - jefsummers - Nov-17-2021

Not sure what the data looks like exactly when you import it, but I suggest using a try...except block where you call int() on the string. If it passes, it's a number; if it raises an exception (ValueError), it is not.

RE: Sorting data with pandas - aserian - Nov-22-2021

You should show how you are importing the data to help answer the question.
I would recommend parsing this data into an actual csv format before reading it into pandas:

    import csv

    headers = ['AN', 'Artikel']

    with open('data.txt', 'r') as in_data, open('out.csv', 'w', newline='') as out_csv:
        csv_writer = csv.writer(out_csv)
        csv_writer.writerow(headers)
        i = 0
        row = []
        for line in in_data:
            # every two input lines (number, then name) form one csv row
            if i % 2 == 0 and i != 0:
                csv_writer.writerow(row)
                row.clear()
            i += 1
            row.append(line.strip())
        # write the final pair, which the loop leaves in row
        if row:
            csv_writer.writerow(row)

This converts this:

    12345S
    Item 1
    4152DS
    Item 2
    15190A
    Item 3

To this:

    AN,Artikel
    12345S,Item 1
    4152DS,Item 2
    15190A,Item 3

Which pandas will read more cleanly.

Edit: To import and sort:

    import pandas as pd

    df = pd.read_csv('out.csv')
    df.sort_values(["AN"], inplace=True)
    print(df.head())

Output:

           AN Artikel
    0  12345S  Item 1
    2  15190A  Item 3
    1  4152DS  Item 2
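
The two-lines-per-row loop above assumes the input strictly alternates between number and name. For the case raised earlier in the thread, where a number can be followed by more than one line of text, one possible approach is to combine the int() test with a grouping loop. This is only a rough sketch: the names is_article_number and group_lines are made up for illustration, the sample lines are invented (only 00605555 appears in the original post), and it assumes every article number consists of digits only, as it does once the "C" problem is fixed.

```python
def is_article_number(text):
    """Return True if text parses as an integer (the try/except int() test)."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def group_lines(lines):
    """Start a new row at each article number and collect the text lines
    that follow it, until the next number, as that article's name."""
    rows = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        if is_article_number(text):
            rows.append([text, []])
        elif rows:
            rows[-1][1].append(text)
        # text that appears before the first number has no row and is dropped
    # join multi-line names with a space; numbers stay strings to keep leading zeros
    return [[number, " ".join(parts)] for number, parts in rows]


# Invented sample input, not the real OCR output
sample = ["00605555", "Item 1", "00605556", "Item 2", "second line", "00605557", "Item 3"]
rows = group_lines(sample)
for row in rows:
    print(row)
```

The resulting rows can be written out with csv_writer.writerows(rows) in the script above, or passed straight to pd.DataFrame(rows, columns=['AN', 'Artikel']). If you go through a csv file and want to keep leading zeros such as in 00605555, reading it with pd.read_csv('out.csv', dtype=str) should prevent pandas from converting the column to integers.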