Python Forum
Sorting data with pandas
#1
I am working on inventory software that uses Optical Character Recognition (EasyOCR) to scan a document. Now I have the problem of cleaning/sorting the data (using pandas). I want a CSV file like this:

Quote:
Article Number: Name:
56451748468434 Shoe
24564165165145 Boat
...


The data I got looks like this:
Quote:
00605555
Retail Apple I2W USB Power Adapler
00605558
Lightning t0 USB Cable (2 m)
00605613
Apple Lightn t0
Smm Headphone
00805614
Apple iPhone Lightning Dock Black
0C605615
Apple EarPods wilh Lighining Con;
00605774
Huawei GigaCube LTE CPE Fenstera
00605806
Google Home
00605834
Belkin BOOSTUP Wireless Charger
00605872
Google Hame Mini

I tried:
data = pd.DataFrame(data['1'].values.reshape(-1, 2), columns=['Artikel', 'AN'])
But then my output looks like this:
Output:
                              Artikel                                 AN
0  Retail Apple I2W USB Power Adapler                           00605558
1        Lightning t0 USB Cable (2 m)                           00605613
2                     Apple Lightn t0                      Smm Headphone
3                            00805614  Apple iPhone Lightning Dock Black
4                            0C605615  Apple EarPods wilh Lighining Con;
5


Any idea how I can clean/sort the data properly (only the numbers on one side and the name on the other side)?

Thank you! :)
#2
Looking at the data, the first 6 lines alternate between number and description. Then you get 2 lines of description, which is going to be problematic. If you try to filter by treating digits-only lines as the code, one of the codes has a "C" in it, which is again a problem.
You need to be able to describe how the program should distinguish the columns; then you can try to program it.
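
One way to write such a rule down, as a minimal sketch: assume (based only on the sample in the first post) that an article number is a line of exactly eight digits and that everything else is description text. The name `is_article_number` and the eight-digit rule are assumptions for illustration, not something the data source confirms.

```python
import re

# Assumption: an article number is exactly eight digits, as in the sample data.
ARTICLE_NUMBER = re.compile(r"\d{8}")


def is_article_number(line: str) -> bool:
    """Return True if the stripped line consists of exactly eight digits."""
    return ARTICLE_NUMBER.fullmatch(line.strip()) is not None


sample = ["00605613", "Apple Lightn t0", "Smm Headphone", "0C605615"]
flags = [is_article_number(s) for s in sample]
print(flags)
```

With this rule the OCR-damaged code `0C605615` is not recognised as an article number, so misreads like that would have to be corrected before classifying the lines.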
#3
(Nov-15-2021, 07:25 PM)jefsummers Wrote: Looking at the data, the first 6 lines alternate between number and description. Then you get 2 lines of description, which is going to be problematic. If you try to filter by treating digits-only lines as the code, one of the codes has a "C" in it, which is again a problem.
You need to be able to describe how the program should distinguish the columns; then you can try to program it.
Yeah, thank you, I fixed the problem with the 'C'. But how can I program it to put the numbers on one side and, on the other side, all the text below each number until a new number comes?
#4
Not sure what the data looks like exactly when you import it, but I suggest using a try...except block where you call int() on the string. If it passes, it's a number; if it raises a ValueError, it is not.
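
A rough sketch of that idea, extended so that all text lines following a number are joined until the next number appears. The helper names `is_number` and `group_lines` are made up for illustration, and the sample lines are taken from the first post. Note that int() is fairly permissive (it also accepts signs, surrounding whitespace, and underscores), and a description line made only of digits would be misread as a code.

```python
def is_number(text: str) -> bool:
    """Return True if int() accepts the text, False if it raises ValueError."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def group_lines(lines):
    """Pair each number with all the text lines that follow it."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if is_number(line):
            rows.append([line, []])  # start a new record
        elif rows:
            rows[-1][1].append(line)  # text belongs to the latest number
        # Text that appears before the first number is ignored.
    return [(number, " ".join(parts)) for number, parts in rows]


ocr_lines = [
    "00605558",
    "Lightning t0 USB Cable (2 m)",
    "00605613",
    "Apple Lightn t0",
    "Smm Headphone",
    "00805614",
    "Apple iPhone Lightning Dock Black",
]
rows = group_lines(ocr_lines)
for number, name in rows:
    print(number, name)
```

The resulting list of tuples can be handed to pandas with `pd.DataFrame(rows, columns=['AN', 'Artikel'])`.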
#5
You should show how you are importing the data to help answer the question.

I would recommend parsing this data into an actual csv format before reading into pandas:

import csv


headers = ['AN', 'Artikel']
with open('data.txt', 'r') as in_data, open('out.csv', 'w', newline='') as out_csv:
    csv_writer = csv.writer(out_csv)
    csv_writer.writerow(headers)

    row = []

    # Assumes the file strictly alternates: article number, then description.
    for line in in_data:
        line = line.strip()
        if not line:
            continue  # skip blank lines so they do not shift the pairing

        row.append(line)
        if len(row) == 2:
            csv_writer.writerow(row)
            row = []

    # Write a trailing unpaired line, if there is one.
    if row:
        csv_writer.writerow(row)
This converts this:
12345S
Item 1
4152DS
Item 2
15190A
Item 3


To this:

AN,Artikel
12345S,Item 1
4152DS,Item 2
15190A,Item 3


pandas will read this much more cleanly.

Edit: To import and sort:

import pandas as pd
df = pd.read_csv('out.csv')
df.sort_values(["AN"], inplace=True)
print(df.head())
Output:
       AN Artikel
0  12345S  Item 1
2  15190A  Item 3
1  4152DS  Item 2
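
One caveat, shown as a hedged sketch: the article numbers in the original question (such as 00605555) consist only of digits, so `pd.read_csv` would by default infer an integer column and drop the leading zeros. Passing `dtype={'AN': str}` keeps them as text. The in-memory CSV below is built from values in this thread purely for illustration.

```python
import io

import pandas as pd

# Illustrative CSV content; in practice this would be the out.csv file.
csv_text = (
    "AN,Artikel\n"
    "00605806,Google Home\n"
    "00605555,Retail Apple I2W USB Power Adapler\n"
)

# Default type inference: the all-digit column becomes integers.
as_int = pd.read_csv(io.StringIO(csv_text))

# Read the article numbers as strings so leading zeros are preserved.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"AN": str})
as_str = as_str.sort_values("AN").reset_index(drop=True)

print(as_int["AN"].tolist())
print(as_str["AN"].tolist())
```

Sorting a string column is lexicographic, which is why 15190A comes before 4152DS in the output above.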