Python Forum

Filter only highest version of list with alphanumeric names
Hello everyone,

I have a CSV file with the following document names with versions (example):

name1.A01
name1.A02
name1.A03
name2.A01
name2.A02
name2.A03
name2.A04
name3.A01
name3.A02
name3.A04
(name3.A03 is missing from the list on purpose)

New file should contain:

name1.A03
name2.A04
name3.A04

Does anyone have an idea how to solve this?
Probably, but what have you thought about?
(Aug-30-2019, 06:58 PM)Haasje Wrote: Anyone an idea how to solve?

If this is the full description of the task then I would fire up my favorite text editor, enter three lines, save the file and go on with my life.
(Aug-30-2019, 07:27 PM)perfringo Wrote:
(Aug-30-2019, 06:58 PM)Haasje Wrote: Anyone an idea how to solve?
If this is the full description of the task then I would fire up my favorite text editor, enter three lines, save the file and go on with my life.

The real file contains 8000 rows ☹️

(Aug-30-2019, 07:00 PM)ndc85430 Wrote: Probably, but what have you thought about?

Hi, thanks for your response. I have tried to solve this with Excel, but without success. I was advised to try this using Python (I have limited experience with Python).
In order to write any code one must understand (1) what we have and (2) what we want.

As I understand it, we have a file with 8000 rows. We want to extract some of these rows. But:

- based on what do we want to extract rows (largest version number?)
- are names sorted by version as in the example (largest version is always the last)?
(Aug-31-2019, 06:15 AM)perfringo Wrote: In order to write any code one must understand (1) what we have (2) what we want. As I understand we have file with 8000 rows. We want extract some of these rows. But: - based on what we want to extract rows (largest version number?)? - are names sorted by versions as in example (largest version is always the last)?

1) For every document name (e.g. name1.Axx) I want to keep only the highest version. Every document name has a set of versions A01-Axx, but I need only the latest one (the combination documentname.Axx).
2) Names are always sorted as in the example, but the sequence is not always complete: a document could have versions A01, A02 and A04, or only A01, A02 and A05.

Thanks in advance for your help.
Do I understand correctly that the version prefix is a single character that never changes, and that versions are sorted within each name?

If so, the task is quite simple: iterate over the rows and take the last row before each name change (and the final row of the file). There is no need to compare version numbers, as they are already sorted.
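
A minimal sketch of that idea, assuming each entry looks like `name.Axx`, that entries with the same name are adjacent, and that versions are sorted in ascending order within each name. The function names and the file paths passed to `filter_file` are hypothetical, chosen only for illustration; `itertools.groupby` is one possible way to detect the name change.

```python
from itertools import groupby


def document_name(entry):
    # Everything before the last dot is treated as the document name,
    # e.g. "name1.A03" -> "name1". Entries without a dot get the name "".
    return entry.rpartition(".")[0]


def latest_versions(entries):
    # groupby starts a new group every time the name changes, so the last
    # item of each group is the row just before the name change.
    # This relies on the input already being sorted as described above.
    return [list(group)[-1] for _, group in groupby(entries, key=document_name)]


def filter_file(source_path, target_path):
    # Assumes one entry per line; blank lines are skipped.
    with open(source_path, encoding="utf-8") as source:
        entries = [line.strip() for line in source if line.strip()]
    with open(target_path, "w", encoding="utf-8") as target:
        for entry in latest_versions(entries):
            target.write(entry + "\n")


sample = [
    "name1.A01", "name1.A02", "name1.A03",
    "name2.A01", "name2.A02", "name2.A03", "name2.A04",
    "name3.A01", "name3.A02", "name3.A04",
]
print(latest_versions(sample))
# ['name1.A03', 'name2.A04', 'name3.A04']
```

For the real file something like `filter_file("documents.csv", "latest.csv")` could be used (both file names are placeholders). If the rows turned out not to be sorted, this approach would give wrong results and the version numbers would have to be compared explicitly.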