Jul-11-2019, 05:11 PM
I am looking for a Python program that solves this challenge:
Given a list of candidate demographics for numerous candidates, we want to be able to group the data by candidates with the same name. The demographics provided are (in order): Candidate ID, Candidate Name, Candidate Sex, Candidate Date Of Birth. For example, here's a sample input:
ID1,BROWN^JAMES,F,19890224
ID2,WILLIAMS^RORY,M,19881102
ID3,BROWN^JAMES,F,19890224
ID4,BROWN^JAMES,F,20010911
The expected output is:
0:
ID1,BROWN^JAMES,F,19890224
ID3,BROWN^JAMES,F,19890224
ID4,BROWN^JAMES,F,20010911
1:
ID2,WILLIAMS^RORY,M,19881102
Input
The program should accept a file as a parameter. The Candidate demographics fields are comma delimited, with newlines being used to designate new Candidates.
CANDIDATE ID, CANDIDATE NAME, CANDIDATE SEX, CANDIDATE DATE OF BIRTH
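For what it's worth, here is a minimal sketch of how the input step might look. The helper names `parse_line` and `read_records` are my own and not part of the challenge, and I am assuming blank lines can be skipped and that any line without exactly four fields is an error.

```python
def parse_line(line):
    """Split one comma-delimited candidate line into its four fields."""
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != 4:
        # Assumption: the challenge does not say what to do with bad lines,
        # so this sketch rejects them.
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return tuple(fields)


def read_records(lines):
    """Parse every non-blank line from an iterable of lines (e.g. an open file)."""
    return [parse_line(line) for line in lines if line.strip()]
```

Since an open file is an iterable of lines, the file passed as a parameter could be read by opening `sys.argv[1]` and handing the file object to `read_records`.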
The format of the Candidate name is as follows:
LAST NAME^FIRST NAME^MIDDLE NAME
The middle name component is optional and may be omitted, but last and first name will always be present. We should consider Candidates with the same first and last name to be grouped together, even if the middle names don't match. Matches should also be case insensitive. So for the following input:
ID1,CLARA^OSWALD,F,19890224
ID2,CLARA^oswald^COLEMAN,F,19890224
the expected output would group these two together:
0:
ID1,CLARA^OSWALD,F,19890224
ID2,CLARA^oswald^COLEMAN,F,19890224
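The matching rule above could be sketched as a function that turns the name field into a comparison key. The name `name_key` is hypothetical, and using `str.casefold()` for the case-insensitive comparison is my own choice (`lower()` would behave the same on the sample data).

```python
def name_key(name_field):
    """Return a (last, first) key that ignores case and any middle name."""
    parts = name_field.split("^")
    if len(parts) < 2:
        # The challenge says last and first name are always present,
        # so anything shorter is treated as malformed in this sketch.
        raise ValueError(f"name needs LAST^FIRST at minimum: {name_field!r}")
    last, first = parts[0], parts[1]
    return (last.strip().casefold(), first.strip().casefold())
```

With this key, `CLARA^OSWALD` and `CLARA^oswald^COLEMAN` compare equal, while `CLAR^OSWALD` does not.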
Output
A grouping of all the Candidates based on the first and last name of the Candidate. For each group, the output should look as follows:
N:
CANDIDATE ID, CANDIDATE NAME, CANDIDATE SEX, CANDIDATE DATE OF BIRTH (of match #1)
CANDIDATE ID, CANDIDATE NAME, CANDIDATE SEX, CANDIDATE DATE OF BIRTH (of match #2)
...
Where N starts at 0 and is incremented for each group. The output should be printed to standard output. The groups can be output in any order.
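One possible way to build and print the groups is sketched below. The function names `group_lines` and `format_groups` are made up for illustration. I am relying on dictionaries preserving insertion order (Python 3.7+), so groups are numbered in order of first appearance; the challenge allows any order, so that is a convenience rather than a requirement.

```python
def group_lines(lines):
    """Group raw candidate lines by case-insensitive (last, first) name."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        name_field = line.split(",")[1]
        # Only the first two name components matter; a middle name is dropped.
        last, first = name_field.split("^")[:2]
        key = (last.strip().casefold(), first.strip().casefold())
        groups.setdefault(key, []).append(line)
    return list(groups.values())


def format_groups(groups):
    """Render groups as 'N:' followed by the original lines of each member."""
    output = []
    for number, members in enumerate(groups):
        output.append(f"{number}:")
        output.extend(members)
    return "\n".join(output)


sample = [
    "ID1,BROWN^JAMES,F,19890224",
    "ID2,WILLIAMS^RORY,M,19881102",
    "ID3,BROWN^JAMES,F,19890224",
    "ID4,BROWN^JAMES,F,20010911",
]
print(format_groups(group_lines(sample)))
```

Each original line is kept verbatim in the output, so the middle name and original casing are still printed even though they are ignored for matching.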
Complete Example
Input:
ID1,BROWN^JAMES,F,19890224
ID2,WILLIAMS^RORY,M,19881102
ID3,BROWN^JAMES,F,19890224
ID4,CLARA^OSWALD,F,19890224
ID5,BROWN^JAMES,F,20010911
ID6,CLAR^OSWALD,F,19890224
ID7,BROWN^AMELIA,F,20010911
ID8,CLARA^oswald,F,19890224
ID9,TYLER^ROSE,F,20000101
ID10,NOBLE^DONNA,F,19780405
ID11,TYLER^ROSE,F,20000101
ID12,NOBLE^DONN,F,19780405
ID13,TYLER^ROSE,F,20000102
ID14,CLARA^OSWALD^COLEMAN,F,19890224
Output:
0:
ID1,BROWN^JAMES,F,19890224
ID3,BROWN^JAMES,F,19890224
ID5,BROWN^JAMES,F,20010911
1:
ID2,WILLIAMS^RORY,M,19881102
2:
ID4,CLARA^OSWALD,F,19890224
ID8,CLARA^oswald,F,19890224
ID14,CLARA^OSWALD^COLEMAN,F,19890224
3:
ID6,CLAR^OSWALD,F,19890224
4:
ID7,BROWN^AMELIA,F,20010911
5:
ID9,TYLER^ROSE,F,20000101
ID11,TYLER^ROSE,F,20000101
ID13,TYLER^ROSE,F,20000102
6:
ID10,NOBLE^DONNA,F,19780405
7:
ID12,NOBLE^DONN,F,19780405