String handling

YoungGrassHopper · Sep-15-2019, 12:33 PM

(Sep-15-2019, 12:21 PM)jefsummers Wrote: This may nudge you in the right direction. Just doing it for one entry, but you know how to loop now to repeat this...
entry = "Orville Wright 21 July 1988"
strlist = entry.split()
print(strlist)
name = strlist[0]+" "+strlist[1]
print(f"Name:\n{name}")
dob = strlist[2]+" "+strlist[3]+" "+strlist[4]
print("You take it from here")

Thanks for the reply, jefsummers. I can't stress enough how much the support is appreciated. Let me go and try to fix my flawed attempt; I'll post a short reply with my code once I'm done.
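The same slicing can be written a little more compactly by joining list slices instead of concatenating individual items. This sketch still assumes every name is exactly two words; the helper name `parse_two_word_name` is hypothetical:

```python
def parse_two_word_name(entry):
    """Split 'First Last D Month YYYY' into (name, dob).

    Hypothetical helper: assumes the name is exactly two words.
    """
    parts = entry.split()          # split on any run of whitespace
    name = " ".join(parts[:2])     # first two words form the name
    dob = " ".join(parts[2:])      # everything after that is the date
    return name, dob


name, dob = parse_two_word_name("Orville Wright 21 July 1988")
print(f"Name:\n{name}")
print(f"Birthdate:\n{dob}")
```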

**perfringo** · Sep-15-2019, 01:05 PM

Word of warning: if your professors are any good (or if you want to learn from real-life scenarios), then they have misled you into thinking that all names are two words. The real test file will probably contain something like 'Guido van Rossum 31 January 1956', and code which relies on splitting on whitespace will fail miserably.

I would look for the index of the first numeric character of the string and make two slices based on that.
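To see the failure concretely: with a three-word name, taking the first two words as the name pushes 'Rossum' into the date. If (and only if) every birthday is exactly three words (day, month, year), `str.rsplit()` with `maxsplit=3` splits from the right and leaves the whole name in one piece. That is a different technique from the index slicing suggested here, shown only as a sketch under that assumption:

```python
row = "Guido van Rossum 31 January 1956"

# Naive split: assumes a two-word name, so the third name word leaks into the date.
parts = row.split()
print(" ".join(parts[:2]))   # Guido van
print(" ".join(parts[2:]))   # Rossum 31 January 1956

# Alternative, assuming the date is always exactly three words:
# rsplit() splits from the right at most 3 times, so the name stays whole.
name, day, month, year = row.rsplit(maxsplit=3)
birthday = f"{day} {month} {year}"
print(name)       # Guido van Rossum
print(birthday)   # 31 January 1956
```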

YoungGrassHopper · (This post was last modified: Sep-15-2019, 01:21 PM by YoungGrassHopper.)

(Sep-15-2019, 12:21 PM)jefsummers Wrote: This may nudge you in the right direction. Just doing it for one entry, but you know how to loop now to repeat this...

OK, so I'm super close, but the problem I'm experiencing now with the loop is that when I try to split each line in the doc, it puts a bracket at the very start of all the names and DOBs, splits it, and puts one bracket at the very end of the whole list. That messes up the indexing, and consequently it only prints the first name's details. I basically need to tell the split function to split each line separately, but the bugger has a mind of its own.

for line in doc:
    doclist = doc.split()
    names = doclist[0]+" "+doclist[1]
    dob = doclist[2]+" "+doclist[3]+" "+doclist[4]
    print(f"Name:\n{names}")
    print(f"Birthdate:\n{dob}")
    break

f.close()
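The symptom comes from two lines in the loop: `doc.split()` splits the whole document rather than the current `line`, and `break` leaves the loop after the first pass. A minimal corrected sketch, assuming `doc` is an iterable of lines such as an open file object (replaced here by a hypothetical list so the example runs on its own; "Jane Doe" is placeholder data), and still assuming two-word names:

```python
# Hypothetical stand-in for the open file: iterating a file object also yields lines.
doc = [
    "Orville Wright 21 July 1988\n",
    "Jane Doe 1 January 2000\n",
]

for line in doc:
    doclist = line.split()   # split the current line, not the whole document
    names = doclist[0] + " " + doclist[1]
    dob = doclist[2] + " " + doclist[3] + " " + doclist[4]
    print(f"Name:\n{names}")
    print(f"Birthdate:\n{dob}")
    # no break here: the loop should visit every line
```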

(Sep-15-2019, 01:05 PM)perfringo Wrote: Word of warning: if your professors are any good (or if you want to learn from real-life scenarios), then they have misled you into thinking that all names are two words. The real test file will probably contain something like 'Guido van Rossum 31 January 1956', and code which relies on splitting on whitespace will fail miserably.

I would look for the index of the first numeric character of the string and make two slices based on that.

The specific .txt file they supplied with the task only has two-word names. I thought of the same thing and checked, but every name is just two words.

You have a very valid point there, and it's a sound plan, as the number of words per name won't matter then. I will just need to do some research on how to refer to the first numeric character in a line, as I have no idea how to do that. With this course I am doing, it's like they teach me how to operate on balls and then they chase me into the ER to go and do brain surgery.

They don't supply me with all the reading material before they give me the task, and I have deadlines on all of them, so it's stressful to find the info in time, understand it and apply it to the task at hand.

**perfringo** · (This post was last modified: Sep-15-2019, 05:38 PM by perfringo.)

There is always a couple of minutes to spare; use them for thinking. The majority of assignments are solvable with 'pure' thinking. Some potentially useful tidbits:

- separate what from how: define what you want to do first, and only after that start coding.

- use spoken language to define what you want to do

- decompose tasks into subtasks

What for the current assignment:

- I want to get names and birthdays from rows
- I want to print out the name and birthday in a specific format
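Writing each 'what' as its own function is one way to keep the subtasks separate. The function names `split_row` and `format_entry` are hypothetical, and the bodies use the first-digit split developed further down in this post (written with `enumerate()`):

```python
def split_row(row):
    """What #1: get the name and birthday from one row."""
    # index of the first decimal digit marks where the birthday starts
    index = next(i for i, char in enumerate(row) if char.isdecimal())
    return row[:index].strip(), row[index:].strip()


def format_entry(name, birthday):
    """What #2: render one entry in the required layout."""
    return f"Name:\n{name}\n\nBirthday:\n{birthday}\n"


for row in ["Guido van Rossum 31 January 1956", "Orville Wright 21 July 1988"]:
    print(format_entry(*split_row(row)))
```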

What: Get names and birthdays

Using just common sense, analyse the rows. The pattern that should emerge is that each row must be split in two, starting at the first decimal digit encountered (as discussed earlier, a name can have more than two words, so whitespace splitting is error-prone). This observation has nothing to do with coding; it's about finding the general pattern.

What: Split the line into two parts at the first decimal

How? We need the index of the first decimal digit, and then we slice the string at this index.

Getting an index is easy enough:

>>> s = 'a1b2c3'
>>> s.index('1')
1
>>> s.find('2')
3
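The two methods differ only when the substring is missing: `str.index()` raises `ValueError`, while `str.find()` returns -1. A quick check:

```python
s = 'a1b2c3'

print(s.find('9'))   # -1: find() reports "not found" with a sentinel value

try:
    s.index('9')
except ValueError:
    print("index() raised ValueError")   # index() reports "not found" with an exception
```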

However, we must find an occurrence of any decimal digit, not a specific one. How? A quite logical step would be 'iterate over the string character by character and, when a decimal is encountered, return its index':

>>> [i for i, char in enumerate(s) if char.isdecimal()]  # alternatively char in '0123456789'
[1, 3, 5]
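One subtlety: `str.index(char)` returns the position of the first occurrence of that character, so pairing it with a loop over the characters reports wrong positions once a digit repeats, whereas `enumerate()` yields each character together with its actual position. For the first digit both approaches agree, because no earlier copy of that digit can exist. A comparison on a string with repeated digits:

```python
s = 'x1991'

# index() always finds the first '1' or '9', so repeats get the same position
via_index = [s.index(char) for char in s if char.isdecimal()]

# enumerate() pairs every character with its real position
via_enumerate = [i for i, char in enumerate(s) if char.isdecimal()]

print(via_index)       # [1, 2, 2, 1]
print(via_enumerate)   # [1, 2, 3, 4]
```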

We have all the indices but actually need only the first. We could 'create a list of decimal indices and get the first index':

>>> [s.index(char) for char in s if char.isdecimal()][0]
1

or we could 'create a generator of decimal indices and take the first':

>>> index = next(s.index(char) for char in s if char.isdecimal())
>>> index
1
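The standard library's `re` module offers another way to find the first digit: `re.search()` returns a match object whose `.start()` is the index, or `None` when there is no digit at all. This is a swapped-in technique, not the one used in this thread:

```python
import re

s = 'a1b2c3'
match = re.search(r'\d', s)               # \d matches any Unicode decimal digit in str patterns
index = match.start() if match else None  # None when the string has no digit
print(index)   # 1
```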

Now we know the index at which to split. Let's test:

>>> n = 'Guido van Rossum 31 January 1956'
>>> index = next(n.index(char) for char in n if char.isdecimal())
>>> name, birthday = n[:index], n[index:]
>>> name
'Guido van Rossum '       # note the trailing space; to drop it, use n[:index].strip() in the assignment above
>>> birthday
'31 January 1956'
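When rows come from a file, the birthday half also ends with the newline character, so stripping both halves is the safer habit. A sketch using the same generator as above on a line as it would be read from a file:

```python
row = 'Guido van Rossum 31 January 1956\n'   # a line read from a file keeps its '\n'
index = next(row.index(char) for char in row if char.isdecimal())
name, birthday = row[:index].strip(), row[index:].strip()
print(repr(name))       # 'Guido van Rossum'
print(repr(birthday))   # '31 January 1956'
```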

What: Print out in the specified format

Once we have the name and birthday, it's easy to construct a string in the required format:

>>> s = 'Guido van Rossum 31 January 1956', 'Orville Wright 21 July 1988'
>>> for row in s:
...     index = next(row.index(char) for char in row if char.isdecimal())
...     name, birthday = row[:index], row[index:]
...     print(f'Name:\n{name}\n\nBirthday:\n{birthday}\n')
...
Name:
Guido van Rossum 

Birthday:
31 January 1956

Name:
Orville Wright 

Birthday:
21 July 1988
                         # observe the blank line at the end, produced by the trailing \n in the f-string

YoungGrassHopper · (This post was last modified: Sep-15-2019, 08:29 PM by YoungGrassHopper.)

(Sep-15-2019, 05:38 PM)perfringo Wrote: There is always a couple of minutes to spare; use them for thinking. The majority of assignments are solvable with 'pure' thinking. ...

Holy snaps, perfringo, I almost feel like I am in debt to you for having to spend so much effort explaining all this to me. I absolutely agree with your advice: think, write pseudocode and lay it all out in logical terms before starting to code. The problem I find currently is that I simply do not have all the knowledge and tools at hand to approach and solve the tasks I need to do, and these tasks are getting ever more complex, as one would expect in such a boot camp.

I think it's going to take time and lots of research and practice to get where I need to be, and in the meantime I just need to stay strong and push through.

EDIT: I could not run the code at first but sorted it out. Thanks again for your thorough explanation, perfringo; it's deeply appreciated.

newbieAuggie2019 · Sep-15-2019, 09:20 PM

(Sep-15-2019, 01:12 PM)YoungGrassHopper Wrote: ...they teach me how to operate on balls and then they chase me into the ER to go and do a brain surgery.

Sorry! I just couldn't resist it.

All the best,

YoungGrassHopper · Sep-15-2019, 10:37 PM

I feel terrible for not being able to create this code myself; I'm super jealous of your skills, haha. Thanks, perfringo, you saved me on this one big time. I got it running, but I'm getting a funny "StopIteration" error (the first time I've encountered it), even though it runs. Anyway, thanks; I think we can regard this as solved.

jefsummers · Sep-16-2019, 12:47 AM

Is there a blank line at the end of the data?

YoungGrassHopper · Sep-16-2019, 04:45 AM

(Sep-16-2019, 12:47 AM)jefsummers Wrote: Is there a blank line at the end of the data?

Hi jefsummers, if I understand you correctly, no, it seems there wasn't. I'm not sure whether what I did is supposed to make a difference, but I manually created one underneath the names etc., saved the DOB data file and ran it again to check. It doesn't seem to make any difference; I'm still getting the same "StopIteration".

**perfringo** · (This post was last modified: Sep-16-2019, 05:11 AM by perfringo.)

EDITED: the initial code contained an error. A blank line still contains a newline at the end, so the line must be stripped; otherwise it is truthy (it contains a newline). Therefore row.strip() should be used. The code below is fixed.


From the screenshot I can see that there is a blank line at the end of the file. The problem can easily be mitigated with a little defensive code: 'check whether the line is empty'. I also suggest opening files using 'with'; among other goodies, there is no need to close the file, because Python does it for you.

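with open('filename.txt', 'r') as f:
    for row in f:
        if row.strip():  # a blank line is '\n', which is truthy, so strip it first
            # the rest of the code, e.g.:
            index = next(row.index(char) for char in row if char.isdecimal())
            name, birthday = row[:index].strip(), row[index:].strip()
            print(f'Name:\n{name}\n\nBirthday:\n{birthday}\n')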
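The StopIteration itself comes from `next()`: on a blank line the generator yields nothing, and `next()` raises `StopIteration` when asked for an item that does not exist. Besides the `row.strip()` check, `next()` also accepts a default as its second argument, which it returns instead of raising. A small sketch of both behaviours (variable names are illustrative):

```python
blank = '\n'   # what a blank line looks like when read from a file

try:
    next(blank.index(char) for char in blank if char.isdecimal())
except StopIteration:
    print("StopIteration: no decimal digit in this line")

# With a default, next() returns it instead of raising.
# (The generator expression needs its own parentheses when it is not the only argument.)
index = next((i for i, char in enumerate(blank) if char.isdecimal()), None)
print(index)   # None

print(bool(blank), bool(blank.strip()))   # True False: why strip() is needed before the check
```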
