Python Forum

Full Version: Removing hyphens and adding zeros
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
What is the meaning of the following code from the Learn Data Analysis with Python book? Can someone elaborate on the first 10-12 lines. Why return s[-9] or in other words assign to ssn? Negative index? As I understand, first replacing hyphens with a space. Then, splitting at the space and rejoin? If number of digits less than 9 and not 'missing' then .. why add the zeros? Then what next?

The ssns are:
ssns = ['867-53-0909','333-22-4444','123-12-1234',
'777-93-9311','123-12-1423']

def right(s, amount):
    return s[-amount]
def standardize_ssn(ssn):
    try:
      ssn = ssn.replace("-","")
      ssn = "".join(ssn.split())
      if len(ssn)<9 and ssn != 'Missing':
         ssn="000000000" + ssn
         ssn=right(ssn,9)
    except:
      pass
    return ssn
df.ssn = df.ssn.apply(standardize_ssn)
df
ThereĀ“s a colon missing, either in the book or you overlooked it.
def right(s, amount):
    return s[-amount:]
if the snn is less than 9 digits it is set to nine zeros plus the given digits and then the last nine digits are returned
So in other words zeros were added to the start of the ssn
There is string method zfill for filling in zeros and if the objective is to have standard lenght (9) and fill shorter values with zeros and removing hyphens one can simply do:

>>> ssns = ['867-53-0909','333-22-4444','123-12-1234', '777-93-9311','123-12-1423', '1-2-3', 'Missing', '45-67-78']
>>> [''.join(ssn.split('-')).zfill(9) for ssn in ssns if ssn != 'Missing']
['867530909',
 '333224444',
 '123121234',
 '777939311',
 '123121423',
 '000000123',
 '000456778']
Thank you Thomas and Perfringo. I just realized the colon after working through the code step by step. I figured out the working of the code as I did it step by step running the code after each line. Then logged in and noticed your reply as well. And the colon is actually missing in the book. Well problem solved.

Sure Buran, will format with tags in future. I'm going to try to see if it's working:

for i in range(10):
      print i