Python Forum

Full Version: Split Column Text by Number of Characters
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello all,

Brand new to Python! I've tried search keywords for this solution but came up with unrelated examples with commas / delimiters. Any assistance would be greatly appreciated.

Loading from csv into a df,

Data set column = "Month of Year" - values (201801,201802,201803)
I would like to split up this column into a Month and a Year column ( Month = 01, Year = 2018 ).

In excel, this would be a text to column function, any idea on what the equivalent would be in python?

Thank you
I suspect that Excel 'text-to-columns' function doesn't split value to year and month without delimiter as well.

If somebody is brand new to Python it's not good to start with Pandas right away. Learning Python basics will serve you well in future.

There appears to be no delimiter to split value. However, it seems that there is pattern: first four represent year, next two represent month. If values coming from csv then by default they are strings and slicing can retrieve needed data:

>>> values = '201801,201802,201803'
>>> [(el[4:], el[:4]) for el in values.split(',')]
[('01', '2018'), ('02', '2018'), ('03', '2018')] 
To take @perfringo code a step further and make it datetime object.
>>> values = '201801,201802,201803'
>>> d = [(el[4:], el[:4]) for el in values.split(',')]
>>> d
[('01', '2018'), ('02', '2018'), ('03', '2018')]
>>> date_str = ['-'.join(i) for i in d]
>>> date_str
['01-2018', '02-2018', '03-2018']
>>> date_str[0]
'01-2018'
So a this point has a delimiter(-) and now can use strftime().
>>> from datetime import datetime, timedelta
>>> 
>>> d = date_str[0]
>>> d
'01-2018'
>>> my_date = datetime.strptime(d, '%m-%Y')
>>> my_date
datetime.datetime(2018, 1, 1, 0, 0)

>>> # So at this point is a datetime object
>>> type(my_date)
<class 'datetime.datetime'>
>>> my_date.year
2018
>>> my_date.month
1

>>> # 5 days from my_date
>>> my_date + timedelta(days=5)
datetime.datetime(2018, 1, 6, 0, 0)

# With print() don't show "repr" object info
>>> print(my_date + timedelta(days=5))
2018-01-06 00:00:00
Pandas do of course have own Time Series / Date functionality which has a lot functionally.
Just because it can be done: based on snippsat code one can develop following dense and 'nice' code:

>>> from datetime import datetime
>>> values = '201801,201802,201803'
>>> [datetime.strptime(f'{el[4:]}-{el[:4]}', '%m-%Y') for el in values.split(',')]
[datetime.datetime(2018, 1, 1, 0, 0),
 datetime.datetime(2018, 2, 1, 0, 0),
 datetime.datetime(2018, 3, 1, 0, 0)]
>>> [(date.month, date.year) for date in [datetime.strptime(f'{el[4:]}-{el[:4]}', '%m-%Y') for el in values.split(',')]]
[(1, 2018), (2, 2018), (3, 2018)]