Python Forum
Splitting values in column in a pandas dataframe based on a condition
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Splitting values in column in a pandas dataframe based on a condition
#1
Hi All,

I have been trying to split data separated by a comma in the 'name' column into two columns, however not all of the rows in this column have commas.

Initially I tried to use the below but I think it tries to split every column even ones without colums and so it fails:

I think I need a conditional statement that can apply the splitting to just the cells in the names coloumn with commas in, but I couldn't work out how to do this.

Any help would be much appreciated Wall Big Grin Wall

df1['new name'], df1['category'] = zip(*df1['name'].map(lambda x: x.split(',')))
Here is the full query that I want to apply the above to:

import requests
from lxml import etree
toDate = "2018-04-25"
fromDate = "2018-04-25"
dateType = "gasday"

def getXML():
    url="http://marketinformation.natgrid.co.uk/MIPIws-public/public/publicwebservice.asmx"
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body ="""<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
        <soap12:Body>
            <GetPublicationDataWM xmlns="http://www.NationalGrid.com/MIPI/">
                <reqObject>
                    <LatestFlag>Y</LatestFlag>
                    <ApplicableForFlag>Y</ApplicableForFlag>
                    <ToDate>%s</ToDate>
                    <FromDate>%s</FromDate>
                    <DateType>%s</DateType>
                    <PublicationObjectNameList>
                    <string>Nominations, Aggregate Imbalance</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Boiloff to LDZ Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Interconnector Entry Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Interconnector Exit Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, LDZ Direct Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, LNG Importation Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, NBP Trade Entry Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, NBP Trade Exit Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Non Daily Meters Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, NTS Industrial Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, NTS Powerstation Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Onshore field Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Shrinkage (LDZ ) Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Storage Entry Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Storage Exit Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Subterminals Total</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Total Aggregate System Entry</string>
                    <string>Nominations, Aggregate Prevailing Nomination, Total Aggregate System Exit</string>
                    <string>Nominations, Prevailing Nomination, Aldbrough, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Aldbrough, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, Avonmouth, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Avonmouth, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, AvonmouthBL, Storage Boiloff</string>
                    <string>Nominations, Prevailing Nomination, Bacton, Interconnector Entry</string>
                    <string>Nominations, Prevailing Nomination, Bacton, Interconnector Exit</string>
                    <string>Nominations, Prevailing Nomination, Bacton-BBL, Interconnector Entry</string>
                    <string>Nominations, Prevailing Nomination, Bacton-BBL, Interconnector Exit</string>
                    <string>Nominations, Prevailing Nomination, Bacton-Perenco, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Bacton-Seal, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Bacton-Shell, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Bacton-Tullow, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Barrow, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, BurtonPoint, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Dragon, LNG Importation</string>
                    <string>Nominations, Prevailing Nomination, DynevorArms, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, DynevorArms, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, DynevorArmsBN, Storage Boiloff</string>
                    <string>Nominations, Prevailing Nomination, Easington-Amethyst, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Easington-Dimlington, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Easington-Langeled, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Easington-WestSole, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Easington-York, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Glenmavis, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Glenmavis, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, GlenmavisBL, Storage Boiloff</string>
                    <string>Nominations, Prevailing Nomination, GrainNTS1, LNG Importation</string>
                    <string>Nominations, Prevailing Nomination, GrainNTS2, LNG Importation</string>
                    <string>Nominations, Prevailing Nomination, Hatfield Moor, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, Hatfield, OnshoreField</string>
                    <string>Nominations, Prevailing Nomination, HatfieldMoor, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Hilltop, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Hilltop, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, HoleHouseFarm, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, HoleHousFm, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, Holford, OnshoreField</string>
                    <string>Nominations, Prevailing Nomination, Holford, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Holford, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, Hornsea, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Hornsea, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, HumblyGrov, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, HumblyGrov, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, IsleOfGrain, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, IsleOfGrainBL, Storage Boiloff</string>
                    <string>Nominations, Prevailing Nomination, Moffat, Interconnector Entry</string>
                    <string>Nominations, Prevailing Nomination, Moffat, Interconnector Exit</string>
                    <string>Nominations, Prevailing Nomination, Partington, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Partington, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, PartingtonBL, Storage Boiloff</string>
                    <string>Nominations, Prevailing Nomination, Rough, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, RoughStor, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, SouthHook, LNG Importation</string>
                    <string>Nominations, Prevailing Nomination, STFergus-Mobil, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, STFergus-Shell, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, STFergus-NSMP, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Stublach, Storage Entry</string>
                    <string>Nominations, Prevailing Nomination, Stublach, Storage Exit</string>
                    <string>Nominations, Prevailing Nomination, Teesside-BP, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Teesside-PX, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Theddlethorpe, Sub Terminal</string>
                    <string>Nominations, Prevailing Nomination, Wytchfarm, OnshoreField</string>
                    </PublicationObjectNameList>
                </reqObject>
            </GetPublicationDataWM>
        </soap12:Body>
    </soap12:Envelope>""" % (toDate, fromDate,dateType)

    response = requests.post(url,data=body,headers=headers)
    return response.content
root = etree.fromstring(getXML())
# map prefix 'd' to the default namespace URI
ns = { 'd': 'http://www.NationalGrid.com/MIPI/'}
 

import pandas as pd
df1 = pd.DataFrame(columns=("applicable_at","applicable_for","name","value","quality_indicator","substituted","created_date"))   
for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
     
    root = etree.fromstring(getXML())
     
    #map prefix 'd' to the default namespace URI
    ns = {'d': 'http://www.NationalGrid.com/MIPI/'}
     
    publication_objects = root.xpath('//d:CLSMIPIPublicationObjectBE', namespaces=ns)
     
    for obj in publication_objects:
        name = obj.find('d:PublicationObjectName', ns).text
     
        for data in obj.findall('d:PublicationObjectData/d:CLSPublicationObjectDataBE', ns):
            applicable_at = pd.to_datetime(data.find('d:ApplicableAt', ns).text)    
            applicable_for = pd.to_datetime(data.find('d:ApplicableFor', ns).text)
            value = float(data.find('d:Value', ns).text)
            quality_indicator = data.find('d:Value', ns).text
            substituted = data.find('d:Substituted', ns).text
            created_date = pd.to_datetime(data.find('d:CreatedDate', ns).text)
        
         
            df1.loc[len(df1) +1] = [applicable_at, applicable_for,name, value, quality_indicator, substituted, created_date]
                      
            df1['name']= df1.name.str.replace('Nominations, Prevailing Nomination,', '')
            df1['name']= df1.name.str.replace('Nominations, Aggregate Prevailing Nomination,', '')
            df1['name']= df1.name.str.replace('Nominations,', '')
           # df1['new name'], df1['category'] = zip(*df1['name'].map(lambda x: x.split(',')))
          
Reply
#2
any help would be much appreciated :)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Assigning conditional values in Pandas Scott 3 726 Dec-19-2023, 03:10 AM
Last Post: Larz60+
  concat 3 columns of dataframe to one column flash77 2 778 Oct-03-2023, 09:29 PM
Last Post: flash77
  HTML Decoder pandas dataframe column mbrown009 3 962 Sep-29-2023, 05:56 PM
Last Post: deanhystad
  Supervised learning, tree based model - problems splitting data Pixel 0 644 May-16-2023, 05:25 PM
Last Post: Pixel
  attempt to split values from within a dataframe column mbrown009 8 2,224 Apr-10-2023, 02:06 AM
Last Post: mbrown009
  Use pandas to obtain cartesian product between a dataframe of int and equations? haihal 0 1,091 Jan-06-2023, 10:53 PM
Last Post: haihal
  Make unique id in vectorized way based on text data column with similarity scoring ill8 0 862 Dec-12-2022, 03:22 AM
Last Post: ill8
  Increase df column values decimals SriRajesh 2 1,086 Nov-14-2022, 05:20 PM
Last Post: deanhystad
  pandas column percentile nuncio 7 2,383 Aug-10-2022, 04:41 AM
Last Post: nuncio
  reduce time series based on sum condition amdi40 0 1,079 Apr-06-2022, 09:09 AM
Last Post: amdi40

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020