Python Forum

Full Version: Delimited Values to ROW - Lucky Train ?
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
All,

I have an XML file which contains a tag called Relation. This tag will have Custom SQL Query something like below.


<relation connection='oraRLe.1vt9plg0hto4k31akb7rk07dlyaz' name='Custom SQL Query' type='text'>Select i.RL_PROGRAM_ID ,i.RL_INVESTMENT_ID ,p.INVESTMENT_ID||&apos;: &apos;||p.INVESTMENT_NAME as &quot;Program&quot; ,p.INVESTMENT_ID as &quot;Program ID&quot; ,p.INVESTMENT_NAME as &quot;Program Name&quot; ,CASE WHEN p.IS_ACTIVE = 1 THEN &apos;Yes&apos; ELSE &apos;No&apos; END as &quot;Program Is Active?&quot; ,p.MANAGER_FULL_NAME as &quot;Program Manager&quot; ,p.TECH_OWNER_FULL_NAME as &quot;Program Tech Owner&quot; ,sRBS.RBS_NAME1 as &quot;Program Sponsor Lvl1&quot; ,sRBS.RBS_NAME2 as &quot;Program Sponsor Lvl2&quot; ,sRBS.RBS_NAME3 as &quot;Program Sponsor Lvl3&quot; ,p.MC_SPONSOR_RBS_FULL_PATH as &quot;Program Sponsor Path&quot; ,p.INVESTMENT_RBS_FULL_PATH as &quot;Program Investment RBS&quot; ,p.RD_DEMAND_REGION_DESC as &quot;Program Demand Region&quot; ,p.RD_FUND_SGMNT_DESC as &quot;Program Funding Segment&quot; ,p.ACCOUNTING_REGION_DESC as &quot;Program Accounting Region&quot; From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = &apos;RD_notinarelease&apos;) UNION Select i.RL_PROGRAM_ID ,i.RL_PROGRAM_ID as RL_INVESTMENT_ID ,p.INVESTMENT_ID||&apos;: &apos;||p.INVESTMENT_NAME as Program ,p.INVESTMENT_ID as &quot;Program ID&quot; ,p.INVESTMENT_NAME as &quot;Program Name&quot; ,CASE WHEN p.IS_ACTIVE = 1 THEN &apos;Yes&apos; ELSE &apos;No&apos; END as &quot;Program Is Active?&quot; ,p.MANAGER_FULL_NAME as &quot;Program Manager&quot; ,p.TECH_OWNER_FULL_NAME as &quot;Program Tech Owner&quot; ,sRBS.RBS_NAME1 as &quot;Program Sponsor Lvl1&quot; ,sRBS.RBS_NAME2 as &quot;Program Sponsor Lvl2&quot; ,sRBS.RBS_NAME3 as &quot;Program Sponsor Lvl3&quot; ,p.MC_SPONSOR_RBS_FULL_PATH as &quot;Program Sponsor Path&quot; ,p.INVESTMENT_RBS_FULL_PATH as &quot;Program Investment RBS&quot; ,p.RD_DEMAND_REGION_DESC as &quot;Program Demand Region&quot; ,p.RD_FUND_SGMNT_DESC as &quot;Program Funding Segment&quot; ,p.ACCOUNTING_REGION_DESC as &quot;Program Accounting Region&quot; From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = &apos;RD_notinarelease&apos;)</relation>


My requirement is to get TABLE and its relevant COLUMN NAMES used in he query. I have 120 XML file like this.

Expected Output:

FileName TableName ColumnName
F1 T1 C1
F1 T1 C2
F1 T2 C1

F2 T1 C1
F2 T2 C1

There are 2 cases:

#1 - If alias found in the column name then we have to find the same alias name after the table name and match it with that particular table.
#2 - If alias not found, then we have to map each and very column to every tables used after the FROM clause ( It is like a cartesian product)
I will take the actual table name & columns name from ALL_TAB_COLUMNS( Oracle) and match with the output point#2 to eliminate the unwanted combinations.

Inputs are welcome!

This is my 2nd post and 2nd task in Python...

Let me tell you what are all the options i explored so far.. I dont know is this the right approach to get the desired result.

Note the below result is somehow ok, but i tried to replace JUNK character, with the help of REPLACE(), But no luck.

from pandas import DataFrame 

a = DataFrame([{'var1': ' c.Type

    ,c.RL_INVESTMENT_ID
--    ,c.RL_RESOURCE_ID
', 'var2': 1},
               {'var1': 'd,e,f', 'var2': 2}])
a= a.replace("
","")
a= a.replace("&quot;",'"')
             
def tidy_split(df, column, sep='|', keep=False):
    """
    Split the values of a column and expand so the new DataFrame has one split
    value per row. Filters rows where the column is missing.

    Params
    ------
    df : pandas.DataFrame
        dataframe with the column to split and expand
    column : str
        the column to split and expand
    sep : str
        the string used to split the column's values
    keep : bool
        whether to retain the presplit value as it's own row

    Returns
    -------
    pandas.DataFrame
        Returns a dataframe with the same columns as `df`.
    """
    indexes = list()
    new_values = list()
 
    df = df.dropna(subset=[column])
    print ("df as ", df)
    for i, presplit in enumerate(df[column].astype(str)):
        values = presplit.split(sep)
        print ("values as ", values)
        print ("i as" , i)
        if keep and len(values) > 1:
            indexes.append(i)
            new_values.append(presplit)
        for value in values:
            indexes.append(i)
            new_values.append(value)
    new_df = df.iloc[indexes, :].copy()
    new_df[column] = new_values
    print( new_df[column])

def main():
    
    print (a)
    tidy_split(a, 'var1', sep=',')

if __name__ == '__main__':
    main()
any thoughts pls?