Python Forum
Delimited Values to ROW - Lucky Train ?
#1
All,

I have an XML file which contains a tag called Relation. This tag contains a Custom SQL Query, something like the one below.


<relation connection='oraRLe.1vt9plg0hto4k31akb7rk07dlyaz' name='Custom SQL Query' type='text'>Select i.RL_PROGRAM_ID ,i.RL_INVESTMENT_ID ,p.INVESTMENT_ID||&apos;: &apos;||p.INVESTMENT_NAME as &quot;Program&quot; ,p.INVESTMENT_ID as &quot;Program ID&quot; ,p.INVESTMENT_NAME as &quot;Program Name&quot; ,CASE WHEN p.IS_ACTIVE = 1 THEN &apos;Yes&apos; ELSE &apos;No&apos; END as &quot;Program Is Active?&quot; ,p.MANAGER_FULL_NAME as &quot;Program Manager&quot; ,p.TECH_OWNER_FULL_NAME as &quot;Program Tech Owner&quot; ,sRBS.RBS_NAME1 as &quot;Program Sponsor Lvl1&quot; ,sRBS.RBS_NAME2 as &quot;Program Sponsor Lvl2&quot; ,sRBS.RBS_NAME3 as &quot;Program Sponsor Lvl3&quot; ,p.MC_SPONSOR_RBS_FULL_PATH as &quot;Program Sponsor Path&quot; ,p.INVESTMENT_RBS_FULL_PATH as &quot;Program Investment RBS&quot; ,p.RD_DEMAND_REGION_DESC as &quot;Program Demand Region&quot; ,p.RD_FUND_SGMNT_DESC as &quot;Program Funding Segment&quot; ,p.ACCOUNTING_REGION_DESC as &quot;Program Accounting Region&quot; From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = &apos;RD_notinarelease&apos;) UNION Select i.RL_PROGRAM_ID ,i.RL_PROGRAM_ID as RL_INVESTMENT_ID ,p.INVESTMENT_ID||&apos;: &apos;||p.INVESTMENT_NAME as Program ,p.INVESTMENT_ID as &quot;Program ID&quot; ,p.INVESTMENT_NAME as &quot;Program Name&quot; ,CASE WHEN p.IS_ACTIVE = 1 THEN &apos;Yes&apos; ELSE &apos;No&apos; END as &quot;Program Is Active?&quot; ,p.MANAGER_FULL_NAME as &quot;Program Manager&quot; ,p.TECH_OWNER_FULL_NAME as &quot;Program Tech Owner&quot; ,sRBS.RBS_NAME1 as &quot;Program Sponsor Lvl1&quot; ,sRBS.RBS_NAME2 as &quot;Program Sponsor Lvl2&quot; ,sRBS.RBS_NAME3 as &quot;Program Sponsor Lvl3&quot; ,p.MC_SPONSOR_RBS_FULL_PATH as &quot;Program Sponsor Path&quot; ,p.INVESTMENT_RBS_FULL_PATH as &quot;Program Investment RBS&quot; ,p.RD_DEMAND_REGION_DESC as &quot;Program Demand Region&quot; ,p.RD_FUND_SGMNT_DESC as &quot;Program Funding Segment&quot; ,p.ACCOUNTING_REGION_DESC as &quot;Program Accounting Region&quot; From RL_PROJ_PROG_XREF i LEFT JOIN RL_PROJECT_DIM p ON i.RL_PROGRAM_ID = p.RL_INVESTMENT_ID LEFT JOIN RL_RBS sRBS ON p.MC_SPONSOR_RBS_UNIT_ID = sRBS.RBS_UNIT_ID Where (p.RELEASE_INDICATOR is null or p.RELEASE_INDICATOR = &apos;RD_notinarelease&apos;)</relation>


My requirement is to get the TABLE and the relevant COLUMN NAMES used in the query. I have 120 XML files like this.
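For the first step (getting the query text out of each file), here is a minimal sketch with ElementTree, assuming each file is well-formed XML and the relation element looks like the sample above; the `*.xml` glob pattern is a placeholder for wherever the 120 files actually live. Note that ElementTree unescapes `&apos;` and `&quot;` to `'` and `"` automatically when you read `.text`, so no manual replace is needed for those.

import xml.etree.ElementTree as ET
import glob

for path in glob.glob('*.xml'):  # hypothetical pattern; point at the real folder
    tree = ET.parse(path)
    # .iter() finds <relation> at any depth in the document
    for rel in tree.iter('relation'):
        if rel.get('name') == 'Custom SQL Query':
            sql = rel.text  # entities already unescaped here
            print(path, sql[:60], '...')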

Expected Output:

FileName TableName ColumnName
F1 T1 C1
F1 T1 C2
F1 T2 C1

F2 T1 C1
F2 T2 C1

There are 2 cases:

#1 - If an alias is found in the column name, we have to find the same alias after the table name and match the column to that particular table.
#2 - If no alias is found, we have to map each and every column to every table used after the FROM clause (like a cartesian product).
I will take the actual table and column names from ALL_TAB_COLUMNS (Oracle) and match them against the output of case #2 to eliminate the unwanted combinations. A sketch of case #1 follows below.
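For case #1, a rough sketch using plain regexes (not a real SQL parser, so it will miss subqueries, quoted identifiers, and tables written without an alias, where it would wrongly capture the next keyword as the alias). `tables_and_columns` is a hypothetical name; it builds an alias-to-table map from the FROM/JOIN clauses and resolves alias-qualified columns like `p.INVESTMENT_NAME`:

import re

def tables_and_columns(sql):
    # case #1: "FROM table alias" / "JOIN table alias" pairs
    alias_to_table = {
        alias.lower(): table.upper()
        for table, alias in re.findall(
            r'\b(?:FROM|JOIN)\s+(\w+)\s+(\w+)', sql, re.IGNORECASE)
    }
    pairs = set()
    # alias-qualified columns such as i.RL_PROGRAM_ID or sRBS.RBS_NAME1
    for alias, column in re.findall(r'\b(\w+)\.(\w+)\b', sql):
        table = alias_to_table.get(alias.lower())
        if table:
            pairs.add((table, column.upper()))
    return sorted(pairs)

Columns that carry no alias would fall through this and get mapped to every table in `alias_to_table.values()` (case #2), to be filtered against ALL_TAB_COLUMNS afterwards.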

Inputs are welcome!

This is my 2nd post and 2nd task in Python...

Let me tell you the options I have explored so far. I don't know whether this is the right approach to get the desired result.

Note: the result below is somewhat OK, but I tried to remove the junk characters with the help of replace(), with no luck.

from pandas import DataFrame

a = DataFrame([{'var1': ' c.Type\n\n    ,c.RL_INVESTMENT_ID\n--    ,c.RL_RESOURCE_ID\n',
                'var2': 1},
               {'var1': 'd,e,f', 'var2': 2}])
# DataFrame.replace() only swaps whole cell values unless regex=True,
# which is why the newline and &quot; replacements appeared to do nothing
a = a.replace('\n', '', regex=True)
a = a.replace('&quot;', '"', regex=True)

def tidy_split(df, column, sep='|', keep=False):
    """
    Split the values of a column and expand so the new DataFrame has one split
    value per row. Filters rows where the column is missing.

    Params
    ------
    df : pandas.DataFrame
        dataframe with the column to split and expand
    column : str
        the column to split and expand
    sep : str
        the string used to split the column's values
    keep : bool
        whether to retain the presplit value as its own row

    Returns
    -------
    pandas.DataFrame
        Returns a dataframe with the same columns as `df`.
    """
    indexes = list()
    new_values = list()

    df = df.dropna(subset=[column])
    for i, presplit in enumerate(df[column].astype(str)):
        values = presplit.split(sep)
        if keep and len(values) > 1:
            indexes.append(i)
            new_values.append(presplit)
        for value in values:
            indexes.append(i)
            new_values.append(value)
    new_df = df.iloc[indexes, :].copy()
    new_df[column] = new_values
    # the earlier version printed new_df[column] but never returned it
    return new_df

def main():
    print(a)
    print(tidy_split(a, 'var1', sep=','))

if __name__ == '__main__':
    main()
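For what it's worth, recent pandas (0.25 and later) can do the same split-to-rows step in one line with DataFrame.explode, so tidy_split may not be needed at all; a minimal sketch on the same frame:

# split the column into lists, then expand one list element per row
exploded = a.assign(var1=a['var1'].str.split(',')).explode('var1')
print(exploded)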
#2
Any thoughts, please?