 Parse using reg_ex
Hi All
I am new to Python and have run into a situation where I need to pick values out of a string based on certain patterns.
The dataset has two columns:
Defect_id|event_description
D001|[DEFECT[: 40 D SCREEN INOP
D002|SEATS 04DE / 06DE / 05DE NO IFE / SEAT INOP (RECLINE)
D003|IFE INOP @ FOLLOWING SEATS IN SPITE OF RESET DONE IN FLIGHT : 10GH , 12EF , 16F , .., 34B , 33C , 32C , 30D , 27B , 18A , 17D , 16D , 14A , 12A.
Please find the desired output below.
Defect_id|affected_seats
D001|40D
D002|04D,04E,05D,05E,06D,06E
D003|10G,10H,12A,12E,12F,14A,16D,16F,17D,18A,27B,30D,32C,33C,34B
Below is my code.

import re
import pyspark.sql.functions as F
import pyspark.sql.types as T

from datasource.enrich.derivation import derives

def parse_affected_seats(defect_description):
    seats_pattern = re.compile(

    def parse_seats(text):
        return sorted(list(set(seats_pattern.findall(text)))) if text else None

    parse_seats_udf = F.udf(parse_seats, T.ArrayType(T.StringType()))
    return parse_seats_udf(defect_description)
Any kind of help is highly appreciated.

Larz60+ wrote Jul-16-2019, 02:43 PM:
Please post all code, output and errors (in their entirety) between their respective tags. I did it for you this time. Here are instructions on how to do it yourself next time.
First, it is unclear to me what the program is doing. I don't understand how the correct output is determined. What is the goal here? Second, what problem are you having? Are you getting an error? If so, what is it? Is the output wrong? If so, how is it wrong?
Craig "Ichabod" O'Brien -
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
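
Going only by the three sample rows and the desired output in the first post, the goal looks like this: find every row number followed by one or more seat letters (for example 04DE or 40 D), split grouped letters into individual seats, drop duplicates, and sort. The regular expression in the posted code did not survive, so the pattern below is a guess rather than the original one. It assumes rows are one or two digits, seat letters run from A to K, and at most one space separates the row from its letters. The name SEAT_PATTERN is made up for this sketch.

```python
import re

# Assumed pattern (not the one from the original post): a 1-2 digit row,
# an optional single space, then 1-4 seat letters in the range A-K.
# The trailing \b stops matches from ending in the middle of a longer word.
SEAT_PATTERN = re.compile(r"\b(\d{1,2})\s?([A-K]{1,4})\b")


def parse_seats(text):
    """Return a sorted list of individual seats found in text, or None if text is empty."""
    if not text:
        return None
    seats = set()
    for row, letters in SEAT_PATTERN.findall(text):
        # "04DE" stands for two seats: 04D and 04E.
        for letter in letters:
            seats.add((row, letter))
    # Sort by numeric row, then by letter; keep the row text as written (e.g. "04").
    ordered = sorted(seats, key=lambda seat: (int(seat[0]), seat[1]))
    return [row + letter for row, letter in ordered]


samples = {
    "D001": "[DEFECT[: 40 D SCREEN INOP",
    "D002": "SEATS 04DE / 06DE / 05DE NO IFE / SEAT INOP (RECLINE)",
}
for defect_id, description in samples.items():
    print(defect_id + "|" + ",".join(parse_seats(description)))
```

This plain function has the same shape as the inner parse_seats in the posted code, so it should be possible to wrap it with F.udf(parse_seats, T.ArrayType(T.StringType())) in the same way. A pattern this loose can also pick up text that is not a seat (a quantity followed by a short word made of the letters A to K, for instance), so it is worth checking against more of the real descriptions.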

