Parse using reg_ex (Python Forum, General Coding Help: https://python-forum.io/forum-8.html)
Parse using reg_ex - UrsVinny - Jul-16-2019

Hi All,

I am a newbie to the Python world and came across a situation where I need to pick values from a string based on certain patterns. The dataset has two columns. Please find the desired output below. Below is my code.

    import re

    import pyspark.sql.functions as F
    import pyspark.sql.types as T
    from datasource.enrich.derivation import derives


    @derives("affected_seats")
    def parse_affected_seats(defect_description):
        # A seat is a row number from 1 to 99 followed by a letter from A to K.
        seats_pattern = re.compile(r'\b([1-9][0-9]?[A-K])\b')

        def parse_seats(text):
            # Sorted, de-duplicated list of matches; None for empty or missing text.
            return sorted(list(set(seats_pattern.findall(text)))) if text else None

        parse_seats_udf = F.udf(parse_seats, T.ArrayType(T.StringType()))
        return parse_seats_udf(defect_description)

Any kind of help is highly appreciated.

Regards,
Vinny

RE: Parse using reg_ex - ichabod801 - Jul-16-2019

First, it is unclear to me what the program is doing. I don't understand how the correct output is determined. What is the goal here?

Second, what is the problem you are having? Are you getting an error? What is it? Is the output wrong? How is it wrong?
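One way to narrow the question down is to run the regex from the posted code on plain strings, without Spark or the `derives` decorator. The sketch below reuses the pattern from the post unchanged. The sample descriptions are hypothetical, since the thread does not show the actual dataset or the desired output.

```python
import re

# Same pattern as in the post: a row number 1-99 followed by a seat letter A-K,
# delimited by word boundaries on both sides.
SEATS_PATTERN = re.compile(r'\b([1-9][0-9]?[A-K])\b')


def parse_seats(text):
    """Return the sorted, de-duplicated seat labels in text, or None if text is empty."""
    if not text:
        return None
    return sorted(set(SEATS_PATTERN.findall(text)))


# Hypothetical sample inputs (not taken from the thread).
samples = [
    "Seat 12A and 12B recline inop",
    "IFE screen at 3C, 3C and 10K blank",
    "Galley light flickering",
    None,
]

for sample in samples:
    print(repr(sample), "->", parse_seats(sample))
```

Two behaviours may be worth checking against the desired output. The sort is lexicographic, so `"10K"` comes before `"3C"`. A description with no seat yields an empty list, while a missing description yields `None`.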