Python Forum
How can I use python for data transformation instead of PowerQuery?
#1
Hello guys,

I would like to ask a general, theoretical question. We currently have a BI solution that uses Excel files from SharePoint, and the data is transformed in PowerQuery according to the needs of the business groups. In the near future we will have a Microsoft Azure implementation, so the data will be available from there. I want to create a BI solution where the data is accessed on Azure, datasets for the various business groups are created with Python data transformations, and the results are then analyzed in PowerBI. I think the main advantages of this would be:

Better performance: Python processes data much faster than PowerQuery.
Reusability: the created dataset could be reused whenever it is required.
What do you think, is this a good direction to develop? I also have a couple of open questions:

Should I run the scripts inside PowerBI, or outside of it and ingest only the result into PowerBI?
Which Python packages should I look into apart from pandas, NumPy, and matplotlib?
Should I use classes with parameters, or simply create a script for each business group?

Thank you in advance

Mark
#2
Python is a powerful tool for data transformation and can be used as an alternative to PowerQuery. Here are a few ways to use Python for data transformation:

pandas: pandas is one of the most popular Python libraries for data manipulation. Its DataFrame structure, similar to a table in a relational database, lets you perform common transformation tasks such as filtering, sorting, and aggregating data.
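
A minimal sketch of those three operations in pandas, assuming a small hypothetical sales table (the column names `region`, `product`, and `amount` are made up for illustration):

```python
import pandas as pd

# Hypothetical sales data standing in for a table pulled from SharePoint or Azure
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "product": ["A", "A", "B", "B", "A", "A"],
    "amount": [120.0, 80.0, 200.0, 50.0, 150.0, 90.0],
})

# Filter: keep only rows above a threshold
large = sales[sales["amount"] > 75]

# Aggregate: total amount per region
totals = large.groupby("region", as_index=False)["amount"].sum()

# Sort: highest total first, with a fresh 0..n-1 index
totals = totals.sort_values("amount", ascending=False, ignore_index=True)
print(totals)
```

Each step returns a new DataFrame, so the steps can be chained or inspected one by one, much like applied steps in PowerQuery.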

NumPy: NumPy is another popular library for data transformation. It provides an n-dimensional array type for numerical data and vectorized mathematical operations that work on whole arrays at once, which keeps calculations on large datasets fast. pandas itself is built on top of NumPy.
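
To illustrate vectorized operations, here is a small sketch assuming a hypothetical revenue matrix (rows are products, columns are months); every calculation runs on the whole array without an explicit Python loop:

```python
import numpy as np

# Hypothetical monthly revenue: rows = products, columns = months
revenue = np.array([
    [100.0, 110.0, 121.0],
    [200.0, 190.0, 210.0],
])

# Month-over-month growth rate for every product at once:
# difference between consecutive months divided by the earlier month
growth = np.diff(revenue, axis=1) / revenue[:, :-1]

# Total revenue per product (sum along each row) and the overall mean
totals = revenue.sum(axis=1)
overall_mean = revenue.mean()
```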

Data wrangling: pandas and NumPy can also be used for data wrangling, that is cleaning (fixing data types, trimming text, handling missing values), transforming, and reshaping (pivoting and unpivoting) data.
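
A sketch of typical cleaning and reshaping steps, assuming a hypothetical raw export with stray whitespace, numbers stored as text, and a missing value. Filling missing amounts with 0 is an assumption made for the example; the right default depends on the business rules:

```python
import pandas as pd

# Hypothetical raw export with common problems
raw = pd.DataFrame({
    "region": [" North", "South ", "North", "South"],
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "amount": ["100", "80", None, "95"],
})

# Cleaning: trim whitespace and convert text to numbers (unparseable values become NaN)
clean = raw.assign(
    region=raw["region"].str.strip(),
    amount=pd.to_numeric(raw["amount"], errors="coerce"),
)

# Missing values: assume 0 is an acceptable default here
clean["amount"] = clean["amount"].fillna(0)

# Reshaping: long -> wide, one column per month
wide = clean.pivot_table(index="region", columns="month", values="amount", aggfunc="sum")

# And back: wide -> long, similar to PowerQuery's "Unpivot Columns"
long = wide.reset_index().melt(id_vars="region", var_name="month", value_name="amount")
```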

Data visualization: libraries such as Matplotlib and Seaborn can create line charts, bar charts, scatter plots, and other visualizations that help identify patterns and trends in the data.
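
A minimal Matplotlib sketch, assuming the data has already been aggregated (the totals below are hypothetical) and that the chart is rendered without a screen, e.g. in a scheduled job. In the setup described in the question, the interactive charts would more likely live in PowerBI, so this is mainly useful for quick checks while developing the transformations:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt

# Hypothetical aggregated totals, e.g. the output of a groupby step
regions = ["North", "South", "East"]
totals = [320.0, 230.0, 90.0]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(regions, totals)
ax.set_xlabel("Region")
ax.set_ylabel("Total amount")
ax.set_title("Sales by region")
fig.tight_layout()
fig.savefig("sales_by_region.png")  # hypothetical output file name
plt.close(fig)  # free the figure's memory
```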

Scripting: Python scripts can automate the whole transformation process so that it runs the same way every time. This is useful for repeatable tasks such as data cleaning and data validation.
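
As an example of an automated validation step, here is a sketch of a function a script could run before publishing a dataset. The required columns and the rules (no empty or negative amounts, no duplicate region/month rows) are hypothetical and would need to match each business group's actual requirements:

```python
import pandas as pd

# Hypothetical rule set; adjust per business group
REQUIRED_COLUMNS = {"region", "month", "amount"}


def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in df (an empty list means it passed)."""
    problems = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # the remaining checks need these columns
    if df["amount"].isna().any():
        problems.append(f"{int(df['amount'].isna().sum())} row(s) with empty amount")
    if (df["amount"] < 0).any():
        problems.append(f"{int((df['amount'] < 0).sum())} row(s) with negative amount")
    if df.duplicated(subset=["region", "month"]).any():
        problems.append("duplicate region/month rows")
    return problems


# Example: a clean frame passes, a problematic one is reported
good = pd.DataFrame({"region": ["North", "South"], "month": ["Jan", "Jan"], "amount": [100.0, 80.0]})
bad = pd.DataFrame({"region": ["North", "North"], "month": ["Jan", "Jan"], "amount": [-5.0, None]})
for name, frame in [("good", good), ("bad", bad)]:
    print(name, validate(frame) or "OK")
```

Returning a list of problems instead of raising on the first one lets a scheduled script report everything that is wrong with a file in a single run.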

ETL: libraries such as pandas, NumPy, and PySpark can be used in Extract, Transform, Load (ETL) processes: pulling data from various sources, cleaning it, and loading it into a data warehouse or another data store.
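
A small end-to-end ETL sketch under several assumptions: a CSV string stands in for the real source (the actual SharePoint or Azure connection is not shown), an in-memory SQLite database stands in for whatever store PowerBI would read from, and the per-group settings in `GROUPS` are hypothetical. It also shows one way to approach the "parameters vs. one script per group" question: a single parameterized transform function driven by a small config:

```python
import io
import sqlite3

import pandas as pd

# Extract: a CSV string stands in for the real source
RAW_CSV = """region,product,amount
North,A,120
South,A,80
North,B,200
East,B,50
"""


def extract(source) -> pd.DataFrame:
    return pd.read_csv(source)


# Transform: parameterized, so one function serves several business groups
def transform(df: pd.DataFrame, products: list[str], min_amount: float) -> pd.DataFrame:
    subset = df[df["product"].isin(products) & (df["amount"] >= min_amount)]
    return subset.groupby("region", as_index=False)["amount"].sum()


# Load: write the result to a table that a reporting tool could read
def load(df: pd.DataFrame, conn: sqlite3.Connection, table: str) -> None:
    df.to_sql(table, conn, if_exists="replace", index=False)


# Hypothetical per-group settings; each group gets its own output table
GROUPS = {
    "sales_team_a": {"products": ["A"], "min_amount": 0},
    "sales_team_b": {"products": ["A", "B"], "min_amount": 100},
}

conn = sqlite3.connect(":memory:")  # stand-in for the real target database
raw = extract(io.StringIO(RAW_CSV))
for name, params in GROUPS.items():
    load(transform(raw, **params), conn, name)
```

Because the transformation runs outside PowerBI and only the finished tables are stored, PowerBI would just need to connect to the output tables; adding a new business group means adding an entry to the config rather than a new script.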

Overall, Python offers a wide range of libraries for data transformation, and as a general-purpose programming language it lets you build custom scripts and solutions for specific transformation needs.