Python Forum
Data Science Project
#1
Hi

I am new to data science and am wondering whether the following would qualify as a project that can be implemented in Python using ML algorithms:

I have a master data set that I will have to extract from a PDF. It will have two fields, Area Code and Area, for example:

AreaCode Area
3100 Gate
3110 Sumps
3230 Fireworks
4222 Air Purifier
4335 Water Filter

I have a second data set, created by searching a PDF and extracting data, that has one field, Object Name, e.g.:
ObjectName
A1-G-3100012
A1-K-3100010
A1-K-3230010
A1-P-3230015
A1-P-4222015
A1-G-4235016
A1-G-4335012
A1-K-3110010
A1-K-3230010
A1-P-3230025
A1-P-4335075
A1-G-4235086
A1-M-3100012
A1-H-3100010
A1-H-3230010
A1-V-3230015
A1-V-4222015
A1-M-4235016
A1-M-4335012
A1-H-3110010
A1-H-3230010
A1-V-3230025
A1-V-4335075
A1-M-4235086

I want to create a model that will learn the first data set and populate AreaCode in the second data set.

Does this make sense as an application of data science?

Sorry for my ignorance; I would appreciate some input.

Regards
#2
I don't see that as data science. You are just going to create an algorithm that compares a slice of each string in the second data set with the first data set, which is a lookup table.
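For illustration, here is a minimal sketch of that lookup in plain Python. It assumes the area code is the first four digits of the last hyphen-separated segment of the object name (that is how the sample data looks, but it is only a guess about the real naming scheme). The names `AREAS`, `area_code` and `lookup` are made up for this example.

```python
# Master data from the first post: AreaCode -> Area.
AREAS = {
    "3100": "Gate",
    "3110": "Sumps",
    "3230": "Fireworks",
    "4222": "Air Purifier",
    "4335": "Water Filter",
}


def area_code(object_name):
    """Return the assumed area code of an object name.

    Assumption: the code is the first four characters of the last
    '-'-separated segment, e.g. 'A1-G-3100012' -> '3100'.
    """
    last_segment = object_name.rsplit("-", 1)[-1]
    return last_segment[:4]


def lookup(object_name):
    """Return (area_code, area); area is None if the code is not in AREAS."""
    code = area_code(object_name)
    return code, AREAS.get(code)


if __name__ == "__main__":
    # A few object names taken from the second data set in the first post.
    for name in ["A1-G-3100012", "A1-P-4222015", "A1-G-4235016"]:
        code, area = lookup(name)
        print(name, code, area)
```

Note that some of the posted object names contain 4235, which has no entry in the master table as posted, so the lookup returns None for those and they would need to be checked by hand.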

Now, if you had a purchase history and wanted to predict the most likely subsequent purchases based on the first, that would be data science (a recommendation system). For example, if someone buys a sump, should you as the vendor send them a spam email advertising a water filter or fireworks?
#3
Thank you for the response Her.
Wouldn't this even qualify as classification?

Regards
#4
Quoting from Data Science for Dummies (please do not take offence, that is not intended) "With classification algorithms, you take an existing dataset and use what you know about it to generate a predictive model for use in classification of future data points. If your goal is to use your dataset and its known subsets to build a model for predicting the categorization of future data points, you’ll want to use classification algorithms."

You don't need a model: you can say with 100% certainty what the class is, because it is included in the object name. So I would not call it classification, but that's just my opinion.