 Data Science Project
#1
Hi

I am new to data science and am wondering whether the following would qualify as a project that can be implemented in Python using ML algorithms:

I have a master dataset that I will have to extract from a PDF. It has two fields, Area Code and Area, for example:

AreaCode Area
3100 Gate
3110 Sumps
3230 Fireworks
4222 Air Purifier
4335 Water Filter

I have a second dataset, created by searching a PDF and extracting data, with a single field, Object Name, for example:
ObjectName
A1-G-3100012
A1-K-3100010
A1-K-3230010
A1-P-3230015
A1-P-4222015
A1-G-4235016
A1-G-4335012
A1-K-3110010
A1-K-3230010
A1-P-3230025
A1-P-4335075
A1-G-4235086
A1-M-3100012
A1-H-3100010
A1-H-3230010
A1-V-3230015
A1-V-4222015
A1-M-4235016
A1-M-4335012
A1-H-3110010
A1-H-3230010
A1-V-3230025
A1-V-4335075
A1-M-4235086

I want to create a model that learns the first dataset and populates AreaCode in the second dataset.

Does this make sense as an application of data science?

Sorry for my ignorance, but I would appreciate some input.

Regards
#2
I don't see that as data science. You are just going to write an algorithm that compares a slice of each string in the second dataset with the first dataset, which is a lookup table.
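
To make that concrete, here is a minimal sketch of the lookup, not a definitive implementation. It assumes every object name follows the layout in the sample data (three dash-separated parts, with the area code as the first four digits of the last part). The names `AREA_BY_CODE`, `area_code_of` and `lookup` are made up for illustration.

```python
# Master lookup table from the first post: AreaCode -> Area.
AREA_BY_CODE = {
    "3100": "Gate",
    "3110": "Sumps",
    "3230": "Fireworks",
    "4222": "Air Purifier",
    "4335": "Water Filter",
}


def area_code_of(object_name):
    """Return the 4-digit area code embedded in an object name.

    Assumption: names look like 'A1-G-3100012', i.e. the third
    dash-separated part starts with the area code.
    """
    serial = object_name.split("-")[2]
    return serial[:4]


def lookup(object_name):
    """Return (area_code, area); area is None if the code is not in the table."""
    code = area_code_of(object_name)
    return code, AREA_BY_CODE.get(code)


if __name__ == "__main__":
    for name in ["A1-G-3100012", "A1-K-3230010", "A1-G-4235016"]:
        print(name, *lookup(name))
```

Note that some sample object names contain 4235, which is not one of the five codes listed in the master table, so the sketch returns `None` for the area in those cases rather than guessing.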

Now, if you had a purchase history and wanted to predict the most likely subsequent purchases based on the first purchase, that would be data science (a recommendation system). For example, if someone buys a sump, should you as the vendor send them a spam email advertising a water filter or fireworks?
#3
Thank you for the response Her.
Wouldn't this at least qualify as a classification problem?

Regards
#4
Quoting from Data Science for Dummies (please do not take offence, that is not intended) "With classification algorithms, you take an existing dataset and use what you know about it to generate a predictive model for use in classification of future data points. If your goal is to use your dataset and its known subsets to build a model for predicting the categorization of future data points, you’ll want to use classification algorithms."

You don't need a model: you can say with 100% certainty what the class is, as it is included in the object name. So I would not call it classification, but that's just my opinion.