Data Science Project - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Data Science Project (/thread-23793.html) |
Data Science Project - DaisyPJ - Jan-17-2020 Hi I am newly learning data science and am wondering if the below will qualify for a project that can be implemented in Python using ML algorithms: I have master data set that I will have to extract from a pdf. It will have 2 fields e.g. Area Code and Area as below: AreaCode Area 3100 Gate 3110 Sumps 3230 Fireworks 4222 Air Purifier 4335 Water Filter I have a second dataset which is created after searching a pdf and extracting data having one field Object Name e.g. ObjectName A1-G-3100012 A1-K-3100010 A1-K-3230010 A1-P-3230015 A1-P-4222015 A1-G-4235016 A1-G-4335012 A1-K-3110010 A1-K-3230010 A1-P-3230025 A1-P-4335075 A1-G-4235086 A1-M-3100012 A1-H-3100010 A1-H-3230010 A1-V-3230015 A1-V-4222015 A1-M-4235016 A1-M-4335012 A1-H-3110010 A1-H-3230010 A1-V-3230025 A1-V-4335075 A1-M-4235086 I want to create a model that will learn first dataset and populate AreaCode in second dataset. Does this make sense for an application of datascience? Sorry about my ignorance but requesting some inputs. Regards RE: Data Science Project - jefsummers - Jan-17-2020 I don't see that as data science - you are just going to create an algorithm that compares a slice of the strings in the second data set with the first set, which is a lookup table. Now if you had a purchase history and wanted to predict most likely subsequent purchases based on the first, that's data science (a recommendation system). i.e. if someone buys a sump should you as vendor send them a spam email advertising a water filter or fireworks? RE: Data Science Project - DaisyPJ - Jan-19-2020 Thank you for the response Her. Wouldn't this even qualify for a classification? Regards RE: Data Science Project - jefsummers - Jan-19-2020 Quoting from Data Science for Dummies (please do not take offence, that is not intended) "With classification algorithms, you take an existing dataset and use what you know about it to generate a predictive model for use in classification of future data points. If your goal is to use your dataset and its known subsets to build a model for predicting the categorization of future data points, you’ll want to use classification algorithms." You don't need a model, you can say with 100% certainty what the class is as it is included in the pbject name. So, I would not, but that's just my opinion. |