Getting started with Machine Learning
In this part I will talk about some basic machine learning concepts that you will read or hear about whenever anybody talks about ML. This will also help you follow the future blogs in this series.
So let’s start with a new series of blogs known as “getting started with Machine Learning basics”.
Being a technology enthusiast, I was always amazed to read about Machine Learning and wondered how it actually works. I also tried to learn it through several blogs and video tutorials.
Right now I am still learning more about ML, so I want to share what I have learned so far in the simplest way possible. I will keep updating this series with whatever I learn and practice about machine learning.
What is Machine Learning?
Machine Learning focuses on the development of computer programs that can access data and use it to learn for themselves. More formally, it can be defined as:
The field of study that gives computers the ability to learn without being explicitly programmed.
– by Arthur Samuel
Classification of Machine Learning problems
Broadly, we can classify most machine learning problems into two classes:
- Supervised Learning
- Unsupervised Learning
Supervised learning is a class of machine learning. In it:
- We have a dataset
- The dataset is called a training set
We are given a data set and already know what our correct output should look like, having an idea that there is a relationship between the input and the output.
Supervised learning problems can be further categorized into “regression” and “classification” problems.
A regression model predicts continuous values. For example, regression models make predictions that answer questions like the following:
- What is the value of a house in California?
- What is the probability that a user will click on this ad?
The answers to these questions are not picked from a fixed set of categories; they can be any value within a range.
For example, we are given:
- X -> Size of house
- Y -> Price of house
So now, if someone asks us to find the price of a 750 sq ft house, we can estimate it from the past data and answer that it should be somewhere around $200k.
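The house price estimate can be sketched in a few lines of Python. This is only a minimal illustration: the training data below is invented (prices in thousands of dollars), and `fit_line` and `predict_price` are hypothetical helper names that fit a straight line by ordinary least squares.

```python
# Hypothetical training set, invented for illustration only:
# house sizes in sq ft and prices in thousands of dollars.
sizes = [500, 600, 700, 800, 900, 1000]
prices = [150, 170, 190, 210, 230, 250]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Learn" the relationship between X (size) and Y (price) from past data.
slope, intercept = fit_line(sizes, prices)

def predict_price(size):
    """Predict the price (in thousands of dollars) for a house of the given size."""
    return slope * size + intercept

print(round(predict_price(750)))  # prints 200, i.e. about $200k
```

The output is a continuous value read off the fitted line, which is what makes this a regression problem.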
Regression problem algorithms-
- Linear Regression
- Regression Trees and their ensembles (e.g. Random Forest)
- Support Vector Regression (SVR), etc.
A classification model predicts discrete values. For example, classification models make predictions that answer questions like the following:
- Is a given email message spam or not spam?
- Is this an image of a dog, a cat, or a hamster?
Here, any given item is assigned to one of a set of categories, based on past data. With only two categories, it's like solving a Yes/No problem.
Classification problem algorithms –
- Decision Trees
- Logistic Regression
- Naive Bayes
- K Nearest Neighbors
- Linear SVC (Support Vector Classifier), etc.
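To make classification concrete, here is a minimal sketch of one of the algorithms listed, K Nearest Neighbors, written from scratch. The data is invented: each email is described by two made-up measurements (number of links, number of "!" characters), and `knn_predict` is a hypothetical helper name, not a library function.

```python
from collections import Counter

# Hypothetical labelled data, invented for illustration:
# (number of links, number of "!" characters) -> category.
training_set = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((5, 7), "spam"),
    ((6, 9), "spam"),
    ((8, 6), "spam"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_predict(point, data, k=3):
    """Assign the category that wins a majority vote among the k nearest training items."""
    nearest = sorted(data, key=lambda item: squared_distance(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((7, 8), training_set))  # spam
print(knn_predict((1, 1), training_set))  # not spam
```

Unlike the regression case, the output here is one of a fixed set of discrete values.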
Unsupervised learning allows us to approach problems with little or no idea what our results should look like. Here we need to find the patterns in the data. The data given is not labelled, which means that for an input value (X) no corresponding output (Y) is given. The algorithms are left to themselves to discover interesting structure in the data.
We can derive this structure by clustering the data based on relationships among the variables in the data. With unsupervised learning there is no feedback based on the prediction results.
Unsupervised learning problems are categorized into “Clustering” and “Non-clustering” problems.
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group them into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
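As a rough sketch of what "automatically grouping" can look like, the snippet below runs a plain k-means loop on a handful of invented 2D points. K-means is just one common clustering technique, chosen here for illustration; the data, the hand-picked starting centroids, and the function names are all assumptions.

```python
# Hypothetical unlabelled data, invented for illustration: (variable_1, variable_2).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

def closest(point, centroids):
    """Index of the centroid nearest to the point (squared Euclidean distance)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                 for centroid in centroids]
    return distances.index(min(distances))

def k_means(data, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        assignments = [closest(p, centroids) for p in data]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(data, assignments) if a == i]
            if members:
                # Move the centroid to the mean of the points assigned to it.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(old)
        centroids = new_centroids
    assignments = [closest(p, centroids) for p in data]
    return assignments, centroids

# Starting centroids are picked by hand here; real implementations usually choose them randomly.
assignments, centroids = k_means(points, centroids=[points[0], points[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Note that no labels were given: the two groups come only from how close the points are to each other.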
Non-clustering: The “Cocktail Party Algorithm” allows you to find structure in a chaotic environment (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).
Now let’s explore some fundamental machine learning terminology –
A label is the thing we’re predicting: the y variable in simple linear regression. The label could be the future price of wheat, the kind of animal shown in a picture, the meaning of an audio clip, or just about anything. For example, “spam” or “not spam” is the label in the spam detector example.
A feature is an input variable: the x variable in simple linear regression. A simple machine learning project might use a single feature, while a more sophisticated machine learning project could use millions of features. In the spam detector example, the features could include the following:
- Words in the email text
- Sender’s address
- Time of day the email was sent
- Email contains the phrase “lottery”
Based on these features, a model can estimate whether an incoming mail is spam or not. An example is one particular instance of data, and examples fall into two categories:
- Labelled – A labelled example includes both the feature(s) and the label. A model is mainly trained using labelled data.
- Unlabelled – Here the label is not present along with the features, but a model trained on plenty of labelled data can predict it.
Once we train our model with labelled examples, we can use that model to predict the label on unlabelled examples. For instance, in the spam detector, unlabelled examples are new incoming emails.
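One way to picture features, labelled examples, and unlabelled examples is as small data structures. The sketch below is only an illustration: the `Email` and `Example` classes, the `extract_features` function, and the sample emails are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Email:
    sender: str
    hour_sent: int  # hour of the day, 0-23
    text: str

def extract_features(email: Email) -> dict:
    """Turn a raw email into the spam detector features listed above."""
    words = email.text.lower().split()
    return {
        "words": words,
        "sender": email.sender,
        "hour_sent": email.hour_sent,
        # Simple exact-word match; punctuation handling is left out of this sketch.
        "contains_lottery": "lottery" in words,
    }

@dataclass
class Example:
    features: dict
    label: Optional[str] = None  # None means the example is unlabelled

    @property
    def is_labelled(self) -> bool:
        return self.label is not None

# A labelled example: features plus the known answer.
labelled = Example(extract_features(Email("promo@example.com", 3, "You won the lottery")), "spam")
# An unlabelled example: a new incoming email with no answer attached.
unlabelled = Example(extract_features(Email("friend@example.com", 14, "Lunch tomorrow?")))

print(labelled.is_labelled, unlabelled.is_labelled)  # True False
print(labelled.features["contains_lottery"])  # True
```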
A model is something that we learn from data. A model defines the relationship between features and label. For example, a spam detection model might associate certain features strongly with “spam”.
Let’s highlight two phases of a model’s life:
- Training: This means creating or learning the model. That is, you show the model labelled examples and enable it to gradually learn the relationships between features and label.
- Inference: This means applying the trained model to unlabelled examples. That is, you use the trained model to make useful predictions (y’).
We need to train our ML model with some labelled examples so that later on it can make useful predictions on real-world unlabelled data.
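The two phases can be sketched with a deliberately naive word-count model. This is not how a real spam filter works; the emails are invented and `train` and `infer` are hypothetical names. It only shows that training consumes labelled examples, while inference produces a prediction for text that has no label.

```python
from collections import Counter, defaultdict

# Hypothetical labelled examples, invented for illustration: (email text, label).
labelled_examples = [
    ("win the lottery now", "spam"),
    ("claim your free prize", "spam"),
    ("free lottery tickets", "spam"),
    ("meeting moved to friday", "not spam"),
    ("lunch on friday", "not spam"),
    ("notes from the meeting", "not spam"),
]

def train(examples):
    """Training: count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    return word_counts

def infer(model, text):
    """Inference: pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    return max(scores, key=scores.get)

model = train(labelled_examples)

# New, unlabelled emails: the model predicts y' for each.
print(infer(model, "free lottery prize"))    # spam
print(infer(model, "friday meeting notes"))  # not spam
```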
That’s all for the first part.
Thank you for reading. Let me know your feedback.
Happy Learning 🙂