A Beginner’s Guide to Machine Learning

Who is this guide for?

For techies who want to brush up on the basics of machine learning.
For those without a technical background who want to get acquainted with machine learning and do not know where to start.
For those who think that machine learning is “hard” to master.

Why machine learning?

Artificial intelligence is likely to shape our future more than any other innovation. The pace at which AI is advancing is astounding, driven by the rapidly growing volume and variety of available data, cheaper and more powerful computation, and affordable data storage.

In this article, you’ll learn the basics of machine learning and the algorithms that underpin the technologies that affect our daily lives. You’ll learn how they work and what tools to use to build similar models and applications.

Preparing to Learn Machine Learning

To understand the presented concepts, you need to have the following knowledge:

Solid knowledge of elementary algebra: You should be comfortable with concepts such as variables, coefficients, linear equations, and graphs, and have some exposure to basic calculus.
Basic programming knowledge and experience writing Python code: No experience in machine learning is required, but you should be able to read and write Python code with basic constructs such as function definitions, lists, dictionaries, loops, and conditionals.
Basic knowledge of the following Python libraries:

  • numpy
  • pandas
  • scikit-learn
  • scipy
  • Matplotlib (and/or Seaborn)

Semantic tree:
Artificial intelligence is the science of agents that perceive the world around them, form plans and make decisions to achieve goals.

Machine learning is a subset of artificial intelligence. Its goal is to teach computers to learn on their own. With a learning algorithm, a machine can identify patterns in the data it is given, build a model from them, and make predictions without explicitly programmed rules.

What is machine learning?

Arthur Samuel described machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an old and informal definition that is of limited practical use today.

Tom Mitchell gives a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

In simple terms, machine learning lets general-purpose algorithms extract information from a data set without you having to write problem-specific code by hand. Instead of coding the rules yourself, you feed data to the algorithm and it derives its own logic from that data.
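
To make this concrete, here is a minimal sketch in plain Python that contrasts a rule written by hand with a rule learned from data. The data set, the function names, and the "pick the threshold with the fewest mistakes" strategy are all assumptions made up for illustration; real learning algorithms are more sophisticated, but the idea is the same.

```python
# Hypothetical toy data: (hours_studied, passed_exam) pairs invented for illustration.
examples = [(1.0, 0), (2.0, 0), (3.0, 0), (4.5, 1), (5.0, 1), (6.5, 1)]


def hand_written_rule(hours):
    """Traditional programming: a human guesses the threshold."""
    return 1 if hours >= 6.0 else 0


def learn_threshold(data):
    """Learning: choose the threshold that makes the fewest mistakes on the data."""
    values = sorted(x for x, _ in data)
    # Candidate thresholds are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(candidate):
        return sum((1 if x >= candidate else 0) != label for x, label in data)

    return min(candidates, key=errors)


def accuracy(predict, data):
    """Fraction of examples for which the prediction matches the label."""
    return sum(predict(x) == label for x, label in data) / len(data)


threshold = learn_threshold(examples)


def learned_rule(hours):
    """The same kind of rule, but with the threshold taken from the data."""
    return 1 if hours >= threshold else 0


print("learned threshold:", threshold)
print("hand-written accuracy:", accuracy(hand_written_rule, examples))
print("learned accuracy:", accuracy(learned_rule, examples))
```

In Mitchell's terms, the task T is predicting pass or fail, the experience E is the list of examples, and the performance measure P is the accuracy.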

Machine learning algorithms fall into the following categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

Supervised Learning: A supervised learning algorithm takes labeled data and builds a model that makes predictions on new data. The task can be either classification or regression.
Unsupervised Learning: An unsupervised learning algorithm is given unlabeled, unclassified data and looks for patterns and structure in it. Common forms of unsupervised learning are clustering and dimensionality reduction.
Reinforcement Learning: A reinforcement learning algorithm learns by trial and error, using a reward system to maximize long-term rewards.
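
The difference between the first two categories can be sketched with scikit-learn. This is only an illustration under some assumptions: the Iris data set bundled with scikit-learn, logistic regression as the supervised model, and k-means with three clusters as the unsupervised one are my choices, not the only options.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are used during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
supervised_accuracy = classifier.score(X_test, y_test)
print("accuracy on unseen data:", supervised_accuracy)

# Unsupervised: only X is given; the algorithm groups similar rows on its own.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)
print("cluster sizes:", sorted(int((cluster_ids == c).sum()) for c in range(3)))
```

Note that the cluster numbers returned by k-means are arbitrary group IDs; they are not the flower species, because the algorithm never saw the labels.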

Machine learning workflow

Roadmap to start learning machine learning:
1. Start by studying or reviewing linear algebra. MIT provides an excellent open linear algebra course that introduces the key concepts. Pay particular attention to vectors, matrix multiplication, determinants, and the spectral decomposition (eigendecomposition) of a matrix, because they play an important role in how machine learning algorithms work.
2. Then turn to calculus. Learn derivatives and how to use them for optimization. Be sure to cover all the topics in Single Variable Calculus and (at least) the first two sections of Multivariable Calculus.
3. Explore the Python libraries used in machine learning, such as NumPy, pandas, Matplotlib, and scikit-learn. Understanding machine learning without these ‘tools’ will be quite difficult.
4. Start programming! I advise you to implement all the algorithms from scratch in Python before using the ready-made models in scikit-learn, so that you figure out how it all works. I implemented the algorithms in the following order of increasing complexity:

  • Linear Regression
  • Logistic Regression
  • Naive Bayes Classifier
  • k-Nearest Neighbors (KNN)
  • k-Means
  • Support Vector Machine (SVM)
  • Decision Trees
  • Random Forests
  • Gradient Boosting
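
As an example of what "from scratch" can look like, here is a minimal sketch of the first algorithm on the list, linear regression, fitted with gradient descent. It also shows where derivatives come in: each step moves the parameters against the gradient of the mean squared error. The function name, the learning rate, the number of steps, and the noise-free toy data are assumptions chosen for illustration.

```python
import numpy as np


def fit_linear_regression(x, y, learning_rate=0.05, n_steps=2000):
    """Fit y = w * x + b (approximately) by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        error = (w * x + b) - y
        # Partial derivatives of mean(error ** 2) with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
x = np.linspace(0.0, 4.0, 20)
y = 2.0 * x + 1.0

w, b = fit_linear_regression(x, y)
print(f"w = {w:.3f}, b = {b:.3f}")
```

On this toy data the fitted values should end up very close to the true slope 2 and intercept 1. If the learning rate is set too high, the updates overshoot and the values diverge, which is a good first experiment to try.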

Roadmap for the implementation of the algorithm:
1. Collect the data. A huge number of datasets are available on the Internet to suit even the most bizarre needs. Kaggle and the UCI Machine Learning Repository are great resources for browsing datasets. You can also generate your own data.
2. Select one or more algorithms. After collecting data, you can start working on algorithms. The “Choosing the right estimator” chart in the scikit-learn documentation gives a rough guideline.

At this stage, read through the brief theory of each algorithm, which I posted on GitHub alongside each implementation.

3. Visualize the data! Python has many libraries, such as Matplotlib and Seaborn, for plotting both the data and the final results. Plots make it easier to understand the data and what the model does with it (and, of course, to make a cool model!).

4. Tune the algorithm. Every model you implement has many knobs and levers, known as hyperparameters. The learning rate, the value of k, and so on can all be changed to get the best possible model.

5. Evaluate the model. The scikit-learn library provides many tools for analyzing a model and checking metrics such as accuracy, precision, and F1 score.
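
Most of this roadmap fits into a few lines of scikit-learn. The sketch below is one possible version, not a recipe: the Iris data set, the k-nearest neighbors classifier, the candidate values of k, and the 70/30 split are all assumptions picked for illustration, and the visualization step is left out to keep it short.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Collect data: a small dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Select an algorithm and tune its hyperparameter k with cross-validation.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
best_k = search.best_params_["n_neighbors"]
print("best k:", best_k)

# Evaluate the tuned model on data it has never seen.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred, average="macro")
print("accuracy:", test_accuracy)
print("macro F1:", test_f1)
```

The test set is kept aside until the very end, so the reported scores are not influenced by the hyperparameter search.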

Notes:
After becoming familiar with several algorithms and concepts, try one or more small, short projects to begin with.
Don’t be afraid to make mistakes. In the beginning, you will spend most of your time trying to figure out the math and why errors occur. However, patience is the key to success.
Small models are the basis for learning something bigger. Try everything and then you will be able to create the best model.
The best way to learn the Python libraries is to take DataCamp courses or to start by reading the documentation.
All of the above can be found on GitHub in the Machine-Learning repository. All the algorithms are organized there, both as implementations from scratch and as scikit-learn versions. The data sets used and a brief theory of how each algorithm works are also included, along with real-life examples.