How to get started? Which models should a beginner start with ...

This article will not necessarily work for everyone, but it does try to strip away unnecessary complexity, share some valuable starting resources, and hopefully spark a desire for further exploration. It is important to mention that this article is not trying to parasitize on other information sources; rather, it aims to show the wide range of information available across different resources and for different learning preferences.

Introduction

Today, machine learning and deep learning solutions are deeply integrated into almost every aspect of people’s lives. Many of these solutions were created to deal with novel, highly specialized tasks, while others can handle a wide range of problems.

Because of this, at some point many people develop a desire to get familiar with how these mechanisms work, to understand what is going on under the hood and what all of these algorithms are about.

This can lead to a scenario in which a person encounters an article or scientific paper so complex that the desire to explore further is erased completely. In most such cases, a potential data scientist is lost not through lack of ability, but through bad luck or an unclear introductory piece of information.

Text materials for beginners

The best recommendation for a start would be to search for the top machine learning algorithms of the current year. Such overviews are always to the point and give a clear, general description of each concept, which helps you discover which topics truly spark your interest and are worth reading about in more depth.

Here is an example of such an article –

https://towardsdatascience.com/top-10-algorithms-for-machine-learning-beginners-149374935f3c .

The great thing about this article is that it provides real-life applications for each algorithm; for a beginner it can be interesting and helpful to analyse those examples, for instance by trying to understand the idea using only personal familiarity with the application and a brief description of the algorithm's lifecycle.

Video materials for beginners

For those who prefer video materials, here is an introductory, theoretical set of videos from MIT, by Alexander Amini –

https://www.youtube.com/user/Zan560/videos .

It covers a wide range of neural-network-related topics; it can also help you understand what a neural network actually is and discover multiple, completely different use cases for them.

K-means clustering

The following part goes over another five machine learning and deep learning techniques, which might become an encouraging starting point for some readers.

The first concept to be described is K-Means clustering. But before diving in, the concept of clustering itself deserves a brief description: clustering is a machine learning technique that involves grouping data points.

Usually, differences in the features that describe those points (entities) are used to measure similarity between objects, which in turn helps with the task of creating clusters.

K-Means is one of the most popular models used for clustering tasks. Day-to-day application examples include fake news detection, spam identification and filtering, and customer segmentation.
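To make the idea concrete, here is a minimal from-scratch sketch of the K-Means loop (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its cluster. The toy data and function names are made up for illustration; in practice you would use a library implementation such as scikit-learn's `KMeans`.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Minimal K-Means: repeat (assign to nearest centroid,
    recompute each centroid as its cluster's mean)."""
    rng = np.random.default_rng(seed)
    # start from k randomly chosen data points as centroids
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        # distance from every point to every centroid
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# two well-separated blobs of 2-D points
data = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                 [5.0, 5.1], [5.2, 5.0], [5.1, 4.9]])
labels, centroids = kmeans(data, k=2)
```

After a few iterations the first three points end up in one cluster and the last three in the other, regardless of which blob the initial centroids came from.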

To obtain additional, more general information about clustering algorithms and their types, it is recommended to check this article –

https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68

and a good additional, more specific article focused on DBSCAN, one of the most widely used clustering models –

https://towardsdatascience.com/dbscan-algorithm-complete-guide-and-application-with-python-scikit-learn-d690cbae4c5d .

KNN classification

The second approach focuses on K-Nearest Neighbours. Like the K-Means algorithm, KNN groups entities, although the main difference is that K-Means clustering is an unsupervised model: it has no predefined labels and groups entities purely based on the similarity of their features.

On the other hand, KNN classification is a supervised kind of model: each training entity comes with a predefined group (label), which makes it possible to compare those predefined results with the ones generated by KNN.

The simple idea is to identify the group an entity most probably belongs to, based on a predefined number of neighbours: the training points closest to the point being classified.
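That idea fits in a few lines of code. The sketch below is a hypothetical minimal KNN classifier using Euclidean distance and a majority vote among the k nearest training points; real projects would typically use scikit-learn's `KNeighborsClassifier` instead.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    # train is a list of (features, label) pairs
    by_dist = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# toy labelled data: two clusters with labels "a" and "b"
train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.8, 1.1), "a"),
         ((4.0, 4.0), "b"), ((4.1, 3.9), "b"), ((3.9, 4.2), "b")]
print(knn_predict(train, (1.1, 1.0)))  # -> "a": all 3 nearest neighbours are "a"
```

Note that k is usually chosen odd, so the vote cannot end in a tie for two-class problems.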

The most popular use cases are fingerprint recognition, money laundering analysis and stock market forecasting. Here is a good introductory article about K-Nearest Neighbours – https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761 .

Decision Tree classification

The following topic is the Decision Tree algorithm. It is another classification model, although its most interesting aspect is that it is probably the best starting point for getting familiar with the concept of ensemble learning.

The decision tree itself is widely used in tasks like advising a customer with specific needs on the optimal purchase option, while a Random Forest can be seen as a collection of such trees: each tree is trained on a random sample of the available dataset, which results in higher flexibility and boosts performance.

The whole concept of ensemble learning is about combining different simple, base models in order to solve novel and specific problems, which can lead to unexpected results and nontrivial solutions.
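The forest-of-trees idea can be sketched with the simplest possible tree, a one-split "decision stump": train each stump on a bootstrap sample (rows drawn with replacement) and let the ensemble vote. Everything here — the data, the function names, the "buy"/"skip" labels — is invented for illustration; a real Random Forest uses full trees and random feature subsets, e.g. scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows):
    """A one-split decision tree: pick the feature/threshold pair
    that classifies the most training rows correctly."""
    best_score, best = -1, None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            left = [lab for xs, lab in rows if xs[f] <= t]
            right = [lab for xs, lab in rows if xs[f] > t]
            if not left or not right:
                continue
            score = left.count(majority(left)) + right.count(majority(right))
            if score > best_score:
                best_score = score
                best = (f, t, majority(left), majority(right))
    return best

def stump_predict(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def random_forest(rows, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        stump = train_stump(sample)
        if stump is not None:
            forest.append(stump)
    return forest

def forest_predict(forest, x):
    # majority vote across the whole ensemble
    return majority([stump_predict(s, x) for s in forest])

# toy "purchase decision" data: the label depends on the first feature
rows = [((1.0, 3.0), "skip"), ((1.5, 1.0), "skip"), ((2.0, 2.5), "skip"),
        ((3.0, 1.0), "buy"), ((3.5, 3.0), "buy"), ((4.0, 2.0), "buy")]
forest = random_forest(rows)
print(forest_predict(forest, (0.5, 2.0)))
print(forest_predict(forest, (4.5, 2.0)))
```

Because each stump sees a slightly different sample, no single stump has to be perfect; the vote of many weak trees is what makes the ensemble robust.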

Depending on preferences, here is an interesting article about Decision Trees - https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052 .

A nontrivial example of a Random Forest use case, for Microsoft Kinect - https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/BodyPartRecognition.pdf

and general but more detailed information about Ensemble Learning - https://towardsdatascience.com/simple-guide-for-ensemble-learning-methods-d87cc68705a2 .

Note about further content

The last two topics are a little more complex, so the idea is to provide a brief conceptual description along with useful practical and theoretical materials.

Those topics are Recommendation systems and Natural Language Processing.

Collaborative Filtering

Collaborative filtering is a powerful mechanism which, unsurprisingly, allows custom content to be recommended based on previous observations. Collaborative filtering can be divided into two groups: user-based and item-based.

The user-based approach makes recommendations to a target person by considering other, similar people in the available dataset, while the item-based approach recommends items similar to those the target person has already interacted with.

Good real-life examples are the generated playlists on Spotify and YouTube, based on your listening history, or product recommendations generated from your web-page activity.
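A compact way to see the user-based idea: to guess how a user would rate an unseen item, average other users' ratings of that item, weighted by how similar those users are to the target one. The rating matrix and helper names below are toy assumptions; real systems use much larger matrices and refinements such as mean-centering.

```python
import numpy as np

# toy user-item rating matrix (rows = users, cols = items, 0 = unrated)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two users' rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def predict(user, item):
    """User-based CF: other users' ratings of `item`,
    weighted by their similarity to `user`."""
    sims, vals = [], []
    for other in range(len(ratings)):
        if other != user and ratings[other, item] > 0:
            sims.append(cosine_sim(ratings[user], ratings[other]))
            vals.append(ratings[other, item])
    return np.average(vals, weights=sims)

print(round(predict(0, 2), 2))  # user 0's predicted rating for item 2
```

User 0's tastes match user 1, who rated item 2 poorly, so the weighted prediction comes out low even though two dissimilar users loved that item.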

For those who want to read more, here is a good article about Collaborative Filtering - https://towardsdatascience.com/intro-to-recommender-system-collaborative-filtering-64a238194a26 ,

another practical tutorial with an example of a collaborative filtering algorithm –

https://realpython.com/build-recommendation-engine-collaborative-filtering/#what-is-collaborative-filtering

and an article about a recommendation system based on the Restricted Boltzmann Machine model –

https://medium.com/datadriveninvestor/recommender-systems-using-rbm-79d65fcadf8f .

Natural Language Processing

The last, but certainly not least, topic is Natural Language Processing.

The topic itself is very broad and extremely interesting; for this reason, only a general description with a few use cases will be given, and the main focus will be on highlighting valuable resources for further personal exploration.

NLP is an artificial intelligence field which improves the interaction between computers and humans through natural human language, covering text and speech generation and recognition.

Good use cases are the smart assistants provided by corporations such as Google, Amazon and Yandex, as well as text generation for the sentence autocompletion task.
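Sentence autocompletion, in its very simplest form, is just next-word statistics. The sketch below is a deliberately tiny, hypothetical bigram model: count which word follows which in a toy corpus, then suggest the most frequent follower. Modern systems use neural language models (such as the RNNs covered in the links below), but the prediction task is the same.

```python
from collections import Counter, defaultdict

# a tiny toy corpus, already tokenized by whitespace
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# count how often each word follows each other word (bigram counts)
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def autocomplete(word):
    """Suggest the word most frequently seen after `word`."""
    return following[word].most_common(1)[0][0]

print(autocomplete("sat"))  # -> "on" ("sat on" appears twice in the corpus)
```

Even this naive counter captures local patterns; a neural model generalizes the same idea to much longer contexts and unseen word combinations.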

For those who want to read more, here is a good introductory article about NLP –

https://medium.com/@ODSC/an-introduction-to-natural-language-processing-nlp-8e476d9f5f59 ,

a basic NLP tutorial for people who are familiar with PyTorch and RNNs –

https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html

and quite possibly the best article about RNNs, as additional reading –

http://karpathy.github.io/2015/05/21/rnn-effectiveness/ .

Conclusion

Hopefully this was an interesting set of topics, one that helps build the skill of self-exploration and awakens the desire to learn more about data science; it is really worth doing so.

Author of the article: Daniil Kozlov