Machine learning is a field of study concerned with algorithms that learn from examples. There are many different types of machine learning algorithms you may encounter, and many specialized approaches to modeling machine learning tasks such as classification, clustering, and regression.

In this article, let’s explore one of the most famous machine learning tasks: **classification**.

**What is classification?**

Classification in machine learning is a **supervised learning** approach that requires the use of machine learning algorithms that learn how to assign a class label to examples from the problem domain. In other words, classification is a process of **categorizing a given set of data into classes**. A simple example is classifying your emails as “spam” or “not spam.”
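The spam example can be sketched as a toy rule-based classifier. The keyword list and the two-keyword threshold below are made up for illustration; a real spam filter would learn its rules from labeled data:

```python
import re

# Hypothetical keywords that suggest spam (illustrative only)
SPAM_KEYWORDS = {"winner", "free", "prize", "urgent"}

def classify_email(text: str) -> str:
    """Assign the class label "spam" or "not spam" to an email body."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = len(words & SPAM_KEYWORDS)
    # Two or more spam keywords -> classify as spam
    return "spam" if hits >= 2 else "not spam"

print(classify_email("URGENT: you are a winner, claim your FREE prize"))  # spam
print(classify_email("Meeting moved to 3pm tomorrow"))                    # not spam
```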

##### Figure 1

**Classification algorithms look at existing data and predict which class a new data point belongs to.** An algorithm that implements classification, especially in a concrete implementation, is known as a **classifier**. For example, the grey line in Figure 1 can be viewed as a classifier. An unknown audio clip can also be classified by a classification algorithm, as shown in Figure 2.

##### Figure 2

**Popular classification algorithms**

Because there are far too many machine learning algorithms for classification problems to cover them all, we will only describe several of the most popular here.

**Support Vector Machine (SVM)**

Many machine learning algorithms are available to build classifiers. One of the most popular is the **Support Vector Machine (SVM)**, which seeks to maximize the margin between the decision boundary and the training samples. The basic idea is to find the boundary between the samples representing a particular class and those that do not. This is done by finding the samples nearest to the boundary and determining **an optimal dividing line**, or, in the case of high-dimensional data, **a dividing hyperplane**. Besides handling linearly separable data, the introduction of soft margins allows SVMs to handle non-separable data, and the application of the kernel trick allows them to learn non-linear decision boundaries. The sequential minimal optimization (SMO) algorithm provides a computationally efficient method for learning SVM models.
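As a rough illustration of the margin idea, here is a minimal linear SVM trained with the Pegasos sub-gradient method, a simple stochastic solver for the regularized hinge loss. The toy data, regularization strength, and epoch count are arbitrary choices for this sketch, not a production setup:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Learn weights for a linear SVM (hinge loss + L2 penalty)
    with the Pegasos sub-gradient method; labels must be +1 or -1."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one pass in random order
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # shrink w (L2 penalty), then push toward a margin-violating point
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# Linearly separable toy data in 2D, with a constant bias feature appended
X = [[1.0, 2.0, 1.0], [2.0, 3.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else -1 for xi in X]
print(preds)  # expect the training labels: [1, 1, -1, -1]
```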

C.-C. Chang and C.-J. Lin provide a popular library for SVMs called LIBSVM. LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM).

**Decision tree**

A **decision tree** is also a simple, quick, and popular way to train a classifier, and it has a straightforward interpretation. A decision tree lets you predict responses to data by following the decisions in the tree from the root (beginning) down to a leaf node. The tree consists of branching conditions in which the value of a predictor is compared to a trained threshold. The number of branches and the threshold values are determined during training. An additional modification step, known as pruning, may be used to simplify the model.
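Prediction by following decisions from the root to a leaf can be sketched with a hand-built tree. The features, thresholds, and labels below are hypothetical stand-ins for values a training procedure would learn:

```python
# A hypothetical trained tree stored as nested dicts: each internal node
# compares one feature against a threshold determined during training.
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": {"label": "setosa"},           # taken when petal_length <= 2.5
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}

def predict(node, sample):
    """Follow decisions from the root down to a leaf node."""
    while "label" not in node:  # internal nodes branch; leaves carry labels
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(predict(tree, {"petal_length": 1.4, "petal_width": 0.2}))  # setosa
print(predict(tree, {"petal_length": 5.1, "petal_width": 2.0}))  # virginica
```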

Most decision tree algorithms select the rule that maximizes the information gained by a split, as done by Quinlan's Iterative Dichotomiser 3 (ID3) algorithm. Quinlan later developed the C4.5 algorithm, which extends ID3 with support for missing values, continuous attributes, and attributes with different costs. The C4.5 algorithm also performs pruning at the end, which removes unnecessary rules and reduces over-fitting.
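The information gain criterion used by ID3 can be computed directly from Shannon entropy. A sketch in Python, with made-up toy labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, splits):
    """Entropy of the parent minus the size-weighted entropy of the
    child nodes produced by a candidate split (the ID3 criterion)."""
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in splits)

parent = ["yes", "yes", "no", "no"]
# A perfect split separates the classes completely: gain = 1 bit
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
# A useless split leaves each child as mixed as the parent: gain = 0
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```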

- Quinlan, J. R. (2014). *C4.5: Programs for machine learning*. Elsevier.

*Decision Tree*

**Hidden Markov models (HMMs)**

**Hidden Markov models (HMMs)** are also very commonly used for classification. HMMs provide a relatively simple way to model sequential data. The system being modeled by an HMM is assumed to be a **Markov process with unobserved (hidden) states**. More specifically, we only know the observational data, not the states that produced it. HMMs provide a flexible way to model how a signal changes sequentially over time. Rabiner provides an excellent introduction and tutorial on HMMs.
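To make the "hidden states, observed data" idea concrete, here is the forward algorithm, which computes the likelihood of an observation sequence under an HMM. All parameters below are hypothetical, chosen only for illustration:

```python
# A toy 2-state HMM: hidden weather states emit observed activities.
states = ["rainy", "sunny"]
start = {"rainy": 0.6, "sunny": 0.4}
trans = {"rainy": {"rainy": 0.7, "sunny": 0.3},
         "sunny": {"rainy": 0.4, "sunny": 0.6}}
emit = {"rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def likelihood(observations):
    """P(observations | model), computed with the forward algorithm."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())  # marginalize out the final hidden state

print(round(likelihood(["walk", "shop", "clean"]), 6))  # 0.033612
```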

**Neural-network based classifiers**

In recent years, various types of neural-network-based classifiers have frequently been used to solve ML problems. **An artificial neural network is composed of artificial neurons, or nodes.** The connections between neurons are modeled as weights. The simplest example is the naive neural network shown in the figure below, which consists of only an input layer, a hidden layer, and an output layer. Self-learning resulting from experience can occur within such networks, which can derive conclusions from a complex and seemingly unrelated set of information.
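A forward pass through a simple input-hidden-output network like the one described above can be sketched in a few lines. The weights below are hand-picked for illustration (they make this particular network compute |x1 - x2|), not learned:

```python
def relu(v):
    """Element-wise Rectified Linear Unit: max(0, x)."""
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    """One fully connected layer: dot product of inputs with weights, plus bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Hypothetical hand-picked weights: input(2) -> hidden(2, ReLU) -> output(1)
W1, b1 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]

def forward(x):
    return dense(relu(dense(x, W1, b1)), W2, b2)[0]

print(forward([3.0, 1.0]))  # 2.0, i.e. |3 - 1|
```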

*A simple example of a neural network*

The **non-linear Rectified Linear Unit (ReLU)** function is frequently used within neural networks: a typical layer computes dot products of its inputs with learned weights, adds a bias, and applies a ReLU. The **softmax function** is another popular function in neural networks, usually applied to the output layer. It is also a non-linearity, but it is special in that it is usually the last operation performed in a network, because it takes in a vector of real numbers and returns a probability distribution. The definition of softmax is as follows. Let \(x\) be a vector of real numbers; the \(i\)-th component of Softmax(\(x\)) is

$$ \frac{e^{x_i}}{\sum_j e^{x_j}}. $$

The output of the softmax is a probability distribution; each element is non-negative and the sum over all components is 1.
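A sketch of softmax in Python, including the standard max-subtraction trick for numerical stability (a practical detail beyond the formula above, but it leaves the result unchanged):

```python
import math

def softmax(x):
    """Map a vector of real scores to a probability distribution."""
    m = max(x)
    # subtracting max(x) avoids overflow in exp() without changing the result
    exps = [math.exp(xi - m) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs)       # non-negative; the largest score gets the largest probability
print(sum(probs))  # approximately 1.0
```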

**CNNs and RNNs**

Considering the development of deep learning in recent years, the most popular neural networks are **convolutional neural networks (CNNs)**, **recurrent neural networks (RNNs)**, and ensembles of the two. In general, a CNN extracts convolutional feature embeddings from the input data and is commonly used in image processing. RNNs are often applied on top of the deep features a CNN extracts.

Recurrent neural networks are based on David Rumelhart's work in 1986^{[1]}. RNNs come in many variants, but one of the most popular RNN structures for classification problems is the **long short-term memory (LSTM) network**. LSTMs were introduced by Hochreiter and Schmidhuber in 1997^{[2]} and have set accuracy records in multiple application domains. Unlike earlier models based on hidden Markov models (HMMs) and similar concepts, an LSTM can learn to recognize context-sensitive languages. Outstanding applications of LSTMs include machine translation, language modeling, and multilingual language processing. In recent years, LSTMs combined with convolutional neural networks (CNNs) have significantly improved the accuracy of automatic image captioning and audio event detection.
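A full LSTM cell is beyond the scope of this sketch, but the core recurrent idea, a hidden state updated at every time step so that context is carried forward, can be shown with a single vanilla RNN step. The weights below are hypothetical:

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [math.tanh(
                sum(w * x for w, x in zip(W_x[i], x_t)) +
                sum(w * h for w, h in zip(W_h[i], h_prev)) + b[i])
            for i in range(len(b))]

# Hypothetical weights for a 1-d input and a 2-d hidden state
W_x = [[0.5], [-0.5]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for x in [1.0, 2.0, 3.0]:   # the hidden state carries context across steps
    h = rnn_step([x], h, W_x, W_h, b)
print(h)
```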

- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. *Nature*, 323(6088), 533-536.
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. *Neural Computation*, 9(8), 1735-1780.

Since there are many more algorithms worth discussing, both mentioned and unmentioned, the details will be left for future posts.

**In a nutshell …**

**Classification** is one of the most popular **supervised learning** tasks: it identifies which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations (or instances) whose **category membership is known**. In short, **classification models are trained to classify data into categories**. There are many different types of classification algorithms (classifiers) for modeling classification problems. There is no good theory on how to map algorithms onto problem types; instead, it is generally recommended that a practitioner run controlled experiments to discover which algorithm and configuration give the best performance on a given classification task. The most common classification problems today include speech recognition, face detection, handwriting recognition, and document classification.

We hope you liked this article, and we will have more articles about machine learning applications!

**References:**

- Statistical classification
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep learning*. Cambridge: MIT Press.


*Editor: Chieh-Feng Cheng, Ph.D. in ECE, Georgia Tech; Technical Writer, inwinSTACK*