Support Vector Machine — no longer a celebrated hero!
These days, analytics professionals favor Neural Networks (NNs) over SVMs in pursuit of higher accuracy. Many papers purport to prove the superiority of NNs over SVMs. This is partly a publication incentive: if one can train a NN that performs better than an SVM, that becomes an opportunity to publish a paper, whereas a result in which the SVM beats the NN is far less likely to be published!
In this context, this article explores the superiority of the SVM, the crumbling hero, over the NN.
Difference between SVM and Deep Learning
Before getting down to business, let us first look at the intuitive difference between the two techniques:
Deep learning (DL), as the name suggests, is about stacking many processing layers one atop the other; the deeper the architecture, the more layers it has. The intuition behind DL comes from the compositional nature of natural stimuli such as speech and vision. Natural signals are highly compositional: simple primitive features combine to form mid-level features, while mid-level features combine to form high-level features.
Thus, DL is about learning increasingly abstract representations in a layer-wise manner. Each layer feeds on the output of the layer below and sends its own output to the layer above, and so on. In vision applications, this process leads to neurons high up the hierarchy being sensitive to complete objects or scenes.
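To make the stacking idea concrete, here is a minimal sketch, assuming PyTorch; the layer sizes are arbitrary illustrations, not a recommended architecture:

```python
# A minimal sketch of the layer-wise idea: each Linear + ReLU layer
# feeds the one above it, so later layers see increasingly abstract
# combinations of the raw input.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),  # low-level features from raw pixels
    nn.ReLU(),
    nn.Linear(256, 64),   # mid-level features built from low-level ones
    nn.ReLU(),
    nn.Linear(64, 10),    # high-level features -> class scores
)

x = torch.randn(32, 784)  # a dummy batch of 32 flattened 28x28 images
scores = model(x)
print(scores.shape)       # torch.Size([32, 10])
```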
Support vector machines (SVMs), on the other hand, are based on finding a splitting boundary (a hyperplane, in linearly separable cases) that is as far away as possible from the nearest points (the support vectors) on either side. In other words, given a set of points, each belonging to one of two classes A and B, the SVM finds a boundary passing between the points that partitions the space into side A and side B while keeping the largest possible distance from the nearest point on either side. The points nearest the decision boundary are called support vectors because they are the ones “supporting” the boundary; only the support vectors are used in computing it.
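Here is a minimal sketch of this, assuming scikit-learn: fit a linear SVM on toy two-class data and inspect the support vectors that define the maximum-margin boundary.

```python
# Fit a linear SVM on toy 2-D data and inspect its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # classes A and B
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only these points "support" the boundary; moving any other point
# (without crossing the margin) leaves the boundary unchanged.
print(clf.support_vectors_)  # coordinates of the support vectors
print(clf.n_support_)        # number of support vectors per class
```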
SVM’s can also be stacked in a layer-wise manner to form a deep variant of SVM networks. So the fact that DL is based on such stacked layers means that it is possible to make a deep neural net full of SVM’s.
To summarize: any structure that processes a signal in an increasingly abstract manner through many layers stacked one atop the other is a DL structure; the specific processing that occurs in each layer does not matter, so DL is abstract in that sense. An SVM, by itself, is a binary classifier that learns by margin maximization, and, as noted above, SVMs can be stacked to form a DL architecture.
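There is no single standard recipe for a “deep SVM”, so the following is only a toy illustration of the stacking idea, assuming scikit-learn (the feature subsets and layer count are arbitrary choices for the sketch): the decision scores of a first layer of SVMs become the input features of a second-layer SVM.

```python
# A toy illustration (not a standard algorithm) of stacking SVMs:
# layer-1 SVMs are trained on random subsets of the raw features,
# and their decision scores feed a layer-2 SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.RandomState(0)

# Layer 1: each SVM sees a different random subset of the features.
subsets = [rng.choice(20, size=8, replace=False) for _ in range(5)]
layer1 = [LinearSVC(dual=False).fit(X[:, s], y) for s in subsets]

# Layer 2: learns from the layer-1 decision scores ("mid-level" features).
scores = np.column_stack([clf.decision_function(X[:, s])
                          for clf, s in zip(layer1, subsets)])
layer2 = LinearSVC(dual=False).fit(scores, y)
print(layer2.score(scores, y))  # training accuracy of the stacked model
```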
Limitations of NN’s over SVM’s
- Higher complexity: The number of parameters of an SVM grows linearly with the size of the input. A NN's does not: a neural network can have as many layers as we want, and because depth composes those parameters into ever more expressive functions, a deep neural network with the same number of parameters as an SVM still has higher model complexity.
- Data-hungry: To train NNs, you need massive amounts of data. What do you do when you have very little? The UCI Machine Learning Repository is a standard set of benchmark datasets: pick any dataset from it with fewer than 1000 training examples, and try training a NN that beats an SVM on that data by a large margin. This is, in my opinion, by far the most important point.
- Huge computational resource requirement: You can't do much with NNs unless you have GPUs. SVMs, on the other hand, can be trained on personal machines without GPUs.
- CNN’s require spatial property: Convolution operation performs an operation on a set of pixels or a sequence of words/audio signals that are close-by. Shuffling the pixels/words/audio signals will change the output of the CNN completely. That is, the order of the features is important, or in other words, convolution is a “spatial” operation. SVM’s are unaffected by shuffling of features. So problems which do not have the spatial property will not benefit from CNN’s.
- Less interpretable: Often you have little idea of what is going on inside the network, particularly at layers closer to the output. This also makes NNs harder to improve, since you don't know much about what is going wrong. SVMs may not be completely interpretable either, but they are more interpretable than NNs.
- More time to set up: Unless you are doing something very similar to ImageNet, you will not find a pretrained model on the web, so you will have to write a significant amount of code to train and evaluate a reasonable NN model, even when building on the standard deep learning frameworks. With SVMs, you just download LIBSVM and can start training your models in a few minutes.
- Sensitivity to initial randomization of weights: Because NNs are trained by gradient descent, they are sensitive to the initial randomization of their weight matrices: if initialization places the network near a poor local minimum of the loss function, its accuracy may never rise past a certain threshold. SVMs are more reliable: their training objective is convex, so they are guaranteed to converge to the global minimum regardless of the initial configuration.
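As promised above, here is a minimal sketch (assuming scikit-learn) of the SVM's indifference to feature order: permuting the feature columns, consistently across training and test data, leaves the accuracy unchanged, because the SVM has no notion of feature order.

```python
# Permuting feature columns (the same way in train and test data)
# does not affect an SVM, since its kernel treats features as a set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

perm = np.random.RandomState(0).permutation(10)  # shuffled feature order
acc_original = SVC().fit(X_tr, y_tr).score(X_te, y_te)
acc_shuffled = SVC().fit(X_tr[:, perm], y_tr).score(X_te[:, perm], y_te)
print(acc_original, acc_shuffled)  # the two accuracies match
```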
Disclaimer
My intention in writing this article is not to downplay Neural Networks in favor of SVMs. NNs may overshadow SVMs in the following cases:
- NNs can handle multi-class problems natively by producing probabilities for each class. SVMs, in contrast, handle these problems using independent one-versus-all classifiers, each producing a single binary output. For example, a single NN can be trained to solve the handwritten-digits problem, while 10 SVMs (one per digit) are required (a sketch follows this list).
- Another advantage of NNs, from the perspective of model size, is that the model is fixed in terms of its input nodes, hidden layers, and output nodes; in an SVM, however, the number of support vectors can grow as large as the number of training instances in the worst case.
- The SVM does not perform well when the number of features is greater than the number of samples, and it requires more feature-engineering work than a multi-layer Neural Network.
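As a sketch of the multi-class point above, assuming scikit-learn (whose OneVsRestClassifier implements the one-versus-all strategy): solving the handwritten-digits problem this way trains 10 independent binary SVMs.

```python
# One-vs-rest on the digits dataset: 10 independent binary SVMs,
# one per digit class, versus a single multi-output NN.
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
clf = OneVsRestClassifier(LinearSVC(dual=False)).fit(X, y)

print(len(clf.estimators_))  # 10 binary SVMs, one per digit
print(clf.score(X, y))       # training accuracy
```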
Thus, SVMs are certainly here to stay in the world of Machine Learning algorithms: given the current state of the algorithm family, their use cases cannot be completely taken over.
Thanks for reading! Stay Safe!