
Aman Gupta is a pantomath and a former entrepreneur. Currently, he is in a harmonious and symbiotic relationship with Data.
Image Credits: AP Images for WWE

This is exactly what came to my mind when I first read about a hierarchical clustering algorithm called ‘ROCK’. The creators of this technique unknowingly drew an analogy between Dwayne Johnson, the versatile American actor, producer, retired professional wrestler, and former American and Canadian football player, and a clustering technique that solves the problem of using data such as his long list of achievements as a variable in a clustering exercise.
In other words, ROCK, not Mr. …


Credits: https://gifer.com/

Introduction

Fuzzy logic principles can be used to cluster multidimensional data, assigning each point a membership in each cluster center from 0 to 100 percent. This can be very powerful compared to traditional hard-threshold clustering, where every point is assigned a crisp, exact label. The algorithm assigns each data point a membership in each cluster based on the distance between the cluster center and the data point. The closer a data point is to a cluster center, the higher its membership in that cluster. …
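The distance-based membership idea above can be sketched in a few lines of Python. This is only an illustration, assuming the standard fuzzy c-means weighting with a fuzzifier `m` of 2; the function name `fuzzy_memberships`, the `eps` guard, and the sample points are hypothetical and not taken from the article.

```python
import math

def fuzzy_memberships(point, centers, m=2.0, eps=1e-12):
    """Return one membership per cluster center; the values sum to 1.

    Assumes the usual fuzzy c-means weighting, where m > 1 is the fuzzifier.
    """
    # Distance from the point to every center, floored at eps so that a
    # point sitting exactly on a center does not cause a division by zero.
    dists = [max(math.dist(point, c), eps) for c in centers]
    # Closer centers get larger weights.
    weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: a point one unit from the first center
# and three units from the second.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(memberships)
```

Multiplying each value by 100 gives the percentage membership described above. A hard-threshold method would instead assign the point entirely to the nearest center.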


Image Credits: https://www.vertica.com/

Introduction

The exploratory nature of data analysis and data mining makes clustering one of the most common tasks in these kinds of projects. Increasingly, these projects come from many different application areas, such as biology, text analysis, and signal analysis, that involve larger and larger datasets in both the number of examples and the number of attributes. The biggest challenge with clustering in real-life scenarios is the volume of the data and the consequent increase in complexity and need for more computational power.
These problems have opened up a search for algorithms able to reduce this data overload. Some solutions…


Image Source: https://www.pinterest.com

Introduction

Clustering is the process of grouping data into several clusters so that data within a cluster has maximum similarity and data in different clusters has minimum similarity. K-means (Duda & Hart, 1973; Bishop, 1995) has long been the workhorse for metric data. Its attractiveness lies in its simplicity and in its local-minimum convergence properties. In the K-Means algorithm, the center of a cluster, or centroid, is the starting point for forming each group. …
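As a rough illustration of how centroids drive the grouping, here is a minimal sketch of the classic assign-then-update loop (Lloyd's algorithm). It assumes Euclidean distance, caller-supplied initial centroids, and a fixed number of iterations; the function name `kmeans` and the sample data are hypothetical and not taken from the article.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid as the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centroids, final_clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids)
```

Because the loop only ever improves on the starting centroids, a different initialization can settle on a different grouping, which is the local-minimum behavior mentioned above.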


This article assumes that you have a basic understanding of A/B Testing and statistical tests. Here we will discuss some tips to ensure the success of A/B Tests.

Image Credits: https://rapidboostmarketing.com

A/B Testing Terminology

  1. Variant: Variant is the term…


Image Credits: https://neustan.wordpress.com

These days analytics professionals favor Neural Networks (NN) over SVMs in pursuit of higher accuracy. We can find many papers that claim the superiority of NN over SVM. This is partly because training a NN that performs better than an SVM is an opportunity to publish a paper. However, a paper is less likely to be published if SVM scores over NN!

In this context, this article explores the superiority of SVM, the crumbling hero, over NN.

Difference between SVM and Deep Learning

Before getting down to business, let us first look at the intuitive difference between…


Need for Kernel Methods

SVM algorithms use a set of mathematical functions known as kernels. The function of a kernel is to take data as input…


Image Credits: https://www.chaussurevip.fr/

XGBoost, or eXtreme Gradient Boosting, is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Kubernetes, Hadoop, SGE, MPI, Dask) and…


Image Credits: https://dlsun.github.io/

Image Credits: https://dataaspirant.com/

I was recently working on a Market Mix Model, wherein I had to predict sales from impressions. While working on one aspect of it, I was confronted with the problem of choosing between a Random Forest and XGBoost. This led to the inception of this article.

Before we get down to the arguments in favor of either algorithm, let us briefly understand the underlying idea behind the two.

The term gradient boosting consists of two sub-terms, gradient and boosting. Gradient boosting re-defines boosting as a numerical optimization problem where the objective is to minimize…

Aman Gupta
