Is logistic regression regression?

I came across a post recently by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines[1]. Many statisticians of the more old-school type seemed to disagree. This led me to think a bit more deeply about the subject. I've already written several posts on bad terminology in statistics (see confidence level, line of best fit, r squared), so I might have been expected to agree with the machine learning view, but in this case I agree with the statisticians, and I would like to explain why.

What data scientists think regression is

In data science classes, students are taught that there are two kinds of predictive modelling. In both cases, the aim is to predict a response $Y$ given a vector of features $X$. If $Y$ is real-valued (numeric in R terminology), then it's a regression problem. If $Y$ is categorical, then it's a classification problem. I'm not sure where this terminology originated, but it has certainly been propagated very widely by Hastie, Tibshirani, and Friedman's classic The Elements of Statistical Learning.

In logistic regression, your data consist of some feature values $X$ and a response $Y \in \lbrace 0, 1 \rbrace$. Here the response is definitely categorical, so someone trained in data science would indeed call this a classification problem. But if you look more closely at the output produced by logistic regression, its predicted values are numbers, namely the probability of each data point belonging to the class labelled $1$. You need to do something to these numbers (for example, apply a cutoff) in order to get a predicted class.

For example, in R:

set.seed(100)
N
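To see the point concretely, here is a minimal sketch along the same lines, using simulated data of my own (the variable names, sample size, and the 0.5 cutoff are illustrative assumptions, not taken from the post):

```r
# Simulate a binary response whose probability depends on one feature
# (this toy setup is an assumption for illustration)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-1 + 2 * x))

# Fit a logistic regression with glm()
fit <- glm(y ~ x, family = binomial)

# The predicted values are probabilities, not class labels
probs <- predict(fit, type = "response")
head(probs)  # numbers strictly between 0 and 1

# A cutoff is needed to turn the probabilities into predicted classes
pred_class <- as.integer(probs > 0.5)
table(pred_class)
```

The numeric vector `probs` is the model's actual output; the classification step is something we bolt on afterwards.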