# Logistic Regression: Binary And Multinomial

Should I use binary logistic or multinomial logistic regression? (Some people tell me to use multinomial logistic, but a book said to use it only when the DV has more than two levels, and my DV has only two levels: never or sometimes.)

What are the advantages of multinomial logistic regression over a set of binary logistic regressions (i.e. a one-vs-rest scheme)? By a set of binary logistic regressions I mean that for each category $y_i \in Y$ we build a separate binary logistic regression model with target = 1 when $Y=y_i$ and 0 otherwise.
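To make the one-vs-rest construction concrete, here is a minimal sketch of how the binary targets could be built, one per category. The function name `one_vs_rest_targets` and the toy labels are illustrative assumptions, not part of the question.

```python
def one_vs_rest_targets(labels):
    """Build one binary target vector per category.

    For category c, the target is 1 where the label equals c and 0 otherwise.
    (Hypothetical helper, for illustration only.)
    """
    categories = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in categories}


# Toy labels (made up for this sketch).
labels = ["a", "b", "c", "a", "b"]
targets = one_vs_rest_targets(labels)

for category, target in targets.items():
    print(category, target)
```

Each of these target vectors would then be used to fit its own binary logistic regression, whereas the multinomial model fits all categories jointly.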

$P'(i) = \frac{\exp(\text{logit}_i)}{\exp(\text{logit}_i)+\exp(\text{logit}_j)+\dots+\exp(\text{logit}_r)}$, where $i,j,\dots,r$ are all the categories, and if $r$ was chosen to be the reference one, its $\exp(\text{logit})=1$. So, for binary logistic that same formula becomes $P'(i) = \frac{\exp(\text{logit}_i)}{\exp(\text{logit}_i)+1}$. Multinomial logistic relies on the (not always realistic) assumption of independence of irrelevant alternatives, whereas a series of binary logistic predictions does not.
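The reduction from the multinomial formula to the binary one can be checked numerically. The sketch below assumes the reference category's logit is fixed at 0 (so its exponential is 1); the function names and the example logit values are illustrative.

```python
import math


def multinomial_probs(logits_vs_reference):
    """Category probabilities when the reference category has logit 0.

    Returns the probabilities of the non-reference categories in order,
    followed by the probability of the reference category.
    """
    exps = [math.exp(z) for z in logits_vs_reference] + [1.0]  # last entry: reference
    total = sum(exps)
    return [e / total for e in exps]


def binary_prob(logit):
    """Binary logistic probability: exp(logit) / (exp(logit) + 1)."""
    return math.exp(logit) / (math.exp(logit) + 1.0)


# Three categories: two logits relative to the reference (values are made up).
probs3 = multinomial_probs([0.5, -1.0])

# Two categories: the multinomial formula collapses to the binary one.
probs2 = multinomial_probs([0.5])

print(probs3, sum(probs3))
print(probs2[0], binary_prob(0.5))
```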

A separate theme is what the technical differences are between multinomial and binary logistic regression when $Y$ is dichotomous. Will there be any difference in results? Most of the time, in the absence of covariates, the results will be the same. Still, there are differences in the algorithms and in the output options. Let me just quote SPSS Help about that issue in SPSS:

Because of the title, I'm assuming that "advantages of multiple logistic regression" means "multinomial regression". There are often advantages when the model is fit simultaneously. This particular situation is described in Agresti (Categorical Data Analysis, 2002), p. 273. In sum (paraphrasing Agresti), you expect the estimates from a joint model to differ from those of a stratified model. The separate logistic models tend to have larger standard errors, although this may not be so bad when the most frequent level of the outcome is set as the reference level.

It seems that the question was not at all about the implementation/structural differences between (a) the softmax (multinomial logistic) regression model and (b) the OvR "composite" model based on multiple binary logistic regression models. In a nutshell, however, skipping all the formulas, these differences can be summarized like this:

It also seems that there was no need to explain the differences between the binary models, the OvR/OvO "composite" models, and the "native" multiclass classifiers like the multinomial logistic regressor (aka the softmax regressor).

The softmax regression (LogisticRegression(multi_class="multinomial") in scikit-learn) is more flexible when setting the linear decision boundaries among the classes. Here is a two-dimensional three-class illustration of this: -learn.org/stable/auto_examples/linear_model/plot_logistic_multinomial.html

For binary classification this is not a disadvantage when compared to softmax/multinomial, since the latter also sets a linear boundary between the two classes. Or imagine three clusters that are at approximately the same distance from each other (i.e. each class cluster is on a vertex of an equilateral triangle). In such a case the accuracy of both OvR logit and softmax will be good for all the classes.

Logistic regression is a frequently used method because it can model binomial (typically binary) variables, multinomial variables (qualitative variables with more than two categories), or ordinal variables (qualitative variables whose categories can be ordered). It is widely used in the medical field, in sociology, in epidemiology, in quantitative marketing (whether or not a product or service is purchased following an action), and in finance for risk modeling (scoring).

If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape (1,) when the given problem is binary. In particular, when `multi_class='multinomial'`, `intercept_` corresponds to outcome 1 (True) and `-intercept_` corresponds to outcome 0 (False).

In this paper we discuss auxiliary variable approaches to Bayesian binary and multinomial regression. These approaches are ideally suited to automated Markov chain Monte Carlo simulation. In the first part we describe a simple technique using joint updating that improves the performance of the conventional probit regression algorithm. In the second part we discuss auxiliary variable methods for inference in Bayesian logistic regression, including covariate set uncertainty. Finally, we show how the logistic method is easily extended to multinomial regression models. All of the algorithms are fully automatic with no user set parameters and no necessary Metropolis-Hastings accept/reject steps.

In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression[1] (or logit regression) is estimating the parameters of a logistic model (the coefficients in the linear combination). Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value). The corresponding probability of the value labeled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"), hence the labeling;[2] the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See Background and Definition for formal mathematics, and Example for a worked example.

Binary variables are widely used in statistics to model the probability of a certain class or event taking place, such as the probability of a team winning, of a patient being healthy, etc. (see Applications), and the logistic model has been the most commonly used model for binary regression since about 1970.[3] Binary variables can be generalized to categorical variables when there are more than two possible values (e.g. whether an image is of a cat, dog, lion, etc.), and the binary logistic regression generalized to multinomial logistic regression. If the multiple categories are ordered, one can use the ordinal logistic regression (for example the proportional odds ordinal logistic model[4]). See Extensions for further extensions. The logistic regression model itself simply models probability of output in terms of input and does not perform statistical classification (it is not a classifier), though it can be used to make a classifier, for instance by choosing a cutoff value and classifying inputs with probability greater than the cutoff as one class, below the cutoff as the other; this is a common way to make a binary classifier.
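As a small illustration of turning the model's probability into a binary classifier with a cutoff, here is a sketch with one predictor. The coefficient values, function names, and the default cutoff of 0.5 are assumptions chosen for the example.

```python
import math


def predict_probability(x, intercept, slope):
    """Probability of the outcome labeled 1 under a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def classify(probability, cutoff=0.5):
    """Assign class 1 when the probability exceeds the cutoff, else class 0."""
    return 1 if probability > cutoff else 0


# Made-up coefficients for illustration.
intercept, slope = -1.0, 1.0

p_high = predict_probability(2.0, intercept, slope)  # log-odds = 1.0
p_low = predict_probability(0.0, intercept, slope)   # log-odds = -1.0

print(p_high, classify(p_high))
print(p_low, classify(p_low))
print(classify(p_high, cutoff=0.8))  # a stricter cutoff can change the label
```

The model itself only produces `p_high` and `p_low`; the classification is a separate decision layered on top, and the cutoff is a choice rather than something the model estimates.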

Analogous linear models for binary variables with a different sigmoid function instead of the logistic function (to convert the linear combination to a probability) can also be used, most notably the probit model; see Alternatives. The defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds of the given outcome at a constant rate, with each independent variable having its own parameter; for a binary dependent variable this generalizes the odds ratio. More abstractly, the logistic function is the natural parameter for the Bernoulli distribution, and in this sense is the "simplest" way to convert a real number to a probability. In particular, it maximizes entropy (minimizes added information), and in this sense makes the fewest assumptions of the data being modeled; see Maximum entropy.
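The multiplicative effect on the odds can be verified numerically. In the sketch below (coefficients are made up), a one-unit increase in the predictor multiplies the odds by the same factor, $e^{\beta_1}$, regardless of the starting value.

```python
import math


def odds(x, intercept, slope):
    """Odds p / (1 - p) of the outcome under a one-predictor logistic model."""
    p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
    return p / (1.0 - p)


# Made-up coefficients for illustration.
intercept, slope = -2.0, 0.7

# Ratio of odds for a one-unit increase, at two different starting points.
ratio_near_zero = odds(1.0, intercept, slope) / odds(0.0, intercept, slope)
ratio_further_out = odds(5.0, intercept, slope) / odds(4.0, intercept, slope)

print(ratio_near_zero, ratio_further_out, math.exp(slope))
```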

The parameters of a logistic regression are most commonly estimated by maximum-likelihood estimation (MLE). This does not have a closed-form expression, unlike linear least squares; see Model fitting. Logistic regression by MLE plays a similarly basic role for binary or categorical responses as linear regression by ordinary least squares (OLS) plays for scalar responses: it is a simple, well-analyzed baseline model; see Comparison with linear regression for discussion. The logistic regression as a general statistical model was originally developed and popularized primarily by Joseph Berkson,[5] beginning in Berkson (1944), where he coined "logit"; see History.
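Because there is no closed-form solution, the estimates are found iteratively. The following is a minimal sketch using plain gradient ascent on the log-likelihood for a single predictor; statistical software typically uses faster methods such as Newton-type iterations, so this is illustrative only. The toy data, learning rate, and iteration count are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(xs, ys, b0, b1):
    """Bernoulli log-likelihood of the data under the logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Estimate (b0, b1) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Gradient components: sum of residuals, and residuals weighted by x.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals)
        g1 = sum(r * x for r, x in zip(residuals, xs))
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


# Toy, non-separable data (made up for this sketch).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(b0, b1, log_likelihood(xs, ys, b0, b1))
```

The data are deliberately non-separable: with perfectly separated classes the maximum-likelihood estimates do not exist as finite values, and an iterative fit like this one would keep growing the coefficients.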

This simple model is an example of binary logistic regression, and has one explanatory variable and a binary categorical variable which can assume one of two categorical values. Multinomial logistic regression is the generalization of binary logistic regression to include any number of explanatory variables and any number of categories.

The basic setup of logistic regression is as follows. We are given a dataset containing $N$ points. Each point $i$ consists of a set of $m$ input variables $x_{1,i}, \dots, x_{m,i}$ (also called independent variables, explanatory variables, predictor variables, features, or attributes), and a binary outcome variable $Y_i$ (also known as a dependent variable, response variable, output variable, or class), i.e. it can assume only the two possible values 0 (often meaning "no" or "failure") or 1 (often meaning "yes" or "success"). The goal of logistic regression is to use the dataset to create a predictive model of the outcome variable.

To begin with, we may consider a logistic model with $M$ explanatory variables, $x_1, x_2, \dots, x_M$, and, as in the example above, two categorical values ($y = 0$ and $1$). For the simple binary logistic regression model, we assumed a linear relationship between the predictor variable and the log-odds (also called logit) of the event that $y = 1$. This linear relationship may be extended to the case of $M$ explanatory variables.
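In standard notation, the extended relationship can be sketched as follows. The coefficient symbols $\beta_0, \dots, \beta_M$, the symbol $p$, and the use of the natural logarithm are assumptions introduced here for illustration.

$$
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_M x_M, \qquad p = P(y = 1).
$$

Solving for $p$ gives the logistic function of the linear predictor:

$$
p = \frac{1}{1 + \exp\bigl(-(\beta_0 + \beta_1 x_1 + \dots + \beta_M x_M)\bigr)}.
$$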