Suppose, as in linear regression, we try to minimize the sum of half the square of the vertical distance between each example and the logistic curve:

\[J(\theta_0, \theta_1) = \sum_{i=1}^m{\frac{1}{2}\left(\frac{1}{1 + e^{-(\theta_0 + \theta_1x^{(i)})}} - y^{(i)}\right)^2}\]
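To make the formula concrete, here is a minimal sketch of this cost function in Python, evaluated on a small made-up binary-labelled dataset (the data values are assumptions for illustration, not from the graph below):

```python
import math

def cost(theta0, theta1, xs, ys):
    """Sum of half the squared vertical distance between each example
    and the logistic curve with parameters theta0, theta1."""
    total = 0.0
    for x, y in zip(xs, ys):
        # logistic (sigmoid) hypothesis: 1 / (1 + e^-(theta0 + theta1*x))
        h = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * x)))
        total += 0.5 * (h - y) ** 2
    return total

# hypothetical toy dataset with binary labels
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]

print(cost(0.0, 1.0, xs, ys))
```

Evaluating this function over a grid of \(\theta_0\) and \(\theta_1\) values is exactly how a surface like the one below is produced.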

The three-dimensional graph below shows the values of this cost function for a simple dataset and different values of \(\theta_0\) and \(\theta_1\).

You can click and drag to rotate the graph, scroll to zoom in and out, and hover over the data points in the graph to see each value of \(\theta_0\), \(\theta_1\), and \(J\).

As you can see, for a logistic model this cost function is not convex: it generates a surface with large, nearly flat regions where the gradient is close to zero. Gradient descent can easily get “stuck” in these plateaus. If you try to minimize this cost function for a logistic model, you’ll be lucky if it converges at all, much less in a reasonable number of iterations.
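You can see why the plateaus trap gradient descent by differentiating the cost: by the chain rule, each term of the gradient carries a factor of \(h(1-h)\), the derivative of the sigmoid, which vanishes when the sigmoid saturates. A quick sketch (same hypothetical toy dataset as above) compares the gradient at moderate parameters with the gradient far out on a plateau:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(theta0, theta1, xs, ys):
    """Gradient of the squared-error cost for a logistic model.
    Each term includes the sigmoid derivative h*(1-h), which
    shrinks toward zero whenever the sigmoid saturates."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(theta0 + theta1 * x)
        common = (h - y) * h * (1.0 - h)  # chain rule factor
        g0 += common
        g1 += common * x
    return g0, g1

# hypothetical toy dataset with binary labels
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]

print(grad(0.0, 1.0, xs, ys))   # moderate parameters: usable gradient
print(grad(20.0, 1.0, xs, ys))  # on a plateau: gradient is vanishingly small
```

With \(\theta_0 = 20\) every hypothesis value is saturated near 1, so both gradient components are essentially zero and a gradient-descent step makes almost no progress, no matter how far the parameters are from the minimum.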