Suppose, as in linear regression, we try to minimize the sum of half the square of the vertical distance between each example and the logistic curve:

\[J(\theta_0, \theta_1) = \sum_{i=1}^m{\frac{1}{2}\left(\frac{1}{1 + e^{-(\theta_0 + \theta_1x^{(i)})}} - y^{(i)}\right)^2}\]
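To make the formula concrete, here is a minimal sketch of this cost function in Python, evaluated on a small made-up binary-labelled dataset (the data values are assumptions for illustration, not from the graph below):

```python
import math

def cost(theta0, theta1, xs, ys):
    """Sum of half the squared vertical distance between each example
    and the logistic curve with parameters theta0, theta1."""
    total = 0.0
    for x, y in zip(xs, ys):
        # logistic (sigmoid) hypothesis: 1 / (1 + e^-(theta0 + theta1*x))
        h = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * x)))
        total += 0.5 * (h - y) ** 2
    return total

# hypothetical toy dataset with binary labels
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]

print(cost(0.0, 1.0, xs, ys))
```

Evaluating this function over a grid of \(\theta_0\) and \(\theta_1\) values is exactly how a surface like the one below is produced.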

The three-dimensional graph below shows the values of this cost function for a simple dataset and different values of \(\theta_0\) and \(\theta_1\).

You can click and drag to rotate the graph, scroll to zoom in and out, and hover over the data points in the graph to see each value of \(\theta_0\), \(\theta_1\), and \(J\).

As you can see, for a logistic model this cost function is not convex: it generates a surface with large, nearly flat regions where the gradient is close to zero. Gradient descent can easily get “stuck” in these plateaus. If you try to minimize this cost function for a logistic model, you’ll be lucky if it converges at all, much less in a reasonable number of iterations.
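You can see why the plateaus trap gradient descent by differentiating the cost: by the chain rule, each term of the gradient carries a factor of \(h(1-h)\), the derivative of the sigmoid, which vanishes when the sigmoid saturates. A quick sketch (same hypothetical toy dataset as above) compares the gradient at moderate parameters with the gradient far out on a plateau:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(theta0, theta1, xs, ys):
    """Gradient of the squared-error cost for a logistic model.
    Each term includes the sigmoid derivative h*(1-h), which
    shrinks toward zero whenever the sigmoid saturates."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(theta0 + theta1 * x)
        common = (h - y) * h * (1.0 - h)  # chain rule factor
        g0 += common
        g1 += common * x
    return g0, g1

# hypothetical toy dataset with binary labels
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]

print(grad(0.0, 1.0, xs, ys))   # moderate parameters: usable gradient
print(grad(20.0, 1.0, xs, ys))  # on a plateau: gradient is vanishingly small
```

With \(\theta_0 = 20\) every hypothesis value is saturated near 1, so both gradient components are essentially zero and a gradient-descent step makes almost no progress, no matter how far the parameters are from the minimum.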