Good initial estimates lie close to the final globally optimal parameter estimate.
In nonlinear regression, the sum of squared errors (SSE) is only "close to" parabolic in the region of the final parameter estimates.
Finally, Newton views the method as purely algebraic and makes no mention of the connection with calculus.
An analytical expression for the derivative may not be easily obtainable and could be expensive to evaluate.
In these situations, it may be appropriate to approximate the derivative by using the slope of a line through two nearby points on the function.
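The two-point slope approximation described here is the idea behind the secant method. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `secant`, the tolerance, and the iteration cap are illustrative choices that do not come from the text.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Find a root of f, replacing f'(x) by the slope through two nearby points.

    The name, the tolerance and the iteration cap are assumptions made for
    this sketch.
    """
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        denom = f1 - f0
        if denom == 0:
            # The line through the two points is horizontal: no usable slope.
            raise ZeroDivisionError("secant slope is zero")
        # Newton-style update with the derivative replaced by
        # (f1 - f0) / (x1 - x0).
        x2 = x1 - f1 * (x1 - x0) / denom
        if abs(x2 - x1) < tol:
            return x2
        # Keep the two most recent points for the next slope estimate.
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("secant iteration did not converge")


# Usage: approximate the square root of 2 as the positive root of x^2 - 2.
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```

Each step costs one new function evaluation and no derivative evaluation, which is the appeal when the derivative is expensive or unavailable.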
A large error in the initial estimate can contribute to non-convergence of the algorithm.
To overcome this problem one can often linearise the function that is being optimized using calculus, logs, differentials, or even using evolutionary algorithms, such as the Stochastic Funnel Algorithm.
However, the extra computations required for each step can slow down the overall performance relative to Newton's method, particularly if f or its derivatives are computationally expensive to evaluate.
The name "Newton's method" is derived from Isaac Newton's description of a special case of the method in De analysi per aequationes numero terminorum infinitas (written in 1669, published in 1711 by William Jones) and in De metodis fluxionum et serierum infinitarum (written in 1671, translated and published as Method of Fluxions in 1736 by John Colson).
If the first derivative is not well behaved in the neighborhood of a particular root, the method may overshoot, and diverge from that root.
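An example of a function with one root, for which the derivative is not well behaved in the neighborhood of the root, is f(x) = |x|^a with 0 < a < 1/2, for which the root will be overshot and the sequence of iterates will diverge.
For a = 1/2, the root will still be overshot, but the sequence will oscillate between two values.
For 1/2 < a < 1, the root will still be overshot but the sequence will converge, and for a ≥ 1 the root will not be overshot at all.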
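Overshoot can be observed numerically. The sketch below assumes the example function f(x) = |x|^a, for which each Newton step multiplies the current iterate by (1 - 1/a). The helper name `newton_iterates`, the starting point, and the step count are hypothetical choices made for illustration.

```python
import math


def newton_iterates(a, x0=1.0, steps=6):
    """Return the Newton iterates for f(x) = |x|**a starting from x0.

    The example function, the name and the defaults are assumptions made
    for this sketch.
    """
    def f(x):
        return abs(x) ** a

    def df(x):
        # Derivative of |x|**a away from x = 0.
        return a * abs(x) ** (a - 1) * math.copysign(1.0, x)

    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        if x == 0:
            break  # Landed exactly on the root; the derivative is undefined here.
        xs.append(x - f(x) / df(x))
    return xs


# Compare the behavior for several exponents.
diverging = newton_iterates(0.25)    # overshoots and grows in magnitude
oscillating = newton_iterates(0.5)   # overshoots and alternates between two values
converging = newton_iterates(0.75)   # overshoots but shrinks toward the root
monotone = newton_iterates(2.0)      # approaches the root from one side
```

A sign change between consecutive iterates indicates that the step jumped past the root at x = 0.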
The essence of Vieta's method can be found in the work of the Persian mathematician Sharaf al-Din al-Tusi, while his successor Jamshīd al-Kāshī used a form of Newton's method to solve x^P − N = 0 to find roots of N (Ypma 1995).