Gradient Descent Convergence Calculator
Understanding Gradient Descent and Learning Rates
Gradient Descent is a first-order iterative optimization algorithm used to find a local minimum of a differentiable function. In the world of Machine Learning and Deep Learning, it is the fundamental engine used to train models by minimizing the error (cost) between predictions and actual data.
The Mathematical Logic
The core logic of Gradient Descent is captured in the update rule (a short code sketch follows the definitions below):

x_new = x_old - α * f'(x_old)

- x_old: Your current position (parameter value).
- α (Learning Rate): A hyperparameter that determines the step size. If α is too high, the algorithm may overshoot the minimum; if α is too low, convergence will be slow.
- f'(x_old): The derivative (gradient) of the function at the current point, which points in the direction of steepest ascent; subtracting it moves x toward the minimum.
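As a rough illustration, the update rule translates directly into a few lines of Python. The sketch below assumes f(x) = x², so f'(x) = 2x; the names grad and step are illustrative, not part of the calculator.

```python
# Minimal sketch of the gradient-descent update rule for f(x) = x**2.
# The names `grad` and `step` are illustrative only.

def grad(x):
    """Gradient of f(x) = x**2, i.e. f'(x) = 2x."""
    return 2 * x

def step(x_old, learning_rate):
    """One update: x_new = x_old - learning_rate * f'(x_old)."""
    return x_old - learning_rate * grad(x_old)

print(step(10.0, 0.1))  # 10 - 0.1 * 20 = 8.0
```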
How the Learning Rate Affects Convergence
Choosing the right learning rate is critical for model performance; the sketch after this list compares several rates numerically:
- Small Learning Rate (e.g., 0.001): Converges reliably but needs many iterations to reach the minimum.
- Large Learning Rate (e.g., 0.9): Risks "bouncing" back and forth across the valley or even diverging (getting further away from the minimum).
- Optimal Learning Rate: Efficiently reaches the minimum without significant oscillation.
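To make these regimes concrete, here is a minimal sketch that applies the update rule repeatedly to f(x) = x² from a start of x = 10. The specific rates (0.001, 0.1, 0.9, 1.1) and the 20-step budget are illustrative choices, not values prescribed by the calculator.

```python
# Sketch comparing how different learning rates behave on f(x) = x**2,
# starting from x = 10. Rates and iteration count are illustrative.

def run(learning_rate, x=10.0, steps=20):
    for _ in range(steps):
        x = x - learning_rate * (2 * x)   # update rule with f'(x) = 2x
    return x

for lr in (0.001, 0.1, 0.9, 1.1):
    print(f"lr={lr}: x after 20 steps = {run(lr):.4f}")

# Typical behaviour:
#   lr=0.001 -> still far from 0 (slow convergence)
#   lr=0.1   -> very close to 0 (efficient)
#   lr=0.9   -> bounces across the valley, magnitude shrinking each step
#   lr=1.1   -> magnitude grows each step (divergence)
```

For this particular function the update multiplies x by (1 - 2α) each step, so any rate below 1 still converges (with oscillation once α exceeds 0.5), while a rate above 1 diverges.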
Practical Example
Imagine we want to minimize f(x) = x². The minimum is clearly at x = 0. If we start at x = 10 with a learning rate of 0.1:
- Step 1: Gradient is 2 * 10 = 20. New x = 10 - (0.1 * 20) = 8.
- Step 2: Gradient is 2 * 8 = 16. New x = 8 - (0.1 * 16) = 6.4.
- Step 3: Gradient is 2 * 6.4 = 12.8. New x = 6.4 - (0.1 * 12.8) = 5.12.
Notice how each step gets smaller as we approach the minimum, because the gradient (slope) itself shrinks toward zero. This calculator allows you to visualize this process numerically; a short script reproducing these steps follows.
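The same three steps can be reproduced with a short loop. This is only a sketch of the computation described above, not the calculator's actual implementation.

```python
# Sketch reproducing the worked example: minimise f(x) = x**2 starting
# at x = 10 with learning rate 0.1, printing the first few steps.
x, learning_rate = 10.0, 0.1

print(f"{'Step':>4} {'Current x':>10} {'Gradient':>9} {'New x':>8}")
for step in range(1, 4):
    gradient = 2 * x                      # f'(x) = 2x
    new_x = x - learning_rate * gradient  # update rule
    print(f"{step:>4} {x:>10.2f} {gradient:>9.2f} {new_x:>8.2f}")
    x = new_x

# Expected output matches the steps above:
#    1      10.00     20.00     8.00
#    2       8.00     16.00     6.40
#    3       6.40     12.80     5.12
```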