Least Squares Solution Explained: How to Find & Apply It in Linear Algebra & Machine Learning

Finding Order in Messy Data: Least Squares & SageMath – A Perfect Combo

Introduction

Ever looked at a bunch of data points scattered all over a graph and wondered: "How do I find a pattern in this mess?" That’s where Least Squares comes to the rescue. It’s a simple but powerful way to draw the "best fit" line through imperfect data.

And the best part? You don’t have to do it all by hand. With SageMath, a free and open-source math tool, you can solve complex data-fitting problems in just a few lines of code.

This blog is your combo-pack:

Concept + Code = Real Understanding

Part 1: What Is Least Squares?

Imagine you have some data about how many hours students studied and the marks they got. The points are all over the place—not in a perfect line. Still, you feel there’s a trend. Least squares helps you draw the line that best follows the trend, even if the data is noisy.

We model the trend with a straight line:

\( y = mx + c \)

We want to minimize the total squared difference between actual and predicted values:

\( S = \sum_{i} \left( y_{i} - \left( m x_{i} + c \right) \right)^2 \)

This "squared error" gets smaller when our line is close to the points—and least squares finds the m and c that make it as small as possible.

Part 2: Doing It in SageMath – Line Fitting Example

Let’s use SageMath to fit a line to data points.
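Here is a minimal sketch (the study-hours data below is made up for illustration); it builds the design matrix for \( y = mx + c \) and solves the normal equations directly:

    # Made-up data: hours studied vs. marks obtained
    xs = [1, 2, 3, 4, 5, 6]
    ys = [52, 55, 61, 64, 70, 74]

    # Design matrix with rows [x_i, 1] and the target vector
    A = matrix(RDF, [[xi, 1] for xi in xs])
    b = vector(RDF, ys)

    # Solve the normal equations A^T A [m, c]^T = A^T b
    m, c = (A.transpose() * A).solve_right(A.transpose() * b)
    print("slope m =", m, "  intercept c =", c)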

This code will give you the best values for m and c using the least squares method.

Least Squares as an Optimization Problem

At its core, least squares is an optimization problem. You want to find the values (like m and c in y = mx + c) that minimize a cost function:

Cost Function (Sum of Squared Errors):

\( J(m, c) = \sum_{i} \left( y_{i} - \left( m x_{i} + c \right) \right)^2 \)

This makes least squares part of a bigger family of techniques called numerical optimization.

Enter Gradient Descent

Instead of solving the equations directly (e.g., via the normal equation), gradient descent takes repeated small steps toward the minimum:
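Starting from an initial guess, each step nudges \( m \) and \( c \) a little way against the gradient of the cost, using a small learning rate \( \alpha \):

\( m \leftarrow m - \alpha \frac{\partial J}{\partial m}, \qquad c \leftarrow c - \alpha \frac{\partial J}{\partial c} \)

where, differentiating \( J(m, c) \),

\( \frac{\partial J}{\partial m} = -2 \sum_{i} x_i \left( y_i - (m x_i + c) \right), \qquad \frac{\partial J}{\partial c} = -2 \sum_{i} \left( y_i - (m x_i + c) \right) \)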

⏳ With enough iterations, we get the best-fit parameters without solving equations directly!
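As a sketch of that loop in SageMath (reusing the made-up study data from Part 2; the learning rate and iteration count are arbitrary choices, not tuned values):

    # Gradient descent for J(m, c) = sum_i (y_i - (m*x_i + c))^2
    xs = [1, 2, 3, 4, 5, 6]
    ys = [52, 55, 61, 64, 70, 74]
    m, c = 0.0, 0.0
    alpha = 0.005                     # learning rate (hand-picked)
    for _ in range(5000):
        residuals = [yi - (m*xi + c) for xi, yi in zip(xs, ys)]
        grad_m = -2 * sum(r*xi for r, xi in zip(residuals, xs))
        grad_c = -2 * sum(residuals)
        m -= alpha * grad_m
        c -= alpha * grad_c
    print("m ≈", m, "  c ≈", c)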

Part 3: Beyond Lines – Higher Dimensions with SageMath

Least squares isn’t just about fitting lines. It also works for solving systems of equations that don't have exact solutions. Think of situations where:

  • You have more equations than unknowns (overdetermined systems).
  • You want the closest possible solution.

Let’s say you have a 6×4 matrix A and a 6D vector b.
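A minimal sketch, with random entries standing in for real measurements (the 6×4 shape matches the example above):

    # Overdetermined system: 6 equations, only 4 unknowns
    A = random_matrix(RDF, 6, 4)
    b = vector(RDF, [1, 2, 3, 4, 5, 6])   # arbitrary right-hand side

    # Least squares solution via the normal equations A^T A x = A^T b
    x_star = (A.transpose() * A).solve_right(A.transpose() * b)
    print("x* =", x_star)
    print("residual norm |A*x* - b| =", (A * x_star - b).norm())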

This gives you the x* such that A·x* is as close as possible to b.

๐Ÿ” Part 4: Visualizing the Idea – Orthogonal Projection

When b isn’t in the column space of A, the vector A·x* is the projection of b onto that space. It's like casting a shadow of b onto the space defined by A’s columns—finding the nearest point that can be "reached" by A.
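One way to see this numerically (continuing with the A, b, and x_star from the sketch above): the residual \( b - A x^{*} \) must be orthogonal to every column of A, so multiplying it by \( A^{T} \) should give (numerically) the zero vector.

    # The residual is orthogonal to the column space of A
    r = b - A * x_star
    print(A.transpose() * r)   # entries are ~0, up to floating-point error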

Visualizing Residuals (Error Gaps)

Residuals are the vertical distances between the data points and the regression line. Plotting them helps show how well the model fits.
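A sketch of such a plot in SageMath, assuming xs, ys, m, and c still hold the data and fitted values from Part 2; each red segment is one residual:

    # Scatter the data, draw the fitted line, and mark each residual
    fig = points(list(zip(xs, ys)), size=40, color='blue')
    fig += plot(m*x + c, (x, 0, 7), color='green')
    for xi, yi in zip(xs, ys):
        fig += line([(xi, yi), (xi, m*xi + c)], color='red')   # residual segment
    fig.show()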

This gives a visual idea of how far off each point is from the prediction—essential for goodness-of-fit analysis.

๐ŸŒ Real-Life Use Cases

  • Machine Learning: Linear regression, cost function minimization
  • Physics/Engineering: Curve fitting to experimental data
  • Navigation Systems (GPS): Best position from noisy satellite signals
  • Finance: Predicting stock trends with historical data
  • Image/Signal Processing: Denoising and signal recovery

Bonus: Normal Equation and Pseudoinverse

You can solve least squares problems in two common ways:

  1. \( \textbf{Normal Equation:} \quad A^T A x = A^T b \)
  2. \( \textbf{Pseudoinverse Method (shortcut):} \quad x = A^+ b = (A^T A)^{-1} A^T b \)

Both give the same result if A has full column rank.
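A quick sketch comparing the two on the A and b from Part 3; here the pseudoinverse is formed explicitly as \( (A^T A)^{-1} A^T \), which is valid because that random A has full column rank:

    At = A.transpose()

    # 1. Normal equation
    x1 = (At * A).solve_right(At * b)

    # 2. Pseudoinverse (full-column-rank shortcut)
    A_pinv = (At * A).inverse() * At
    x2 = A_pinv * b

    print((x1 - x2).norm())   # ~0: both methods agree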

Why Residuals Matter

Residuals aren't just leftover values—they're essential diagnostics in regression.

  • Goodness of Fit: If residuals are small and randomly scattered, your model fits the data well.
  • Model Assumptions: Residuals help check for violations like:
    • Non-linearity (residuals show curved patterns)
    • Heteroscedasticity (the spread of the residuals changes across the range of predictions)
    • Autocorrelation (important in time series)
  • Outliers and Influence: Large residuals may indicate outliers or points unduly influencing the model.

Tip: A residual plot with no visible pattern usually means a well-behaved model.

| Method | Best For | Strength | Weakness | When to Use |
|---|---|---|---|---|
| Least Squares | Linear trends, clean data | Fast & interpretable | Sensitive to outliers | Basic regression, fast diagnostics |
| Robust Regression | Outlier-heavy data | Less affected by outliers | May ignore real signals | Finance, real-world noisy data |
| Polynomial Regression | Curved trends | Models nonlinear patterns | Overfits easily | Small, well-behaved datasets |
| Ridge Regression | High-dimensional data | Reduces overfitting, handles multicollinearity | Can shrink useful variables | Large datasets, correlated predictors |
| Lasso Regression | Feature selection | Forces unimportant features to 0 | May drop too much | Sparse models, automatic selection |

Extra note: Use Ridge when all predictors may matter, and Lasso when you suspect only a few do.

Try It Yourself – Interactive SageMath Demo

Let’s make it fun! Try tweaking data points or the degree of a polynomial:
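A sketch of an interactive cell for a Sage notebook; the noisy sample data and the slider range are arbitrary choices made for this demo:

    # Interactive polynomial fit: move the slider to change the degree
    data_x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    data_y = [2.1, 2.9, 4.2, 4.8, 6.1, 8.3, 8.9, 10.2, 12.5]   # made-up noisy data

    @interact
    def poly_fit(degree=slider(1, 6, 1, default=1)):
        # Vandermonde design matrix for the chosen degree
        A = matrix(RDF, [[xi^j for j in range(degree + 1)] for xi in data_x])
        b = vector(RDF, data_y)
        coeffs = (A.transpose() * A).solve_right(A.transpose() * b)
        f = sum(coeffs[j] * x^j for j in range(degree + 1))    # fitted polynomial
        fig = points(list(zip(data_x, data_y)), size=40) + plot(f, (x, -0.5, 8.5))
        fig.show()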

See how the curve changes as you increase the degree—explore overfitting vs. underfitting in real time!

Case Study – Predicting Stock Trends

The goal: use the least squares method to recover an underlying linear trend from artificially generated stock prices that include random fluctuations.
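A sketch of that setup; the trend slope, noise level, and number of trading days below are made-up parameters:

    import random
    random.seed(42)                        # reproducible "market" noise

    # Synthetic prices: linear trend + random fluctuations
    days = list(range(60))
    prices = [100 + 0.8*t + random.gauss(0, 3) for t in days]

    # Fit price ≈ m*day + c by least squares
    A = matrix(RDF, [[t, 1] for t in days])
    b = vector(RDF, prices)
    m, c = (A.transpose() * A).solve_right(A.transpose() * b)
    print("estimated trend: %.3f per day (true slope was 0.8)" % m)

    # Plot the noisy prices against the recovered trend line
    fig = points(list(zip(days, prices)), size=15) + plot(m*x + c, (x, 0, 59), color='red')
    fig.show()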

What’s Next: Fitting Curves with Least Squares

In this post, we explored how least squares helps us find the best-fit line through imperfect data.

But what if the relationship isn’t linear? Real-world patterns are often curved, nonlinear, or more complex than a straight line.

In our next blog, we’ll dive into:

  • How to fit curves (like quadratics or exponentials) using least squares
  • How to frame curve fitting as a linear algebra problem
  • SageMath code to implement and visualize curved fits
  • Real-world applications in science, forecasting, and more

✨ Stay tuned for “Fitting Curves as a Least Squares Problem” – where we take the next step in mastering data modeling with SageMath.
