Least Squares Solution Explained: How to Find & Apply It in Linear Algebra & Machine Learning

Finding Order in Messy Data: Least Squares & SageMath – A Perfect Combo

Introduction

Ever looked at a bunch of data points scattered all over a graph and wondered: "How do I find a pattern in this mess?" That’s where Least Squares comes to the rescue. It’s a simple but powerful way to draw the "best fit" line through imperfect data.

And the best part? You don’t have to do it all by hand. With SageMath, a free and open-source math tool, you can solve complex data-fitting problems in just a few lines of code.

This blog is your combo-pack:

Concept + Code = Real Understanding

Part 1: What Is Least Squares?

Imagine you have some data about how many hours students studied and the marks they got. The points are all over the place—not in a perfect line. Still, you feel there’s a trend. Least squares helps you draw the line that best follows the trend, even if the data is noisy.

We model the trend with a straight line:

\( y = mx + c \)

We want to minimize the total squared difference between actual and predicted values:

\( S = \sum_{i} \left( y_{i} - \left( m x_{i} + c \right) \right)^2 \)

This "squared error" gets smaller when our line is close to the points—and least squares finds the m and c that make it as small as possible.

Part 2: Doing It in SageMath – Line Fitting Example

Let’s use SageMath to fit a line to data points.
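Here is a minimal sketch (the study-hours data below is made up for illustration); it builds the design matrix for \( y = mx + c \) and solves the normal equations directly:

    # Made-up data: hours studied vs. marks obtained
    xs = [1, 2, 3, 4, 5, 6]
    ys = [52, 55, 61, 64, 70, 74]

    # Design matrix with rows [x_i, 1] and the target vector
    A = matrix(RDF, [[xi, 1] for xi in xs])
    b = vector(RDF, ys)

    # Solve the normal equations A^T A [m, c]^T = A^T b
    m, c = (A.transpose() * A).solve_right(A.transpose() * b)
    print("slope m =", m, "  intercept c =", c)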

This code will give you the best values for m and c using the least squares method.

Least Squares as an Optimization Problem

At its core, least squares is an optimization problem. You want to find the values (like m and c in y = mx + c) that minimize a cost function:

Cost Function (Sum of Squared Errors):

\( J(m, c) = \sum_{i} \left( y_{i} - \left( m x_{i} + c \right) \right)^2 \)

This makes least squares part of a bigger family of techniques called numerical optimization.

Enter Gradient Descent

Instead of solving the equations directly (e.g., via the normal equation), gradient descent takes repeated small steps toward the minimum:
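Starting from an initial guess, each step nudges \( m \) and \( c \) a little way against the gradient of the cost, using a small learning rate \( \alpha \):

\( m \leftarrow m - \alpha \frac{\partial J}{\partial m}, \qquad c \leftarrow c - \alpha \frac{\partial J}{\partial c} \)

where, differentiating \( J(m, c) \),

\( \frac{\partial J}{\partial m} = -2 \sum_{i} x_i \left( y_i - (m x_i + c) \right), \qquad \frac{\partial J}{\partial c} = -2 \sum_{i} \left( y_i - (m x_i + c) \right) \)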

⏳ With enough iterations, we get the best-fit parameters without solving equations directly!
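As a sketch of that loop in SageMath (reusing the made-up study data from Part 2; the learning rate and iteration count are arbitrary choices, not tuned values):

    # Gradient descent for J(m, c) = sum_i (y_i - (m*x_i + c))^2
    xs = [1, 2, 3, 4, 5, 6]
    ys = [52, 55, 61, 64, 70, 74]
    m, c = 0.0, 0.0
    alpha = 0.005                     # learning rate (hand-picked)
    for _ in range(5000):
        residuals = [yi - (m*xi + c) for xi, yi in zip(xs, ys)]
        grad_m = -2 * sum(r*xi for r, xi in zip(residuals, xs))
        grad_c = -2 * sum(residuals)
        m -= alpha * grad_m
        c -= alpha * grad_c
    print("m ≈", m, "  c ≈", c)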

Part 3: Beyond Lines – Higher Dimensions with SageMath

Least squares isn’t just about fitting lines. It also works for solving systems of equations that don't have exact solutions. Think of situations where:

  • You have more equations than unknowns (overdetermined systems).
  • You want the closest possible solution.

Let’s say you have a 6×4 matrix A and a 6D vector b.
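A minimal sketch, with random entries standing in for real measurements (the 6×4 shape matches the example above):

    # Overdetermined system: 6 equations, only 4 unknowns
    A = random_matrix(RDF, 6, 4)
    b = vector(RDF, [1, 2, 3, 4, 5, 6])   # arbitrary right-hand side

    # Least squares solution via the normal equations A^T A x = A^T b
    x_star = (A.transpose() * A).solve_right(A.transpose() * b)
    print("x* =", x_star)
    print("residual norm |A*x* - b| =", (A * x_star - b).norm())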

This gives you the x* such that A·x* is as close as possible to b.

๐Ÿ” Part 4: Visualizing the Idea – Orthogonal Projection

When b isn’t in the column space of A, the vector A·x* is the projection of b onto that space. It's like casting a shadow of b onto the space defined by A’s columns—finding the nearest point that can be "reached" by A.
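One way to see this numerically (continuing with the A, b, and x_star from the sketch above): the residual \( b - A x^{*} \) must be orthogonal to every column of A, so multiplying it by \( A^{T} \) should give (numerically) the zero vector.

    # The residual is orthogonal to the column space of A
    r = b - A * x_star
    print(A.transpose() * r)   # entries are ~0, up to floating-point error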

Visualizing Residuals (Error Gaps)

Residuals are the vertical distances between the data points and the regression line. Plotting them helps show how well the model fits.
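A sketch of such a plot in SageMath, assuming xs, ys, m, and c still hold the data and fitted values from Part 2; each red segment is one residual:

    # Scatter the data, draw the fitted line, and mark each residual
    fig = points(list(zip(xs, ys)), size=40, color='blue')
    fig += plot(m*x + c, (x, 0, 7), color='green')
    for xi, yi in zip(xs, ys):
        fig += line([(xi, yi), (xi, m*xi + c)], color='red')   # residual segment
    fig.show()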

This gives a visual idea of how far off each point is from the prediction—essential for goodness-of-fit analysis.

๐ŸŒ Real-Life Use Cases

  • Machine Learning: Linear regression, cost function minimization
  • Physics/Engineering: Curve fitting to experimental data
  • Navigation Systems (GPS): Best position from noisy satellite signals
  • Finance: Predicting stock trends with historical data
  • Image/Signal Processing: Denoising and signal recovery

Bonus: Normal Equation and Pseudoinverse

You can solve least squares problems in two common ways:

  1. \( \textbf{Normal Equation:} \quad A^T A x = A^T b \)
  2. \( \textbf{Pseudoinverse Method (shortcut):} \quad x = A^+ b = (A^T A)^{-1} A^T b \)

Both give the same result if A has full column rank.
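A quick sketch comparing the two on the A and b from Part 3; here the pseudoinverse is formed explicitly as \( (A^T A)^{-1} A^T \), which is valid because that random A has full column rank:

    At = A.transpose()

    # 1. Normal equation
    x1 = (At * A).solve_right(At * b)

    # 2. Pseudoinverse (full-column-rank shortcut)
    A_pinv = (At * A).inverse() * At
    x2 = A_pinv * b

    print((x1 - x2).norm())   # ~0: both methods agree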

Why Residuals Matter

Residuals aren't just leftover values—they're essential diagnostics in regression.

  • Goodness of Fit: If residuals are small and randomly scattered, your model fits the data well.
  • Model Assumptions: Residuals help check for violations like:
    • Non-linearity (residuals show curved patterns)
    • Heteroscedasticity (the spread of the residuals changes across the range of predictions)
    • Autocorrelation (important in time series)
  • Outliers and Influence: Large residuals may indicate outliers or points unduly influencing the model.

Tip: A residual plot with no visible pattern usually means a well-behaved model.

| Method | Best For | Strength | Weakness | When to Use |
|---|---|---|---|---|
| Least Squares | Linear trends, clean data | Fast & interpretable | Sensitive to outliers | Basic regression, fast diagnostics |
| Robust Regression | Outlier-heavy data | Less affected by outliers | May ignore real signals | Finance, real-world noisy data |
| Polynomial Regression | Curved trends | Models nonlinear patterns | Overfits easily | Small, well-behaved datasets |
| Ridge Regression | High-dimensional data | Reduces overfitting, handles multicollinearity | Can shrink useful variables | Large datasets, correlated predictors |
| Lasso Regression | Feature selection | Forces unimportant features to 0 | May drop too much | Sparse models, automatic selection |

Extra note: Use Ridge when all predictors may matter, and Lasso when you suspect only a few do.

Try It Yourself – Interactive SageMath Demo

Let’s make it fun! Try tweaking data points or the degree of a polynomial:
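A sketch of an interactive cell for a Sage notebook; the noisy sample data and the slider range are arbitrary choices made for this demo:

    # Interactive polynomial fit: move the slider to change the degree
    data_x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    data_y = [2.1, 2.9, 4.2, 4.8, 6.1, 8.3, 8.9, 10.2, 12.5]   # made-up noisy data

    @interact
    def poly_fit(degree=slider(1, 6, 1, default=1)):
        # Vandermonde design matrix for the chosen degree
        A = matrix(RDF, [[xi^j for j in range(degree + 1)] for xi in data_x])
        b = vector(RDF, data_y)
        coeffs = (A.transpose() * A).solve_right(A.transpose() * b)
        f = sum(coeffs[j] * x^j for j in range(degree + 1))    # fitted polynomial
        fig = points(list(zip(data_x, data_y)), size=40) + plot(f, (x, -0.5, 8.5))
        fig.show()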

See how the curve changes as you increase the degree—explore overfitting vs. underfitting in real time!

Case Study – Predicting Stock Trends

The goal: use the least squares method to recover an underlying linear trend from artificially generated stock prices that include random fluctuations.
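A sketch of that setup; the trend slope, noise level, and number of trading days below are made-up parameters:

    import random
    random.seed(42)                        # reproducible "market" noise

    # Synthetic prices: linear trend + random fluctuations
    days = list(range(60))
    prices = [100 + 0.8*t + random.gauss(0, 3) for t in days]

    # Fit price ≈ m*day + c by least squares
    A = matrix(RDF, [[t, 1] for t in days])
    b = vector(RDF, prices)
    m, c = (A.transpose() * A).solve_right(A.transpose() * b)
    print("estimated trend: %.3f per day (true slope was 0.8)" % m)

    # Plot the noisy prices against the recovered trend line
    fig = points(list(zip(days, prices)), size=15) + plot(m*x + c, (x, 0, 59), color='red')
    fig.show()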

What’s Next: Fitting Curves with Least Squares

In this post, we explored how least squares helps us find the best-fit line through imperfect data.

But what if the relationship isn’t linear? Real-world patterns are often curved, nonlinear, or more complex than a straight line.

In our next blog, we’ll dive into:

  • How to fit curves (like quadratics or exponentials) using least squares
  • How to frame curve fitting as a linear algebra problem
  • SageMath code to implement and visualize curved fits
  • Real-world applications in science, forecasting, and more

✨ Stay tuned for “Fitting Curves as a Least Squares Problem” – where we take the next step in mastering data modeling with SageMath.
