Introduction to Machine Learning & Modelling Techniques

Supervised, Unsupervised & Reinforcement Learning

A 1-hour conceptual masterclass designed for beginner-to-intermediate data professionals to build an intuitive, visual, and practical mental map of ML algorithms.

Navigating the AI Ecosystem

Artificial Intelligence (AI): The overarching field of creating systems that mimic human intellect (includes search, heuristics, planning).
Machine Learning (ML): Systems that learn rules and functions directly from data (typically structured datasets) without being explicitly programmed.
Deep Learning (DL): Neural networks with many stacked layers that learn complex patterns, often from raw unstructured data such as images, audio, or text.
Large Language Models (LLM): Transformer-based generative models trained on large text corpora to understand and generate human language.

The Three Pillars of ML

Supervised Learning: Training model parameters using paired input-output datasets to predict numeric labels or classes.
Example: Predicting customer default risk using historical loan details.
Unsupervised Learning: Organizing unlabeled data into natural partitions, dimensions, or clusters without human correction.
Example: Grouping buyers into behavioral segments based on shopping habits.
Reinforcement Learning: An active agent optimizing behavioral policies inside environments using trial-and-error rewards.
Example: Training a robot vacuum to navigate a room using pathfinding rewards.

Machine Learning Terminology

1. The Core Learning Frameworks

To build a strong foundation in Machine Learning 101, you need to master the core terminology that dictates how models learn, fail, and get evaluated.

Supervised Learning: Learning from labeled data (inputs paired with known correct outputs).
Example: Predicting housing prices from historical size, zip code, and sales price.
Unsupervised Learning: Finding hidden patterns or structures in unlabeled data.
Example: Grouping customers into distinct cohorts based on purchasing behaviors (Clustering).
Reinforcement Learning: Learning through trial and error using a system of rewards and penalties.
Example: Training self-driving agents to stay on a track by rewarding correct steering and penalizing collisions.
Semi-Supervised Learning: Combining a small amount of labeled data with a large amount of unlabeled data to train models cost-effectively.
Example: Labeling a few medical scans manually, then training a classifier on those scans alongside thousands of unlabeled scans.

2. The Mechanics of Learning

Features vs. Target: Features are your input attributes (independent variables, X); the Target is what you want to predict (dependent variable, Y).
Loss (Cost) Function: A mathematical formula measuring how wrong a model's predictions are compared to actual targets. The goal of training is to minimize this loss.
Gradient Descent: An iterative optimization algorithm that tweaks a model's internal weights step by step in the direction that lowers the loss fastest (the negative gradient).

Learning Rate: A hyperparameter controlling how large a step the weights take during gradient descent. Too large overshoots the minimum; too small makes training take forever.
Parametric vs. Non-Parametric: Parametric models have a fixed number of weights (like Linear Regression). Non-parametric models grow in complexity with the dataset size (like k-NN, which stores every training point, or Decision Trees).
Linear vs. Non-Linear vs. Spatial: Linear models assume straight-line (or flat-plane) trends. Non-linear models capture curves (kernels/trees). Spatial (distance-based) models predict from the proximity of points in feature space (k-NN).
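A minimal sketch of gradient descent with a fixed learning rate, assuming a one-feature linear model y ≈ w·x + b trained on the MSE loss. The helper name gradient_descent and the toy data are illustrative, not taken from any library. The second call uses a deliberately oversized learning rate to show the overshooting described above.

```python
import numpy as np


def gradient_descent(x, y, learning_rate=0.1, n_steps=2000):
    """Fit y ≈ w * x + b by full-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        error = (w * x + b) - y
        # Partial derivatives of MSE = (1/n) * Σ (ŷ - y)² with respect to w and b
        grad_w = (2.0 / n) * np.dot(error, x)
        grad_b = (2.0 / n) * error.sum()
        # Step downhill; the learning rate sets the step size
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 2.0                       # noiseless toy data: true w = 3, b = 2

w, b = gradient_descent(x, y, learning_rate=0.1)
print(round(w, 3), round(b, 3))         # close to 3.0 and 2.0

w_big, b_big = gradient_descent(x, y, learning_rate=1.0, n_steps=50)
print(w_big, b_big)                     # huge values: the steps overshoot and diverge
```

With learning_rate=0.1 the weights settle near the true values; with learning_rate=1.0 each step jumps past the minimum by more than it corrects, so the weights grow without bound.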

3. Generalisation & Pitfalls

Overfitting: When a model learns the training data too well—including the random noise—and fails to predict new, unseen data accurately.
Symptom: High training accuracy, low validation accuracy.
Underfitting: When a model is too simple to capture the underlying trend in the data.
Symptom: Low training accuracy, low validation accuracy.

Bias-Variance Tradeoff: The ultimate balancing act. Bias is error from overly simple assumptions (underfitting). Variance is error from extreme sensitivity to small fluctuations in training data (overfitting).
Regularisation (L1/L2): Techniques used to prevent overfitting by adding a penalty to the loss function for models that get too complex.
L1 (Lasso) can drive weights to exactly zero, performing feature selection; L2 (Ridge) shrinks all weights toward zero (weight decay) without removing any.

4. Data Splitting & Evaluation

Train / Validation / Test Splits:
• Train: Used to teach the model's weights.
• Validation: Used to tune hyperparameters and choose best model.
• Test: Hidden until the very end to evaluate final, real-world performance.
Cross-Validation: Splitting data into several folds and rotating which fold is held out, so every point is used for validation exactly once and the estimate does not hinge on one lucky split.
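A sketch of a 60/20/20 train/validation/test split plus 5-fold cross-validation with scikit-learn, using the bundled Iris dataset purely as a stand-in. The split ratios, the model, and random_state are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# 1. Carve off 20% as the test set and leave it untouched until the very end
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# 2. Split the remainder 75/25 into train and validation (60/20/20 overall)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)
print(len(X_train), len(X_val), len(X_test))   # 90 30 30

# Cross-validation alternative to a single validation split: 5 rotating folds
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X_rest, y_rest, cv=5)
print(cv_scores.round(3), cv_scores.mean().round(3))
```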

Confusion Matrix: A table layout used to visualize the performance of a classification model, showing True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
Precision vs. Recall:
• Precision: Out of predicted positives, how many were actually positive?
• Recall: Out of actual positives, how many did the model find?
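The two ratios can be written as Precision = TP / (TP + FP) and Recall = TP / (TP + FN). A small sketch with hypothetical labels (1 = positive), cross-checked against scikit-learn's metric functions:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # 2 / (2 + 1): of 3 predicted positives, 2 were correct
recall = tp / (tp + fn)     # 2 / (2 + 2): of 4 actual positives, 2 were found

print(tn, fp, fn, tp)               # 5 1 2 2
print(round(precision, 3), recall)  # 0.667 0.5
```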

Supervised Learning Basics

Core Goal: Learn an approximation function Y = f(X) where X are inputs and Y are target labels.
Regression: The target label is a continuous numeric value.
Examples: Estimating real estate market prices or tracking server CPU temperatures.
Classification: The target label is a discrete categorical bucket.
Examples: Flagging transaction fraud or sorting incoming emails into spam folders.

Classification vs Regression Scatter Plot

Bias, Variance & Model Tuning

Underfitting (High Bias): The model is too simple to capture underlying patterns.
Example: Predicting house price using only size, ignoring location. The model is too rigid.
Overfitting (High Variance): Model memorizes training noise and outliers.
Example: Fitting a high-degree polynomial that matches every single outlier, but fails on new houses.
Cross-Validation: Splitting data into k-folds to validate model performance.
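Example: Splitting data into 5 groups, training on 4 and validating on the remaining 1, then rotating 5 times so every group is held out once; averaging the 5 scores gives a more reliable estimate than any single split.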
Hyperparameter Tuning: Adjusting model options before training to control complexity.
Example: Limiting a decision tree's depth to 3 levels, or setting k = 5 in k-NN to smooth out predictions.
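Hyperparameter tuning is often automated by cross-validating each candidate setting. A sketch using scikit-learn's GridSearchCV to pick a decision tree's max_depth on the bundled Iris data; the candidate depths are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try each candidate depth with 5-fold cross-validation and keep the best average score
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 5, None]},   # None = grow until leaves are pure (or too small to split)
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```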

Regression Models In-Depth

📈 1. Linear Regression

Fits the line (or hyperplane) that minimizes the residual sum of squares between the observed targets and the model's predictions. Suited to simple, roughly linear trends.

Real Example: Estimating monthly retail revenue based on ad spend.
Real Example: Estimating crop yield based on rainfall index.

🛡️ 2. Regularization (Ridge & Lasso)

Adds a penalty on coefficient size to the loss to prevent overfitting. Ridge shrinks all coefficients toward zero but keeps every feature. Lasso can shrink some coefficients exactly to zero, performing automatic feature selection.

Real Example: Customer lifetime value modeling across sparse tables.
Real Example: Regressions with highly correlated (multicollinear) features.

💡 Selection Guide: When to Choose Which?

📈 Linear Regression Choose if the feature-to-target relationships are simple and linear, you need maximum coefficient interpretability, or you need a fast baseline model.

🛡️ Ridge Regression (L2) Choose when you have many highly correlated variables (multicollinearity) and want to keep all of them while preventing overfitting via weight shrinkage.

✂️ Lasso Regression (L1) Choose when you want automated feature selection, shrinking weights of irrelevant variables to exactly zero to create a sparse, highly interpretable model.

Ordinary Least Squares (OLS)

The math behind standard Linear Regression. We fit a linear function and optimize parameters by minimizing the Mean Squared Error (MSE) loss function:

Linear Equation: y = β₀ + β₁x₁ + ... + βₙxₙ
Cost Function: MSE = (1/n) * Σ (yᵢ - ŷᵢ)²

yᵢ is the true target, ŷᵢ is the model prediction.
Minimizing the MSE yields the least-squares line of best fit.

from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)  # fit_intercept=True also estimates β₀
model.fit(X_train, y_train)

OLS Linear Regression trend line and coordinates

🏠 Intuitive Real-World Example

Predict Housing Price (y, Dependent Variable) based on House Size (x₁, Independent Variable). The weight β₁ represents the price increase per additional sq. ft. (e.g., +$250/sq.ft.), while β₀ (the intercept) is the predicted price at zero size, loosely interpreted as the base land cost.
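A sketch of this example on synthetic data, assuming a hypothetical base price of $50,000 plus $250 per sq. ft. plus random noise. It fits scikit-learn's LinearRegression and computes the MSE exactly as in the cost function above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: price = 50,000 base + $250 per sq. ft. + random noise
rng = np.random.default_rng(0)
size_sqft = rng.uniform(800, 3000, size=200)
price = 50_000 + 250 * size_sqft + rng.normal(0, 10_000, size=200)

X = size_sqft.reshape(-1, 1)               # scikit-learn expects a 2-D feature matrix
model = LinearRegression(fit_intercept=True).fit(X, price)

y_hat = model.predict(X)
mse = np.mean((price - y_hat) ** 2)         # MSE = (1/n) * Σ (yᵢ - ŷᵢ)²

print(f"β₁ (price per sq. ft.): {model.coef_[0]:.1f}")
print(f"β₀ (intercept): {model.intercept_:.0f}")
print(f"MSE: {mse:.3e}")
```

Because OLS minimizes the in-sample MSE, the fitted line's MSE is never higher than the MSE of the "true" line used to generate the data.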

Regularization Math

To prevent overfitting, we add a penalty on coefficient magnitudes, scaled by a strength hyperparameter α, to the OLS cost function:

Ridge (L2 Penalty) Cost:

J(w) = MSE + α * Σ (w_j)²

Lasso (L1 Penalty) Cost:

J(w) = MSE + α * Σ |w_j|

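from sklearn.linear_model import Ridge, Lasso
# scikit-learn scales the squared-error term differently for Ridge and Lasso, so alpha values are not interchangeable
ridge = Ridge(alpha=1.0)  # alpha = L2 penalty strength (α)
lasso = Lasso(alpha=0.1)  # alpha = L1 penalty strength (α)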

Bias Variance tradeoff: Underfitting vs Good Fit vs Overfitting

📉 Regularization & Weight Shrinkage

If predicting housing price using Size, Bedrooms, and Wall Color: a high penalty α shrinks weights. Ridge (L2) shrinks all weights toward zero while retaining every feature, whereas Lasso (L1) can drive the irrelevant Wall Color weight to exactly zero, effectively discarding it.
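A sketch of the Size / Bedrooms / Wall Color scenario on synthetic data, assuming standardized features and a target built only from Size and Bedrooms (so Wall Color is pure noise by construction). The alpha values are arbitrary. Note that scikit-learn's objectives differ slightly from the MSE + α·Σ form above (Lasso uses 1/(2n) times the squared error, Ridge the unscaled sum), so its alpha is not numerically identical to α here.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
n = 200
# Hypothetical standardized features
size = rng.standard_normal(n)
bedrooms = rng.standard_normal(n)
wall_color = rng.standard_normal(n)   # irrelevant: NOT used to build the target
X = np.column_stack([size, bedrooms, wall_color])
y = 3.0 * size + 2.0 * bedrooms + rng.normal(0.0, 1.0, n)

ridge = Ridge(alpha=10.0).fit(X, y)   # L2: shrinks every weight, none to exactly zero
lasso = Lasso(alpha=0.8).fit(X, y)    # L1: can set weights to exactly zero

for name, r_w, l_w in zip(["Size", "Bedrooms", "Wall Color"], ridge.coef_, lasso.coef_):
    print(f"{name:<10}  ridge={r_w:+.3f}  lasso={l_w:+.3f}")
```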

Classification: Part 1

Logistic Regression: Passes a weighted sum of the features through a sigmoid curve to predict class probabilities between 0 and 1.
Example: Predicting if a client defaults on a loan (Probability 0 to 1).
Support Vector Machines (SVM): Finds the decision boundary that maximizes the margin between classes; kernels extend this to non-linear boundaries.
Example: Classifying handwritten letters by finding the widest margin separating each class.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
lr = LogisticRegression(C=1.0)  # C is the inverse regularization strength
svm = SVC(kernel='rbf', C=1.0)  # RBF kernel allows non-linear decision boundaries

⚖️ Selection Guide: Logistic vs. SVM vs. k-NN

📈 Logistic Regression

• When to choose: Need fast training, highly interpretable coefficients, or explicitly require probabilistic scores (e.g. default probability).

🛡️ Support Vector Machines (SVM)

• When to choose: Non-linearly separable data (via kernels), high dimensionality (e.g. more features than samples), or small-to-medium datasets where accuracy matters more than interpretability or probability outputs.

📍 k-Nearest Neighbors (k-NN)

• When to choose: Small datasets, complex decision boundaries, and a need for an intuitive, instance-based model with almost no training cost (the computation happens at prediction time).

💡 Pro Tip: Inverse Regularization strength C

C is the inverse of regularization strength (C = 1/λ).
• Smaller C: Stronger regularization; penalizes complex models to prevent overfitting (simpler decision boundary).
• Larger C: Weaker regularization; allows model weights to grow to fit training data tightly (risk of overfitting).
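A sketch of the effect of C, assuming a synthetic dataset from scikit-learn's make_classification and an arbitrary grid of C values. As C grows (weaker regularization), the learned weight vector is allowed to grow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

weight_norms = {}
for C in (0.001, 0.1, 10.0):
    clf = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    weight_norms[C] = np.linalg.norm(clf.coef_)   # size of the learned weight vector
    print(f"C={C:<6} ||w|| = {weight_norms[C]:.3f}")
```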

Distance-Based: k-NN

k-Nearest Neighbors (k-NN): Classifies a point by the majority label of its 'k' closest points in feature space.
Details: Has no real training phase (a lazy learner that just stores the data), but prediction is computationally expensive on large tables because distances to the stored points must be computed.

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, p=2)  # p=2: Minkowski distance with p=2, i.e. Euclidean

🏨 Intuitive Hotel Classification Example

To classify a new hotel based on Price per Night ($) and Distance to Beach (meters) under k=5: locate the 5 nearest hotels in coordinate space. If 4 are "Budget Hostel" and 1 is "Luxury Resort", the model classifies the new hotel as Budget Hostel by majority vote.

k-NN: Step-by-Step Example

Classifying a new fruit (Sweetness = 6, Crunchiness = 4) with k = 3 neighbors.

1. Measure Distance (Euclidean)

Formula: Distance = √((x₂ - x₁)² + (y₂ - y₁)²)

Fruit	Sweet (x)	Crunch (y)	Calculation	Distance
Apple A	7	7	√((7-6)² + (7-4)²) = √(1+9)	3.16
Apple B	8	5	√((8-6)² + (5-4)²) = √(4+1)	2.24
Orange A	3	3	√((3-6)² + (3-4)²) = √(9+1)	3.16
Orange B	7	2	√((7-6)² + (2-4)²) = √(1+4)	2.24
Orange C	6	2	√((6-6)² + (2-4)²) = √(0+4)	2.00

2. Find 3 Neighbors

Orange C (Dist: 2.00)
Apple B (Dist: 2.24)
Orange B (Dist: 2.24)

3. Vote for Class

Oranges: 2 votes (C, B)
Apples: 1 vote (B)

✅ Final Classification

The new fruit is classified as an Orange because it secured the majority of the votes (2 out of 3).
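The three steps above can be reproduced in plain Python. This is a minimal sketch; the helper knn_classify and the fruits dictionary are illustrative names, and the points come from the distance table.

```python
import math
from collections import Counter

# Training points from the table: name -> ((sweetness, crunchiness), label)
fruits = {
    "Apple A": ((7, 7), "Apple"),
    "Apple B": ((8, 5), "Apple"),
    "Orange A": ((3, 3), "Orange"),
    "Orange B": ((7, 2), "Orange"),
    "Orange C": ((6, 2), "Orange"),
}


def knn_classify(query, data, k=3):
    # 1. Measure the Euclidean distance from the query to every stored point
    scored = [(math.dist(query, point), name, label) for name, (point, label) in data.items()]
    # 2. Keep the k closest points
    neighbors = sorted(scored, key=lambda item: item[0])[:k]
    # 3. Majority vote among their labels
    votes = Counter(label for _, _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


label, neighbors = knn_classify((6, 4), fruits, k=3)
for dist, name, fruit_label in neighbors:
    print(f"{name}: {dist:.2f} ({fruit_label})")
print("Prediction:", label)   # Prediction: Orange
```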

Tree-Based Ensembles

Decision Trees: A hierarchical sequence of logic cuts (rules) dividing features into homogeneous groups.
Random Forest (Bagging): Aggregates many trees trained on random sample subsets, voting in parallel to reduce variance.
XGBoost (Extreme Gradient Boosting): Sequentially fits trees where each new tree corrects the residuals (mistakes) of the ensemble built so far.

from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
rf = RandomForestClassifier(n_estimators=100, max_depth=8)  # bagging: independent trees
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1)    # boosting: sequential trees

Decision Tree, Random Forest and XGBoost Comparison

🌲 Selection Guide: Tree Algorithms

Decision Trees: Best for quick, simple baselines where explainability is crucial. Highly sensitive to small changes in the training data (high variance).
Random Forest (Bagging): Best for general tabular data. Trains independent trees in parallel on bootstrap samples, reducing variance out of the box.
XGBoost (Boosting): Often the most accurate choice on structured tabular data. Fits sequential trees that correct the residual errors of earlier trees, primarily reducing bias.

🛠️ Crucial Tuning Parameters

• Max Depth: Limits how deep trees grow. Low values prevent overfitting; high values capture complex structures.
• n_estimators & LR: Adding trees to a Random Forest rarely causes overfitting (it mainly costs compute); in XGBoost extra trees can overfit, so a smaller learning rate is usually paired with more trees.
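A sketch comparing the three tree families with 5-fold cross-validation on synthetic tabular data. XGBoost lives in a separate package, so this sketch swaps in scikit-learn's GradientBoostingClassifier as the boosting model; it follows the same sequential idea but is not XGBoost itself. The dataset and hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic tabular data (a stand-in for a real dataset)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)

models = {
    # Single unpruned tree: low bias, high variance
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # Bagging: independent trees on bootstrap samples, aggregated to reduce variance
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0),
    # Boosting: shallow trees added sequentially, each fitted to the
    # negative gradient of the loss (pseudo-residuals) of the ensemble so far
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```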

💡 Split Example (Elaborated Decision Tree Branches)

• Root: Credit Score > 650?
├─ YES ➔ DTI < 40%?
│  ├─ YES ➔ Employed == True? ➔ Approved (96% prob, Low Risk)
│  └─ NO ➔ Income > $80k? ➔ Approved (80% prob, High Interest)
└─ NO ➔ Collateral == True?
   ├─ YES ➔ Approved (70% prob, Secured Loan)
   └─ NO ➔ Rejected (95% prob, Extreme Risk)

Supervised Learning: Case Studies

Case Study 1 📵

Telecom Customer Churn

Problem: Predict user subscription cancellations based on usage records.
Type: Classification.
Models: Logistic Regression, Random Forest, XGBoost.

Case Study 2 🏠

Real Estate Pricing

Problem: Estimate home values based on size, zip codes, crime rates.
Type: Regression.
Models: Ridge/Lasso, Random Forest Regressor.

Case Study 3 ✉️🛡️

Email Spam Filter

Problem: Classify incoming emails as legitimate or spam.
Type: Classification.
Models: Naive Bayes, Decision Trees, SVM.

Supervised Quiz: Classification vs. Regression

1. Predict tomorrow's stock price (e.g. $150.25).

2. Identify if an incoming email is a phishing attempt.

3. Estimate delivery ETA duration of a food courier.

4. Diagnose whether medical scans show malignant or benign cell tumors.

Unsupervised Learning Basics

What is Unsupervised Learning?

Analyzing unlabeled data to extract its inherent structure (groups, lower-dimensional representations, and outliers) without a teacher providing correct answers.

Core Sub-Types

Clustering: Grouping data points based on similarity metrics.
Dimensionality Reduction: Compressing input dimensions while preserving as much information (e.g. variance) as possible.
Anomaly Detection: Isolating rare outliers.

📊 Unlabeled Data (only features X) ➡️ 🧠 Unsupervised Engine (finds shapes & compression) ➡️ 🎨 Clustering (groups) | 🗜️ Dim. Reduction (axes) | 🚨 Anomaly Detection (outliers)

Clustering Algorithms

K-Means: Groups data points into k roughly spherical partitions by iteratively moving each centroid to the mean of the points assigned to it.
Hierarchical Clustering: Builds nested tree branches (agglomerative) to connect data coordinates without predefining cluster counts.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Groups dense coordinate neighborhoods, discovering arbitrary cluster shapes and isolating sparse noise.

from sklearn.cluster import KMeans, DBSCAN
kmeans = KMeans(n_clusters=3)            # the number of clusters k is fixed up front
dbscan = DBSCAN(eps=0.5, min_samples=5)  # neighborhood radius and density threshold

🔵 Selection Guide: Clustering Algorithms

📍 K-Means (Centroid-Based)

• When to choose: Spherical clusters, similar sizes, and need maximum speed/scalability on large datasets.
• Caveat: Sensitive to feature scaling; requires predefining number of clusters k.

🌿 Hierarchical (Tree-Based)

• When to choose: Need taxonomic nested hierarchies (dendrograms) or deterministic results on small datasets.
• Caveat: High computational cost (O(N³) time for the naive algorithm, plus O(N²) memory for the distance matrix), which typically makes it sluggish beyond roughly 10,000 samples.

🌀 DBSCAN (Density-Based)

• When to choose: Arbitrary shapes (rings, loops), data has noise/outliers to filter, and cluster count is unknown.
• Caveat: Fails on datasets with highly varying densities; tuning the search radius ε (eps) is critical.

💡 Clustering Examples

• K-Means: Customer segmentation for targeted marketing campaigns.
• Hierarchical: Building evolutionary trees of animal species (dendrograms).
• DBSCAN: Mapping city crime or traffic hotspots while filtering outlier noise.
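A sketch of the "arbitrary shapes" point from the selection guide, assuming scikit-learn's make_moons toy data (two interleaved half-moons). The eps and min_samples values are illustrative choices for standardized data, not universal defaults. The Adjusted Rand Index compares each clustering with the true moon membership.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaved half-moons: non-spherical clusters
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)  # both methods are distance-based, so scale features

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Adjusted Rand Index: 1.0 = perfect agreement with the true moons, ~0 = random
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}, DBSCAN ARI: {ari_dbscan:.2f}")
```

K-Means cuts the moons with a straight boundary between two centroids, while DBSCAN follows each dense arc.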

Mechanics of Clustering: How They Work

K-Means Working

Centroid-Based Partitioning:

Initialize: Randomly place K center coordinates (centroids).
Assign: Map every data point to its closest centroid using Euclidean distance.
Update: Shift each centroid to the mean coordinate of all points assigned to it.
Iterate: Repeat Assign and Update until the centroids stop moving.

📍 1. Assign ➡️ 🔄 2. Update ➡️ 🎯 3. Iterate

Hierarchical Working

Bottom-Up Agglomerative Trees:

Initialize: Treat every data point as its own cluster.
Measure: Calculate the distance between all pairs of clusters using a linkage criterion (e.g. single, complete, average, or Ward).
Merge: Combine the two closest clusters into a single parent cluster.
Iterate: Repeat Measure and Merge until only one root cluster remains, forming a dendrogram.

* Note: Divisive (top-down) clustering is the conceptual opposite: it starts from one cluster and recursively splits it. The two approaches generally produce different trees, because Agglomerative merges local pairs while Divisive splits global structures.

🌱 1. Pairs ➡️ 🌿 2. Merge ➡️ 🌳 3. Tree
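A sketch of bottom-up merging with scikit-learn's AgglomerativeClustering, assuming six hypothetical 2-D points that form two obvious groups and Ward linkage. The children_ attribute records the order in which clusters were merged, which is the information a dendrogram draws.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical 2-D points forming two obvious groups
X = np.array([[1.0, 1.0], [1.2, 1.1], [0.9, 1.3],
              [5.0, 5.0], [5.1, 4.8], [4.8, 5.2]])

# Bottom-up merging with Ward linkage; cut the tree at 2 clusters
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = agg.fit_predict(X)

print(labels)
# children_ records the merge order: row i lists the two clusters merged at step i
# (indices < n_samples are original points, larger indices are earlier merges)
print(agg.children_)
```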

DBSCAN Working

Density-Based Connectivity:

Scan Neighbors: For each point, count how many points (including itself) lie within radius ε.
Core Points: If the count ≥ min_samples, mark it as a "Core" point; a core point not yet in any cluster starts a new one.
Expand: Add every point within ε of a core point to that core point's cluster. Neighbors that are themselves core points keep expanding the cluster; non-core neighbors become "Border" points.
Isolate Noise: Any remaining points not reachable from a core point are labeled "Noise" (outliers).

🌐 1. Scan ➡️ 🔗 2. Link ➡️ 🚫 3. Noise
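A sketch that labels each point as Core, Border, or Noise using scikit-learn's DBSCAN, assuming six hypothetical points: a tight group, one sparse point on its edge, and one far-away outlier. The eps and min_samples values are chosen for this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points: a tight group, one point on its edge, and one far-away outlier
X = np.array([
    [0.0, 0.0], [0.0, 0.2], [0.2, 0.0], [0.2, 0.2],  # dense group
    [0.6, 0.2],                                      # near the group but sparse
    [5.0, 5.0],                                      # isolated
])

# min_samples counts the point itself
db = DBSCAN(eps=0.5, min_samples=4).fit(X)

core = set(db.core_sample_indices_.tolist())
roles = []
for i, label in enumerate(db.labels_):
    if label == -1:
        roles.append("Noise")
    elif i in core:
        roles.append("Core")
    else:
        roles.append("Border")

print(db.labels_)   # cluster 0 for the first five points, -1 (noise) for the last
print(roles)        # ['Core', 'Core', 'Core', 'Core', 'Border', 'Noise']
```

The point at (0.6, 0.2) has only 3 points within ε (counting itself), so it is not a core point, but it lies within ε of a core point and therefore joins cluster 0 as a border point.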

K-Means: Step-by-Step Example

Grouping customers into K = 2 clusters based on Age and Spending Score (1-10). Initial centroids: C1 = (20,3), C2 = (40,8).

1. Distance Assignment Step

Customer	Age (x)	Spend (y)	Dist to C1 (20,3)	Dist to C2 (40,8)	Cluster
User A	22	4	2.24	18.44	C1
User B	28	2	8.06	13.42	C1
User C	45	9	25.71	5.10	C2
User D	38	7	18.44	2.24	C2

2. Update Centroids Step

New C1 Center: Average of User A(22,4) & B(28,2) → ((22+28)/2, (4+2)/2) = (25, 3)
New C2 Center: Average of User C(45,9) & D(38,7) → ((45+38)/2, (9+7)/2) = (41.5, 8)

🔄 3. Repeat Until Convergence

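Steps 1 & 2 are repeated with the updated centroids. The process stops when cluster assignments freeze and the centroids no longer move. Here, re-assigning with (25, 3) and (41.5, 8) keeps Users A and B in C1 and Users C and D in C2, so the centroids stop moving after this update.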
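The worked example can be reproduced with a short NumPy loop. This is a minimal sketch of the Assign/Update cycle starting from the given centroids; it is not scikit-learn's KMeans, which adds smarter initialization and multiple restarts.

```python
import numpy as np

# Users A-D as (age, spending score), and the initial centroids C1, C2 from the example
X = np.array([[22, 4], [28, 2], [45, 9], [38, 7]], dtype=float)
centroids = np.array([[20, 3], [40, 8]], dtype=float)

for iteration in range(1, 101):
    # Assign: Euclidean distance from every point to every centroid, shape (4, 2)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    if iteration == 1:
        print(np.round(distances, 2))   # matches the distance table above
    labels = distances.argmin(axis=1)
    # Update: move each centroid to the mean of its assigned points
    # (this sketch assumes no cluster ends up empty)
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(len(centroids))])
    if np.allclose(new_centroids, centroids):
        break                           # converged: centroids stopped moving
    centroids = new_centroids

print(iteration, labels, centroids)
```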

Dimensionality Reduction

PCA (Principal Component Analysis): Projects high-dimensional data onto new orthogonal axes (principal components), ordered by how much variance each one captures.
t-SNE (t-Distributed Stochastic Neighbor Embedding) / UMAP (Uniform Manifold Approximation and Projection): Non-linear manifold mapping that preserves local neighborhoods to visualize distributions in 2D coordinate space.

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
pca = PCA(n_components=2)
tsne = TSNE(n_components=2, perplexity=30)  # perplexity is related to the number of nearest neighbors considered

📊 Selection Guide: PCA vs. t-SNE / UMAP

📈 PCA (Linear Projection)

• When to choose: Preprocessing features for other models, noise reduction, or preserving global structural patterns.
• Projection: Supports out-of-sample projection (`pca.transform(X_new)`) for new data points.

🗺️ t-SNE & UMAP (Non-Linear Mapping)

• When to choose: Visualizing high-dimensional clusters in 2D or 3D coordinate space.
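• Projection: t-SNE (e.g. scikit-learn's TSNE) does *not* support projecting new points; the whole dataset must be re-embedded. UMAP implementations such as umap-learn can transform new points, but both methods are mainly used for visualization.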
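A sketch of PCA's out-of-sample projection, assuming hypothetical data in which 10 observed features are driven by 2 hidden factors plus small noise. The axes are learned once with fit and then reused by transform on rows the model has never seen.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 10 observed features driven by 2 hidden factors plus small noise
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X_train = latent @ mixing + 0.1 * rng.normal(size=(200, 10))
X_new = rng.normal(size=(5, 2)) @ mixing   # new, unseen rows from the same process

pca = PCA(n_components=2).fit(X_train)     # learn the two principal axes once
X_new_2d = pca.transform(X_new)            # project new rows onto the same axes

print(X_new_2d.shape)                       # (5, 2)
print(pca.explained_variance_ratio_.sum())  # share of variance kept by 2 components
```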

💡 Reduction Examples

• PCA: Compressing 100+ user survey answers into 2 principal components to feed into a regression model.
• t-SNE / UMAP: Visualizing high-dimensional single-cell genetic sequences in a 2D scatter plot.

Unsupervised Quiz: Clustering or Dimensionality Reduction?

1. Group coordinate locations of delivery drops to establish local sorting hubs.

2. Compress 50 columns from a buyer survey into 2 summary metrics for a 2D plot.

3. Segment online news articles into topic folders to help readers browse.

Unsupervised Learning Case Studies

Case Study 1 🛒

Customer Segmentation

Problem: Group buyers based on shopping volumes and session times.
Type: Clustering.
Models: K-Means, Hierarchical Clustering.

Case Study 2 🧬

Genomics Visualization

Problem: Map expression distributions of 20k genes in 2D plots.
Type: Dimensionality Reduction.
Models: t-SNE, PCA.

Case Study 3 💳🛡️

Credit Anomaly Detection

Problem: Flag rare, anomalous credit card transactions.
Type: Outlier / Anomaly Detection.
Models: Isolation Forest, One-class SVM.

Reinforcement Learning Basics

Concept: Teaching model behaviors using feedback loops based on action trials and environmental rewards.
Agent: The learner and decision-maker.
Environment: The interactive world the agent operates in.
State: The current situation of the environment as observed by the agent.
Action: A move the agent chooses to make in a given state.
Reward: A positive or negative numeric feedback score returned after each action.

RL Agent Learning Journey

Exploration Blind Trial: The robot moves forward, enters a hazard zone (fire), and immediately receives a large negative reward (-100). The agent updates its value estimates so it avoids this action in the same grid state in future episodes.

Exploitation Path Corrected: After multiple iterations, the agent learns the barrier locations, navigates around hazard areas, reaches the battery charger goal, and earns a positive reward (+100).

RL Core Concepts & Algorithms

Q-Learning (Value-Based)

Maintains a lookup table (Q-table) that stores, for each state-action pair, the expected cumulative (discounted) reward of taking that action in that state and acting optimally afterwards.

Use Case: Industrial robot vacuum navigation.

Example Q-table (illustrative values):

State/Action	⬅️ Left	⬆️ Up	➡️ Right
S1 (Start)	0.0	-10.0	+0.5
S2 (Hazard)	+1.2	0.0	-100.0

import gymnasium as gym
import numpy as np
env = gym.make('FrozenLake-v1')
# Q-table: one row per state, one column per action
q_table = np.zeros([env.observation_space.n, env.action_space.n])
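The snippet above only allocates the table. To show the learning step itself without depending on gymnasium, the sketch below swaps in a hypothetical 5-state corridor written in plain NumPy and applies the standard Q-learning update Q(s, a) ← Q(s, a) + α[r + γ·maxₐ′ Q(s′, a′) − Q(s, a)]. The corridor, its reward, and the hyperparameters (α = 0.1, γ = 0.9, ε = 0.3) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy environment (not FrozenLake): a 1-D corridor of 5 states.
# The agent starts in state 0; reaching state 4 ends the episode with reward +1.
N_STATES = 5
GOAL = N_STATES - 1
MOVES = (-1, +1)  # action 0 = left, action 1 = right


def step(state, action):
    """Apply an action, clipping at the walls; return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train_q_table(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(MOVES)))  # Q-table: states x actions
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit (random tie-break)
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))
            else:
                best = np.flatnonzero(q[state] == q[state].max())
                action = int(rng.choice(best))
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward if done else reward + gamma * q[next_state].max()
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
    return q


q_table = train_q_table()
greedy_policy = q_table[:GOAL].argmax(axis=1)  # best action in each non-terminal state
print(np.round(q_table, 3))
print(greedy_policy)                           # 1 = move right in every state
```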

Policy Gradients (Policy-Based)

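Directly parameterizes the policy π(a|s), a probability distribution over actions, and adjusts it along the gradient of expected reward instead of reading actions off a state-action value table (actor-critic methods such as PPO still learn a value function as a baseline).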

Use Case: Continuous throttle controls for quadcopters.

Input State [x, y, vx, vy] ➡️ NN Policy π(a|s) with softmax outputs ➡️ Left: 12% | Up: 85% | Right: 3%

from stable_baselines3 import PPO
import gymnasium as gym
env = gym.make('CartPole-v1')
model = PPO('MlpPolicy', env, verbose=0)  # MLP (neural network) policy
model.learn(total_timesteps=5000)
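PPO adds clipping, a value baseline, and neural networks on top of the basic idea. The sketch below strips those away to show only the core policy-gradient step, assuming a single fixed state, a softmax policy over the three actions from the diagram, and hypothetical expected rewards per action. To keep it deterministic it uses the exact gradient of expected reward rather than a sampled (REINFORCE-style) estimate.

```python
import numpy as np

ACTIONS = ["Left", "Up", "Right"]
# Hypothetical expected reward for each action in a single fixed state
REWARDS = np.array([0.1, 1.0, 0.0])


def softmax(logits):
    z = logits - logits.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


theta = np.zeros(len(ACTIONS))  # policy parameters (one logit per action)
learning_rate = 1.0
for _ in range(1000):
    pi = softmax(theta)
    expected_reward = pi @ REWARDS          # J(theta) = sum_a pi(a) * r(a)
    # Exact policy gradient for a softmax policy: dJ/dtheta_a = pi(a) * (r(a) - J)
    grad = pi * (REWARDS - expected_reward)
    theta += learning_rate * grad           # gradient ascent on expected reward

probs = softmax(theta)
for action, p in zip(ACTIONS, probs):
    print(f"{action}: {p:.1%}")
```

Probability mass shifts toward the action with the highest reward; no table of state-action values is ever stored.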

Reinforcement Learning Case Studies

Case Study 1 ♞

Grandmaster Chess/Go AI

Problem: Master complex game strategies to beat humans.
Type: Reinforcement Learning.
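Models: Monte Carlo Tree Search guided by deep policy/value networks (AlphaGo, AlphaZero); Deep Q-Networks are the classic value-based choice for simpler games such as Atari.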

Case Study 2 🦾

Robotic Arm Control

Problem: Grasp moving objects without damage.
Type: Continuous RL.
Models: PPO, Deep Deterministic Policy Gradients.

Case Study 3 🎯💰

Ad Bidding Optimization

Problem: Choose which ad to show for each impression to maximize clicks.
Type: Multi-Armed Bandit RL.
Models: Thompson Sampling, UCB.
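A sketch of Thompson Sampling for this case study, assuming three ads with hypothetical click-through rates that the agent does not know. Each ad keeps a Beta posterior over its CTR; every round the agent samples one plausible CTR per ad and shows the ad with the highest draw, which balances exploration and exploitation automatically.

```python
import numpy as np

rng = np.random.default_rng(7)
true_ctr = np.array([0.02, 0.04, 0.12])   # hypothetical click-through rate of each ad (unknown to the agent)
clicks = np.zeros(3)                        # observed successes per ad
skips = np.zeros(3)                         # observed failures per ad

for _ in range(5000):
    # Draw one plausible CTR per ad from its Beta(1 + clicks, 1 + skips) posterior
    # and show the ad with the highest draw
    sampled_ctr = rng.beta(1 + clicks, 1 + skips)
    ad = int(sampled_ctr.argmax())
    clicked = float(rng.random() < true_ctr[ad])   # simulated user response
    clicks[ad] += clicked
    skips[ad] += 1.0 - clicked

impressions = clicks + skips
print("Impressions per ad:", impressions.astype(int))
```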

Choosing the Right Model: Decision Matrix

Supervised (Predictive)

OLS Linear Regression Fast: Simple linear relationships, maximum interpretability.
Ridge & Lasso Fast: High-dimensional inputs, sparse data tables, multicollinearity.
Logistic Regression Fast: Baseline binary classification splits.
SVM Moderate: Complex margins, high-dimensional text/feature matrices.
Random Forest / XGB Heavy: Complex structured tables, non-linear relationships (often the highest accuracy on tabular data).

Unsupervised (Discovery)

K-Means Fast: Evenly sized, spherical, distinct customer clusters.
Hierarchical Moderate: Tree-like taxonomic relationships (e.g. biology).
DBSCAN Moderate: Dense clusters of arbitrary shapes with noise anomaly isolation.
PCA Fast: Linear dimension reduction to speed up downstream models.
t-SNE Heavy: Mapping complex non-linear manifolds strictly for 2D/3D visualization.

Reinforcement (Behavioral)

Multi-Armed Bandits Fast: No state transitions; balancing exploration vs. exploitation in real-time web testing.
Q-Learning Moderate: Low-dimensional discrete states and discrete action grids.
Policy Gradients (PPO) Heavy: Continuous control spaces (drones, robotics, complex behaviors).

How Many Models Exist?

There is no fixed count, but models group into 5 core families:

📉 Linear Models: Fit linear functions of the inputs or of engineered features, i.e. lines, planes, and hyperplanes (e.g. Linear/Logistic, Ridge/Lasso, and Polynomial Regression).
🌲 Tree-based Ensembles: Cut the feature space into nested step rules (e.g. Decision Trees, Random Forests, XGBoost).
📏 Distance & Kernel Models: Rely on distances or kernel similarities between points (e.g. k-NN, Kernel SVM, K-Means, DBSCAN).
🎲 Probabilistic Models: Model probability distributions, typically via Bayes' rule (e.g. Naive Bayes, GMM).
🕸️ Neural Networks: Stacked layers of nodes that learn non-linear mappings from inputs to outputs.

* Key Contrast: While Linear Models assume flat, straight-line relationships (with Polynomial mapping non-linear relations using linear solvers), Tree Ensembles, Kernels, and Neural Networks are inherently Non-Linear—allowing them to fit complex curved boundaries.

Model Directory: 25 Core Algorithms Covered

Paradigm	Model / Algorithm	Slide Reference	Type
Supervised (Regression)	1. Ordinary Least Squares (OLS) Linear Regression	Slide 7	Linear
Supervised (Regression)	2. Ridge Regression (L2 Penalty)	Slide 8	Linear
Supervised (Regression)	3. Lasso Regression (L1 Penalty)	Slide 8	Linear
Supervised (Regression)	4. Polynomial Regression (Bridge Model)	Slide 25 (Mentioned)	Non-Linear features / Linear params
Supervised (Classification)	5. Logistic Regression	Slide 9	Linear
Supervised (Classification)	6. Support Vector Classifier (Linear SVM)	Slide 9	Linear
Supervised (Classification)	7. Kernel SVM (RBF / Polynomial Kernels)	Slide 9	Non-Linear
Supervised (Classification)	8. k-Nearest Neighbors (k-NN)	Slide 10	Non-Linear
Tree Ensembles (Reg/Class)	9. Decision Trees	Slide 11	Non-Linear
Tree Ensembles (Reg/Class)	10. Random Forest (Bagging)	Slide 11	Non-Linear
Tree Ensembles (Reg/Class)	11. XGBoost, LightGBM, CatBoost (Boosting)	Slide 11	Non-Linear
Unsupervised (Clustering)	12. K-Means Clustering	Slide 15	Non-Linear
Unsupervised (Clustering)	13. Hierarchical Clustering (Agglomerative)	Slide 15	Non-Linear
Unsupervised (Clustering)	14. DBSCAN (Density-Based)	Slide 15	Non-Linear
Unsupervised (Dim. Reduction)	15. Principal Component Analysis (PCA)	Slide 17	Linear
Unsupervised (Dim. Reduction)	16. t-SNE & UMAP (Manifold Learning)	Slide 17	Non-Linear
Unsupervised (Anomaly)	17. Isolation Forest	Slide 19	Non-Linear
Unsupervised (Anomaly)	18. One-Class SVM	Slide 19	Non-Linear
Probabilistic Models	19. Naive Bayes	Slide 12	Non-Linear
Probabilistic Models	20. Gaussian Mixture Models (GMM)	Slide 25	Non-Linear
Reinforcement Learning	21. Multi-Armed Bandits (Thompson / UCB)	Slide 23	Behavioral / Policy
Reinforcement Learning	22. Q-Learning (Value-Based)	Slide 22	Behavioral / Value-Based
Reinforcement Learning	23. Policy Gradients / PPO	Slide 22	Behavioral / Policy-Based
Reinforcement Learning	24. Deep Q-Networks (DQN) & DDPG	Slide 23	Behavioral / Deep RL
Deep Learning	25. Neural Networks (MLPs, CNNs, Transformers)	Slide 25	Non-Linear

The 6-Step ML Workflow

Data Collection: Querying raw database storage tables or APIs.
Preprocessing: Cleaning outliers, scaling values, feature engineering.
Model Choice: Picking the target mapping algorithm class.
Training: Fitting candidate model weights on data splits.
Evaluation: Measuring performance metrics on validation folds, then on the held-out test set.
Deployment: Packaging final models into live inference APIs.

Model Deployment: Model Packaging & Export

📦 Packaging Models for Production

Before a trained machine learning model can serve live predictions, its in-memory weights must be serialized into a persistent, portable file artifact.

Scikit-learn models (e.g. Lasso, Ridge, Random Forest): Typically serialized with joblib or pickle.
Neural Networks / Multi-platform: Often exported to the ONNX format for portable inference.

📓 1. Train & Validate ➡️ 💾 2. Save Artifact ➡️ 📦 3. Deploy Model

import joblib
# 1. Train the machine learning model
model.fit(X_train, y_train)
# 2. Serialize the fitted model to a file artifact
joblib.dump(model, 'model.joblib')
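A self-contained sketch of the save/load round trip, using a temporary directory and a LogisticRegression fitted on the Iris data as stand-ins for the trained model and the deployment path. Reloading and comparing predictions is a quick sanity check before shipping the artifact.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)   # stand-in for the trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)           # serialize to a file artifact
    restored = joblib.load(path)       # what a serving process would do at startup

# The reloaded model should reproduce the original predictions exactly
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)   # True
```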

Model Deployment: Inference API Deployment

🚀 Serving Real-Time Predictions

A common way to deploy a model is to expose it through a Python web framework (e.g., FastAPI or Flask) as a REST API endpoint that external applications can call.

Load: The serialized model is loaded in memory on server startup.
Expose: API endpoints receive incoming user feature inputs.
Predict: Perform real-time inference and return output scores.

💻 Client App (POST /predict) ➡️ request / ⬅️ response ⚡ Flask App (API endpoint) ↔️ 💾 Loaded Model (in memory)

from flask import Flask, request, jsonify
import joblib
app = Flask(__name__)
model = joblib.load('model.joblib')  # load once at server startup
@app.route('/predict', methods=['POST'])
def predict():
    features = request.json['features']  # feature values for one sample
    prediction = model.predict([features])
    return jsonify({'prediction': prediction.tolist()})  # NumPy array -> JSON-serializable list

Mapping Tasks & Tooling

Pipeline Alignment ⚙️

Regression & Classification represent core Model Choice, Training, and Evaluation blocks.
Clustering & PCA often sit in Preprocessing: PCA compresses features into fewer columns before training, and cluster labels can be added as engineered features.
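A sketch of this alignment using a scikit-learn Pipeline, with the bundled Iris data as a stand-in: scaling and PCA act as preprocessing, LogisticRegression is the model choice, fit is training, and score on held-out data is evaluation. Two components and the split ratio are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipe = Pipeline([
    ("scale", StandardScaler()),                   # preprocessing: put features on one scale
    ("pca", PCA(n_components=2)),                  # preprocessing: compress 4 features to 2
    ("model", LogisticRegression(max_iter=1000)),  # model choice
])
pipe.fit(X_train, y_train)               # training: every step is fitted on training data only
accuracy = pipe.score(X_test, y_test)    # evaluation on held-out data
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the steps in one Pipeline keeps the scaler and PCA from ever seeing the test rows, and the whole object can be serialized as a single artifact for deployment.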

Key Tool Ecosystems 🛠️

Pandas / NumPy: DataFrames & arrays
Scikit-Learn: Classical ML
TensorFlow / PyTorch: Deep Learning
MLflow: MLOps

Key Takeaways & Wrap-Up

Define Problem First: Decide whether you have labeled targets (Supervised), only feature structure to explore (Unsupervised), or an agent interacting with states and rewards (Reinforcement).
Prioritize Baseline Models: Try a simple linear model or a shallow decision tree before building deep neural networks or complex boosting stacks.
Iterate on Pipeline Data: Many modeling problems stem from poor data preparation rather than from hyperparameter choices. Clean and scale your raw inputs carefully.

Visual Cheat Sheet Summary

Summary Quiz: Paradigm Matchmaker

1. Train a self-driving car to steer and avoid road cones via feedback rewards.

2. Predict the salary of a new job listing based on experience, role, and location.

3. Find hidden, fraudulent cohorts inside banking transactions without labels.

Audience Q&A

Ask Me Anything!