https://developers.google.com/machine-learning/glossary
------------------------------------------------------------------------------
ML Categories
Unsupervised Learning
Supervised Learning
Reinforcement Learning
------------------------------------------------------------------------------
** Unsupervised Learning
Draw inferences from data
Previously undetected patterns
Example -
Clustering (finding groups of similar entities in a data set; sketch below)
Anomaly Detection
Principal component analysis - get the most important attributes
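A minimal clustering sketch (assumes scikit-learn is installed; the toy points are made up, not from these notes):
    # k-means clustering: group points without any labels (unsupervised)
    from sklearn.cluster import KMeans
    import numpy as np

    # toy 2-D data: two loose groups of points
    points = np.array([[1, 2], [1, 4], [1, 0],
                       [10, 2], [10, 4], [10, 0]])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    print(kmeans.labels_)           # cluster assignment per point, e.g. [1 1 1 0 0 0]
    print(kmeans.cluster_centers_)  # the two learned group centers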
** Supervised Learning
Learn from examples
Goal is to predict category or value
Example
Classifying tumors from images - (Classification)
Predicting housing prices - (Regression; sketch below)
Identify fraudulent credit card transactions
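A minimal regression sketch (scikit-learn assumed; the housing numbers are invented for illustration):
    # supervised learning: learn from labeled examples, predict a value
    from sklearn.linear_model import LinearRegression
    import numpy as np

    sizes  = np.array([[800], [1200], [1500], [2000]])   # feature: size in sqft
    prices = np.array([160000, 240000, 300000, 400000])  # label: price

    model = LinearRegression().fit(sizes, prices)  # learn from the examples
    print(model.predict([[1800]]))                 # predict price for a new house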
** Reinforcement Learning (not explored in DE exam much)
Learn from environment
Maximize reward
Does not require examples
Instead it balances exploration of the environment with exploitation of what it has already learned
Example
An agent takes actions in an environment and receives rewards (epsilon-greedy sketch below)
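A toy epsilon-greedy sketch of exploration vs. exploitation (plain Python; reward numbers are made up):
    # agent repeatedly picks one of 3 actions; rewards come from the environment
    import random

    true_rewards = [0.3, 0.8, 0.5]   # hidden average reward per action
    estimates = [0.0, 0.0, 0.0]      # agent's learned reward estimates
    counts = [0, 0, 0]
    epsilon = 0.1                    # 10% of the time: explore

    for step in range(1000):
        if random.random() < epsilon:
            action = random.randrange(3)              # exploration: try anything
        else:
            action = estimates.index(max(estimates))  # exploitation: best so far
        reward = 1 if random.random() < true_rewards[action] else 0
        counts[action] += 1
        # incremental average: nudge the estimate toward the observed reward
        estimates[action] += (reward - estimates[action]) / counts[action]

    print(estimates)  # approaches [0.3, 0.8, 0.5] - no labeled examples needed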
------------------------------------------------------------------------------
2 approaches to ML
Symbolic Artificial Intelligence (the dominant approach from the 1950s into the 1980s)
Neural networks and deep learning (deep learning is built on neural networks)
------------------------------------------------------------------------------
** Symbolic Artificial Intelligence
Symbols represent entities and attributes
Manipulate symbols to make inferences
Models of Reasoning
Logic
Cognitive science
Features
Say, to predict re-admission of a patient to a hospital
Use length of stay, type of operation, age, etc.
Symbolic ML Algorithms
Decision Trees
Ask questions --> dig further based on answers, with more questions
Set of decision points; a terminal node gives the answer/classification
Random Forest
Ensemble of multiple decision trees, each built with different subsets of features - popular (sketch below)
Naive Bayes
Conditional probability
Support Vector Machines (SVMs)
represent entities as points in space
Similar entities are close in space
Dissimilar entities are separated by a gap - this algo finds the widest gap
K Nearest Neighbors
- To Categorize
- Finds ways to measure distance b/w objects; closer ones get the same category
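A minimal tree/forest sketch reusing the hospital re-admission features above (scikit-learn assumed; data invented):
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    # features per patient: [length_of_stay_days, age]; label: readmitted (1) or not (0)
    X = [[2, 30], [10, 70], [1, 25], [12, 80], [3, 40], [9, 65]]
    y = [0, 1, 0, 1, 0, 1]

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)        # one set of decision points
    forest = RandomForestClassifier(n_estimators=50).fit(X, y)  # many trees, majority vote

    print(tree.predict([[8, 60]]))    # follows the question/answer path to a terminal node
    print(forest.predict([[8, 60]]))  # aggregates votes across the trees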
** Neural networks and deep learning
Neuron-like abstraction
Inputs are numbers (x) - features or outputs of other neurons
Weights assign importance to inputs (W)
x1*W1 + x2*W2 + x3*W3 --(non-linear function aka Neuron)--> Output
Non-linear function is called the Activation Function
Sigmoid
TanH
ReLU
**** We train the model to adjust the weights to get the desired output (forward-pass sketch below) ****
Layers - can be any number of (simple one has 3)
Input Layer
Hidden Layer
Output Layer
Deep Learning (more than 3 layers)
Challenging to learn weights
Backpropagation algo is used to adjust the weights
- takes into account the size of the error,
& the slope toward the right/correct answer (the ideal point)
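A forward-pass sketch of the single neuron above in plain numpy (the weights and inputs are illustrative):
    import numpy as np

    def sigmoid(z):
        # activation function: squashes any number into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, 0.2, 0.9])   # inputs: features or outputs of other neurons
    w = np.array([0.4, -0.6, 0.3])  # weights: importance assigned to each input

    z = np.dot(x, w)    # x1*W1 + x2*W2 + x3*W3
    print(sigmoid(z))   # non-linear activation -> the neuron's output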
==================================================================
Entity & Attributes
Features
Label
ML uses features to predict the label
Feature Engineering
Manipulate features to improve the quality of the ML model
Identify useful features (original or transformed value)
Derived features
** Ways to do feature engineering
Transform existing features (cleanup etc)
Map numeric values to a scale of 0 to 1
Bucketing - to reduce # of values (say 1-100 to 10 buckets)
Feature cross - Cartesian product of 2 or more features
say, weight (light, medium, heavy) x color (blue, green, red) -> 9 combos
helps capture non-linear relationships
Binary features
e.g., is_red, is_blue
Decompose value parts
From date - extract day, month, year
From Address - extract street etc
One-Hot Encoding
Map value to a single bit in a binary array
each position represents a possible value (like Red -> 100, Green -> 010, Blue -> 001)
used to represent categorical features in deep learning models.
Normalization
Convert numeric value to a standard (0 to 1 or -1 to +1)
0 to 1 is called scaling (divide the feature value by the max value; sketch below)
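A plain-Python sketch of one-hot encoding and 0-to-1 scaling (values are made up):
    colors = ["red", "green", "blue"]
    vocab = ["red", "green", "blue"]   # all possible category values

    # one-hot: each position in the array represents one possible value
    one_hot = [[1 if c == v else 0 for v in vocab] for c in colors]
    print(one_hot)   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

    # scaling: divide each feature value by the max value -> range 0 to 1
    weights = [120, 450, 900]
    print([w / max(weights) for w in weights])   # [0.133..., 0.5, 1.0]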
Model Building
Define problem
Collect Data
Define Evaluation method
Prepare the data (iteratively)
Split the data into training, validation & test sets
Execute the Algorithm on data to build the model
Validate the model (tune the model)
adjust the hyperparameters (not learned from the data)
# of layers in NN, decision tree depth allowed, max trees in RForest etc
[params are learned by algo from data]
Test model
[Training -> Model -> Validation -> Tune model -> Training; then test once, at the end (split sketch below)]
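A train/validation/test split sketch (scikit-learn assumed; the 60/20/20 proportions are just an example):
    from sklearn.model_selection import train_test_split
    import numpy as np

    X = np.arange(100).reshape(100, 1)   # stand-in features
    y = np.arange(100)                   # stand-in labels

    # carve off 20% as the test set - touched only once, at the very end
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    # split the rest into training and validation (validation is used for tuning)
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    print(len(X_train), len(X_val), len(X_test))   # 60 20 20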
Evaluating Model
Commonly used metrics
Accuracy (classification problems)
Precision (classification problems)
Recall (classification problems)
Mean Squared Error (regression problems)
*** Never test with training data
Confusion Matrix - Actual x Predicted
Accuracy - fraction of data points predicted correctly: (TP+TN)/(TP+FP+TN+FN)
Precision - fraction of predicted positives that are truly positive: TP/(TP+FP)
Recall - fraction of actual positives that were identified: TP/(TP+FN) (worked example below)
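A worked example computing the three metrics from confusion-matrix counts (counts invented for illustration):
    TP, FP, TN, FN = 40, 10, 45, 5

    accuracy  = (TP + TN) / (TP + FP + TN + FN)  # fraction predicted correctly
    precision = TP / (TP + FP)                   # of predicted positives, how many are real
    recall    = TP / (TP + FN)                   # of actual positives, how many were found

    print(accuracy, precision, recall)           # 0.85 0.8 0.888...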
===============================================================
Deep Learning
Gradient Descent
- U shaped graph in first quadrant.
- x-axis Weight
- y-axis Loss
- AIM: minimize the total loss
- Training moves the initial weight toward the optimal weight (sketch below)
- Gradient (slope) - which direction to go, how fast to go
- "Learning rate" (hyperparameter) determines the incremental step size
- here weight is the parameter the model learns
- "Hyperparameters" we adjust to get the optimal "parameter", which is the weight
Types:
Batch gradient descent
Loss is calculated over entire data set
Slow on large data sets
Stochastic Gradient Descent
For large datasets (so in Deep Learning)
Weights are updated after each instance (not after entire dataset)
Can adjust the weight with each example
Training instances are randomly shuffled (hence "stochastic")
Random ordering helps avoid getting stuck in local minima
Mini-batch gradient descent
Between batch and stochastic: weights updated after each small batch of instances
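A gradient-descent sketch on a single weight, with loss = (w - 3)^2 (a made-up loss just to show the loop):
    w = 0.0               # initial weight
    learning_rate = 0.1   # hyperparameter: step size

    for step in range(50):
        gradient = 2 * (w - 3)             # slope of the loss at the current weight
        w = w - learning_rate * gradient   # step downhill, opposite the slope

    print(w)   # approaches 3.0, the weight that minimizes the loss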
How to calculate the gradient? The solution is backpropagation
BackPropagation
Compute gradient of mapping function over an input-output pair
Calculate partial derivative of loss function relative to each weight
More efficient than a naive calculation (chain-rule sketch below)
.. add more notes
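A minimal chain-rule sketch for one weight of a sigmoid neuron with squared loss (all numbers illustrative):
    import math

    x, w, target = 0.5, 0.8, 1.0   # input, weight, desired output

    z = x * w                      # weighted input
    out = 1 / (1 + math.exp(-z))   # sigmoid activation
    loss = (out - target) ** 2     # squared error

    # chain rule: dloss/dw = dloss/dout * dout/dz * dz/dw
    dloss_dout = 2 * (out - target)
    dout_dz = out * (1 - out)      # derivative of the sigmoid
    dz_dw = x
    print(dloss_dout * dout_dz * dz_dw)   # negative here, so increasing w lowers the loss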
------------------------------------------------
Model Troubleshooting
------------------------------------------------
Underfitting
Model performs poorly on training and validation data
Ways to correct underfitting
Increase the complexity of the model
add additional layers in NN
increase # of decision trees allowed in Random Forest
increase the max depth of decision trees
Increase the Training Time or epochs
# epoch - one complete pass of the ML algorithm through the entire training dataset
Overfitting
Model performs well on training data but poorly on validation data
Correction options
Regularization - limits the information the model captures
Prevents outliers in the data from over-influencing the model (ridge sketch below)
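A ridge (L2) regularization sketch - one common way to penalize large weights (scikit-learn assumed; data invented):
    from sklearn.linear_model import LinearRegression, Ridge
    import numpy as np

    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0.0, 1.0, 2.0, 3.0])

    plain = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=10.0).fit(X, y)   # alpha = penalty strength (hyperparameter)

    print(plain.coef_)   # [1.0]   unpenalized slope
    print(ridge.coef_)   # [~0.33] slope shrunk toward zero by the penalty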
Bias - Variance Tradeoff
https://towardsdatascience.com/bias-and-variance-in-linear-models-e772546e0c30
These are natural characteristics of a model; reducing one tends to increase the other
Bias Error
Result of missed relationships b/w features & the target output
i.e., the model misses important information carried by a feature
Because the model is too simple / did not learn enough from the training data
Variance Error:
Due to sensitivity to small fluctuations in the training data
Small changes in the input can cause large changes in the output
variance shows up as the spread among a set of predictions
Bias and Unfairness issue:
Fairness
Anti-classification: protected attributes (e.g., gender) are not used in the model
Classification parity:
Predictive performance is equal across groups
Calibration:
Outcomes are independent of protected attributes, conditional on the risk score
==============================================
quick additional notes
Vision AI - Transfer Learning (reuse a model trained on one problem for a related set of problems)
Collaborative filtering - recommendations
Cloud Run - if model is stateless (to deploy models)
GPU - highly parallel processing, many ALUs, matrix multiplication (needs NVIDIA drivers)
TPU - Application-Specific Integrated Circuit (ASIC) - for TensorFlow models
Can cost less than GPUs for suitable workloads
https://docs.google.com/forms/d/e/1FAIpQLSfkWEzBCP0wQ09ZuFm7G2_4qtkYbfmk_0getojdnPdCYmq37Q/viewform