Thursday, January 14, 2021

Google Cloud Data Engineer - My Quick Notes 2 (ML)

 


https://developers.google.com/machine-learning/glossary

------------------------------------------------------------------------------

ML Categories

Unsupervised Learning

Supervised Learning

Reinforcement Learning

------------------------------------------------------------------------------

** Unsupervised Learning

Draw inference from data

Previously undetected patterns

Example - 

Clustering (Finding groups of similar entities in a data set)

Anomaly Detection

Principal component analysis - get the most important attributes
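As a rough illustration of the clustering idea above, here is a minimal 1-D k-means sketch in plain Python. The function name `kmeans_1d`, the sample points and the starting centroids are all made up for illustration, and k-means is only one of many clustering methods.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group 1-D points around the nearest centroid (toy k-means)."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Hypothetical data: two obvious groups, one near 1 and one near 10.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
labels, centroids = kmeans_1d(points, centroids=[0.0, 5.0])
print(labels)     # cluster index for each point
print(centroids)  # final cluster centers
```

No labels are supplied anywhere: the groups come only from how close the points are to each other.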

** Supervised Learning

Learn from examples

Goal is to predict category or value

Example

Classifying tumors from images - (Classification)

Predicting housing prices - (Regression)

Identify fraudulent credit card transactions 

**  Reinforcement Learning (not explored in DE exam much)

Learn from environment

Maximize reward

Does not require examples

Instead, it balances exploration of the environment with exploitation of what it has already learned

Example

Agent taking actions in environment and receiving rewards
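A small sketch of the exploration vs exploitation idea, assuming a toy "multi-armed bandit" environment where each action pays a reward of 1 with a hidden probability. The epsilon-greedy strategy, the function name `epsilon_greedy_bandit` and the reward probabilities are assumptions chosen for illustration, not something from the exam material.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate the reward of each action while mostly playing the best one."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)       # how often each action was taken
    estimates = [0.0] * len(true_means)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploitation: take the action that looks best so far.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # The environment pays 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental update of the running average for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: the third action is the most rewarding.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent never sees labelled examples, only the rewards that come back from its own actions.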

------------------------------------------------------------------------------

2 approaches to ML

Symbolic Artificial Intelligence (2006-2009)

Neural networks and deep learning (built on neural networks)

------------------------------------------------------------------------------

**  Symbolic Artificial Intelligence

Symbols represent entities and attributes

Manipulate symbols to make inferences

Models of Reasoning

Logic

Cognitive science

Features

Say, to predict re-admittance of a patient to hospital

Use, length of stay, type of operation, Age etc

Symbolic ML Algorithms

Decision Trees 

Ask questions --> dig further based on answers, with more questions

Set of decision points; the terminal node is the answer/classification

Random Forest

An ensemble of multiple decision trees, each built with different features - popular

Naive Bayes

Conditional probability

Support Vector Machines (SVMs)

represent entities as points in space

Similar entities are close in space

Dissimilar entities are separated by a gap - this algo finds the widest gap (margin)

K Nearest Neighbors  

- To Categorize

- Find a way to measure distance b/w objects; closer ones are likely in the same category
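A minimal K Nearest Neighbors sketch, assuming Euclidean distance and a simple majority vote. The function name `knn_predict`, the points and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training_points, query, k=3):
    """Classify `query` by majority vote among its k closest training points."""
    # Sort the labelled points by Euclidean distance to the query.
    by_distance = sorted(
        training_points,
        key=lambda item: math.dist(item[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points with made-up labels.
training_points = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((8.0, 8.0), "large"),
    ((8.5, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
print(knn_predict(training_points, (2.0, 2.0)))
print(knn_predict(training_points, (8.0, 9.0)))
```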

**  Neural networks and deep learning

Neuron-like abstraction

Inputs are numbers (x) - features or the output of another neuron

Weights assign importance to inputs (W)

x1*W1 + x2*W2 + x3*W3 --(non-linear function aka Neuron)--> Output

Non-linear function is called Activation Function

Sigmoid

TanH

ReLU
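The weighted sum plus activation function can be sketched in a few lines of Python. The inputs and weights are made up, and the neuron has no bias term because the formula in these notes does not include one.

```python
import math


def sigmoid(z):
    """Squashes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Passes positive values through and clips negatives to 0."""
    return max(0.0, z)


def neuron(inputs, weights, activation):
    """Weighted sum of the inputs passed through an activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


# Made-up inputs and weights, purely for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]  # weighted sum = 0.5 - 2.0 + 1.5 = 0.0

print(neuron(x, w, sigmoid))    # sigmoid(0) = 0.5
print(neuron(x, w, math.tanh))  # tanh(0) = 0.0
print(neuron(x, w, relu))       # relu(0) = 0.0
```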

**** We train the model to adjust the weight to get the desired output ****

Layers - can be any number (the simplest network has 3)

Input Layer

Hidden Layer

Output Layer

Deep Learning (more than 3 layers)

Challenging to learn weights

Backpropagation algo is used to adjust the weights 

- takes into account the size of the error

& the slope (gradient) towards the correct answer, the ideal point


==================================================================

Entity & Attributes

Features

Label

ML uses features to predict the Label


Feature Engineering

Manipulate features to improve the quality of the ML model

Identify useful features (original or transformed value)

Derived features


** Ways to do feature engineering

Transform existing features (cleanup etc)

Map numeric values to a scale of 0 to 1

Bucketing - to reduce # of values (say 1-100 to 10 buckets)

Feature-cross - Cartesian product of 2 or more features

say - weight (light, medium, heavy) x color (blue, green, red) gives 9 combos

helps capture non-linear relationships

Binary features

is_red, is_blue and so on

Decompose value parts

From date - extract day, month, year

From Address - extract street etc

One-Hot Encoding

Map value to a single bit in a binary array

each position represents a possible value (like red - 100, green - 001, etc)

used to represent categorical features in deep learning models

Normalization

Convert numeric value to a standard range (0 to 1 or -1 to +1)

0 to 1 is called Scaling (divide the feature value by the max value when values start at 0; in general use (x - min) / (max - min))
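Several of the feature engineering techniques listed above can be sketched in plain Python. All function names (`bucketize`, `one_hot`, `min_max_scale`) and sample values here are hypothetical helpers written for these notes, not a library API.

```python
from itertools import product


def bucketize(value, bucket_size=10):
    """Bucketing: map values 1-100 onto 10 buckets numbered 0 to 9."""
    return (value - 1) // bucket_size


def one_hot(value, vocabulary):
    """One-hot encoding: a single 1 at the position of `value`."""
    return [1 if value == item else 0 for item in vocabulary]


def min_max_scale(values):
    """Scale to the 0-1 range using (x - min) / (max - min).

    Assumes the values are not all identical (max > min).
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


weight_classes = ["light", "medium", "heavy"]
colors = ["blue", "green", "red"]

# Feature cross: Cartesian product of the two categorical features.
crossed = [f"{w}_{c}" for w, c in product(weight_classes, colors)]

print(bucketize(1), bucketize(10), bucketize(11), bucketize(100))
print(len(crossed), crossed[0])
print(one_hot("red", colors))
print(min_max_scale([10, 20, 30]))
```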


Model Building

Define problem

Collect Data

Define Evaluation method

Prepare the data (iteratively)

Split the data into Training, Validation & Test

Execute the Algorithm on data to build the model

Validate the model (tune the model)

adjust the hyperparameters (not learned from the data)

# of layers in NN, decision tree depth allowed, max trees in RForest etc 

[params are learned by algo from data]

Test model

[Training -> Model -> Validation -> Tune model -> Training; then test once all done]
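A minimal sketch of the Training / Validation / Test split, assuming a 70/15/15 ratio and a fixed random seed. Both are arbitrary choices made for illustration, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    train_end = round(len(shuffled) * train)
    validation_end = train_end + round(len(shuffled) * validation)
    return (
        shuffled[:train_end],                # used to fit the model
        shuffled[train_end:validation_end],  # used to tune hyperparameters
        shuffled[validation_end:],           # held back for the final test
    )


rows = list(range(100))  # stand-in for 100 labelled examples
train_set, validation_set, test_set = split_dataset(rows)
print(len(train_set), len(validation_set), len(test_set))
```

The test set is touched only once, after tuning on the validation set is finished.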


Evaluating Model

Commonly used metrics

Accuracy (classification problems)

Precision  (classification problems)

Recall  (classification problems)

Mean Squared Error (regression problems)

*** Never test with training data

Confusion Matrix - Actual x Predicted

Accuracy - fraction of correctly predicted data points - (TP+TN)/(TP + FP + TN + FN)

Precision - % of predicted positives that are actually positive - TP/(TP+FP)

Recall - % of actual positive data points identified - TP/(TP+FN)
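The metrics above can be computed by hand from the confusion matrix counts. This is a sketch for binary labels, using made-up actual and predicted values.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def mean_squared_error(actual, predicted):
    """Average of the squared differences (for regression problems)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up classification results: 4 actual positives, 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, tn, fn)
print(accuracy, precision, recall)

# Made-up regression results.
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))
```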


===============================================================

Deep Learning

Gradient Descent 

- U shaped graph in first quadrant.

- x-axis Weight

- y-axis Loss

- AIM:  minimize the total loss

- Train the model to move from the initial weight to the optimal weight

- Gradient (slope) - which dir to go, how fast to go

- "Learning rate"(hyper param) determines the incremental step size

- here weight is the parameter the model learns

- "Hyper parameters" we adjust to get the optimal "parameter" which is weight
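A minimal sketch of gradient descent on a U-shaped loss curve with a single weight. The loss function (w - 3)^2, the starting weight and the learning rate are assumptions chosen so the optimal weight is known to be 3.

```python
def loss(w):
    """U-shaped loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Slope of the loss at w (derivative of (w - 3)^2)."""
    return 2.0 * (w - 3.0)


def gradient_descent(initial_weight, learning_rate=0.1, steps=100):
    """Repeatedly step the weight downhill along the loss curve."""
    w = initial_weight
    for _ in range(steps):
        # Move against the slope; the learning rate sets the step size.
        w -= learning_rate * gradient(w)
    return w


optimal_weight = gradient_descent(initial_weight=0.0)
print(optimal_weight, loss(optimal_weight))
```

The learning rate is the hyperparameter we pick; the weight is the parameter the loop learns.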

Types:

Batch gradient descent

Loss is calculated over entire data set

Slow on large data sets

Stochastic Gradient Descent

For large datasets (so in Deep Learning)

Weights are updated after each instance (not after entire dataset)

Can adjust the weight with each example

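Training instances are randomly shuffled (Stochastic)

The noisy, random-walk-like updates help avoid getting stuck in local minima

Mini-batch gradient descent

Between batch and stochastic - weights are updated after each small batch of instances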
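The three types differ only in how many instances are used per weight update. This sketch fits a one-weight model y = w * x to made-up, noise-free data; the `train` function and its defaults are hypothetical.

```python
import random


def train(xs, ys, batch_size, learning_rate=0.01, epochs=100, seed=0):
    """Fit y = w * x with squared error, updating the weight once per batch."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit training instances in random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of (w*x - y)^2 over the instances in the batch.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated from y = 2x, so the ideal weight is 2

w_batch = train(xs, ys, batch_size=len(xs))  # batch: one update per epoch
w_sgd = train(xs, ys, batch_size=1)          # stochastic: one update per instance
w_mini = train(xs, ys, batch_size=2)         # mini-batch: in between
print(w_batch, w_sgd, w_mini)
```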

How to calculate the gradient? Solution is BackPropagation

BackPropagation

Compute the gradient of the loss function with respect to the weights for an input-output pair

Calculate partial derivative of loss function relative to each weight

More efficient than naive calculation
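A sketch of the idea for a single sigmoid neuron with squared-error loss. The chain-rule gradient is compared against the naive approach of nudging each weight and re-running the model. The network shape, weights, inputs and target are assumptions made for illustration.

```python
import math


def forward(weights, inputs):
    """Weighted sum followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights, inputs, target):
    return (forward(weights, inputs) - target) ** 2


def backprop_gradient(weights, inputs, target):
    """Chain rule: dLoss/dw_i = dLoss/dpred * dpred/dz * dz/dw_i."""
    prediction = forward(weights, inputs)
    dloss_dpred = 2.0 * (prediction - target)
    dpred_dz = prediction * (1.0 - prediction)  # derivative of sigmoid
    # dz/dw_i is the input x_i; the first two factors are shared by all weights.
    return [dloss_dpred * dpred_dz * x for x in inputs]


def numerical_gradient(weights, inputs, target, eps=1e-6):
    """Naive alternative: nudge each weight and re-run the forward pass."""
    grads = []
    for i in range(len(weights)):
        up, down = list(weights), list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, inputs, target) - loss(down, inputs, target)) / (2 * eps))
    return grads


weights, inputs, target = [0.5, -0.3], [1.0, 2.0], 1.0
analytic = backprop_gradient(weights, inputs, target)
numeric = numerical_gradient(weights, inputs, target)
print(analytic)
print(numeric)
```

The chain-rule version needs one forward pass for all weights, while the naive version needs two forward passes per weight, which is why backpropagation scales better to large networks.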

.. add more notes


------------------------------------------------

Model Troubleshooting

------------------------------------------------

Underfitting

Model performs poorly on training and validation data

Ways to correct underfitting

Increase the complexity of the model

add additional layers in NN

increase # of decision trees allowed in Random Forest

increase the max depth in decision trees

Increase the Training Time or epochs

#epoch - number of complete passes over the entire training dataset that the ML algo has made

Overfitting

Model performs well on training data but poorly on validation data

Correction options

Regularization - limits the info captured by the model

Keeps outliers in the data from over-influencing the model
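One common form of regularization is an L2 (ridge) penalty on the weights. These notes do not name a specific technique, so L2 is an assumption here. The sketch fits a one-weight model y = w * x to made-up data containing one outlier and shows the penalty pulling the weight back.

```python
def fit_weight(xs, ys, l2_penalty=0.0):
    """Minimize sum((w*x - y)^2) + l2_penalty * w^2 in closed form.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + l2_penalty).
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + l2_penalty)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 20.0]  # follows y = 2x except for the last point (outlier)

w_plain = fit_weight(xs, ys)                         # no regularization
w_regularized = fit_weight(xs, ys, l2_penalty=10.0)  # penalty shrinks the weight
print(w_plain, w_regularized)
```

The penalty strength (10.0 here) is an arbitrary hyperparameter; in practice it is tuned on the validation data.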

Bias - Variance Tradeoff

        https://towardsdatascience.com/bias-and-variance-in-linear-models-e772546e0c30

These are natural characteristics of a model, and they have to be traded off against each other

Bias Error

Result of missing relationships b/w features & labels (target outputs)

means, we miss some important info as a feature?

Because we did not sufficiently generalize from the training data

Variance Error: 

Due to sensitivity to small fluctuations in the training data

Small changes in the input can cause large changes in the output

variance is the difference among a set of predictions

Bias and Unfairness issues:

Fairness

Anti-classification: Protected attributes are not used in the model (e.g. Gender)

Classification parity:

Predictive performance is equal across groups

Calibration:

Given the model's risk estimates, outcomes are independent of protected attributes

==============================================

quick additional notes

Vision AI - Transfer Learning (reuse a model trained on one problem for another set of problems)

Collaborative filtering - recommendations

Cloud Run - if model is stateless (to deploy models)


GPU - Highly parallel processing, many ALUs, matrix multiplication (needs NVIDIA drivers)

TPU - Application-Specific Integrated Circuit (ASIC) - for TensorFlow models

             Cost less than GPU

 

https://docs.google.com/forms/d/e/1FAIpQLSfkWEzBCP0wQ09ZuFm7G2_4qtkYbfmk_0getojdnPdCYmq37Q/viewform


https://cognizant.udemy.com/course/google-cloud-professional-data-engineer-get-certified/learn/quiz/4945080#overview

 


