Sunday, January 17, 2021

Angular - notes

 

####### Introduction to Angular ######

##Angular (ng) High-level info (Angular 7)

Angular 2+ ('Angular') is a TypeScript-based open-source front-end web application platform (originally from Google)

TypeScript (.ts) => Static Typing, Decorators (Annotations)

--> ES6 (ECMAScript) => Classes, Modules, Arrow Functions

TypeScript compiler generates JavaScript (human-readable) --> compiles to ES5 or ES6

SPA - Single Page Application 

- AJAX & HTML5 - HTML fragments (mini-views)

Clean separation of the code that renders the UI from the code that implements the application logic

Dependency Injection - loose coupling b/w components and services

Automatic change detection (UI gets reflected automatically)

Comes with RxJS - subscription-based processing of asynchronous data - eliminates callback hell
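The subscription idea can be sketched in plain TypeScript — a minimal illustration of the pattern, not the real RxJS API:

```typescript
// Minimal observable-like pattern: a producer pushes values,
// subscribers react as each value arrives (no callback nesting).
type Observer<T> = (value: T) => void;

class SimpleObservable<T> {
  private observers: Observer<T>[] = [];

  subscribe(observer: Observer<T>): void {
    this.observers.push(observer);
  }

  next(value: T): void {
    // Push the value to every subscriber.
    this.observers.forEach((obs) => obs(value));
  }
}

const ticks = new SimpleObservable<number>();
const received: number[] = [];
ticks.subscribe((n) => received.push(n));
ticks.next(1);
ticks.next(2);
console.log(received); // [1, 2]
```

The real RxJS Observable adds error/complete channels, unsubscription, and operators on top of this basic push model.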

Angular CLI - The scaffolding and deployment tool - spares developers from writing the boilerplate code and configuration scripts.

##Responsive Web Design (RWD)

- Web design approach - optimal viewing experience on a wide range of devices

- A responsive theme's layout adjusts to your device's screen size

##WorkSpace Setup


Node-v10.x https://nodejs.org/en/download/

Visual Studio Code

Angular CLI - npm install -g @angular/cli


##First Project (ng way)


#1 Open Terminal window --> go to your folder where you want to create the project

--> ng new <<your-project-name-here>>

#2 Launch the Angular application by running the command

--> ng serve --open

#3 Launches the index.html in a browser @ http://localhost:4200 (port might vary)

##Deployment and other info - https://angular.io/guide/deployment


##Build and Deploy as a JS

ng serve

• In-memory compilation running on local server

• File changes trigger reload for iterative development

ng build

• Creates output folder ‘dist’

• Consolidates app into a few js files

• Assets folder replicated

• Must change base href for deployment environment


##First Project (npm way)

Go to desired folder in Terminal and run  --> npm init -y

look at files, especially package.json

Add dependencies – Angular, SystemJS, Live-server, TypeScript compiler

Run the install command to install the above dependencies --> npm install

Copy the following files 

                    - SystemJS Config file, index.html, main.ts, app.module.ts and app.component.ts

Launch the Angular application by running the command

Launch the index.html in a browser @ http://localhost:4200/index.html (port might vary)

## Angular Project Structure 

#angular.json


"outputPath" is the location where the ng build command places the condensed js files for the entire app

index.html is the single page in which the Angular App is injected

main.ts is the “main” Angular entry point

polyfills.ts contains browser compatibility polyfills and application polyfills.

styles.css is where global styles can be placed. Any CSS rules you place here are injected into the DOM in a <style></style> tag

"assets" array describes the location of static file assets

#index.html

#main.ts --> calls bootstrap function

#app.module.ts --> the module being bootstrapped - it in turn bootstraps the app component

#app.component.ts --> @Component decorator

#app.component.html --> 

################ Basic Concepts and Binding ##########


• TS / JS Primer
• Types
• Shapes
• Spread and Rest Operators
• Classes and Interfaces
• Decorators
• Arrow Functions

The Data Binding process in Angular 7
• Interpolation
• Property
• Event
• Two-Way
## Types (Strict Type checking)

function add(a: number, b: number) {
  return a + b;
}
add('5', 6); // Compilation error
add(1, 3);   // works
#Types: boolean, number, string, arrays [], object literals {}, undefined, null, enum, any and void
let hostname: string = "Sree";
let list: number[] = [1, 2, 3];
enum Color {Red, Blue, Green, Black};
let c: Color = Color.Black;
#Types are inferred if no type is given
let a = 123;
## Spread Operators (spreading array into positional arguments)
var list = [1, 2, 3];
    list = [...list, 4, 5, 6];
    console.log(list); // [1,2,3,4,5,6]
## Destructuring
var [x, y, ...remaining] = [1, 2, 3, 4];
console.log(x, y, remaining); // 1,2,[3,4]
## Rest Operators (like varargs in java) - accept multiple arguments in your function and get them as an array.
function fnTakeItAll(first, second, ...allOthers) {
console.log(allOthers);
}
fnTakeItAll('cat', 'mat'); // []
fnTakeItAll('cat', 'mat', 'bat', 'fat'); // ['bat','fat']
## var vs let

#var - Variables in JavaScript are function scoped
var foo = 123;
if (true) {
  var foo = 456;
}
console.log(foo); // 456
#let - variables with true block scope
let foo = 123;
if (true) {
  let foo = 456;
}
console.log(foo); // 123
 
## Union of types (can be an alternative to Inheritance)
export enum Color {RED = 'red', BLUE = 'blue', WHITE = 'white'}
we could use union types and get similar benefits in a much shorter fashion:
export type Color = 'red' | 'white' | 'blue';
export type Optional<T> = T | undefined;
let user: Optional<User>;
export type AuthAction = LoginAction
                          | LoginSuccessfulAction
                          | LoginErrorAction
                          | LogoutAction
                          | LogoutSuccessfulAction;
type Age = number | string;
let numAge: Age = 50;
let strAge: Age = "Fifty";
function printAge(age: Age): string {
   return `Is your age, ${age}?`;
}
## Classes & Interfaces
#Interfaces - are contracts - can't be instantiated
interface Person {
  firstName: string;
  lastName: string;
}
# Classes - can be instantiated
class Employee implements Person {
  firstName: string;
  lastName: string;
}
## Shapes
Any two TypeScript objects (even from different classes) are considered equivalent if they are composed of the same types of attributes (structural typing)
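A quick illustration of structural typing (the Named/Dog names are made up for this sketch):

```typescript
// TypeScript compares shapes, not class names (structural typing).
interface Named {
  name: string;
}

class Dog {
  name: string;
  constructor(name: string) { this.name = name; }
}

// A plain object literal and a Dog instance both satisfy Named,
// because both have a string `name` attribute.
const a: Named = { name: "Rex" };
const b: Named = new Dog("Fido");

function greet(n: Named): string {
  return `Hello, ${n.name}`;
}

console.log(greet(a)); // Hello, Rex
console.log(greet(b)); // Hello, Fido
```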
## Decorators
https://www.typescriptlang.org/docs/handbook/decorators.html
https://www.youtube.com/watch?v=3Rgv2UWK2Bo
Decorators are functions that return functions and can attach metadata
Auxiliary functions that can be applied to Classes, Methods, Properties, Parameters, or Accessors
##Types of Decorators##
Class Decorators: @NgModule, @Component
Property decorators: @Input, @Output
Method decorators: @HostListener (event decorator)
Parameter decorators: @Inject
Any function can be used as a decorator
** Decorator - executed at the time of class evaluation (not instantiation) **
function myDecorator(prefix?: string) {
  return (constructor: any) => {
    console.log(constructor);
    console.log("decorator evaluated");
    constructor.prototype.message = prefix + constructor.name;
  };
}
@myDecorator("Hello ")
class World{
  message: string;
}
let w = new World();
console.log("Class Decorator ");
console.log(w.message);
console.log("-------------");
#Decorator Factory - We can write a decorator factory in the following fashion:

function color(value: string) {
  // this is the decorator factory
  return function (target) {
    // this is the decorator
    // do something with 'target' and 'value'...
  };
}
## Data Binding - Data binding signifies how and what kind of data is bound between a component and its template.

#1# Interpolation – binds component properties in output template. It uses {{}}.
Syntax : {{interpolation}}  means --> {{valueToBind}}
<span>{{title}} App is running !!! </span>
#2# Property Binding – flows data from the component to the element. Uses []
<span [style.color]="componentStyle"> Some colored text!!</span>
#3# Event Binding – flows data from an element to the component. Uses ()
<button (click)="alertTheWorld()">Click Me</button>
#4# Two-Way Binding – is a combination of the Event and Property Bindings. 
Used along with the ngModel object. 
*Must Import FormsModule
          <input [(ngModel)]="dynamicValue" 
             placeholder="Watch the text update !" 
             type="text">
         <span>{{dynamicValue}}</span>

## Component

Components form the building blocks of an Angular application.

To create a Component, issue the following command

ng g component <component name>

#Practice

1. Make Developer Class (ng g class Developer)
   1. firstName: string
   2. lastName: string
   3. favoriteLanguage: string
   4. yearStarted: number

2. Make a new component called 'bio' (ng g component bio)
   1. Import the Developer Class
   2. Create an instance of a Developer inside the constructor and assign it to a property called 'dev'

3. Display dev in bio.component.html

4. Add <app-bio></app-bio> to app.component.html

5. Create a toggle switch using *ngIf to only display the bio component if a link is clicked

Use the "selector" value of the new component in the app.component.html to import it there
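The Developer class from step 1 might look like this (the field names come from the steps above; the constructor style and the sample values are assumptions):

```typescript
// ng g class Developer generates an exported class like this;
// parameter properties declare and assign the fields in one place.
export class Developer {
  constructor(
    public firstName: string,
    public lastName: string,
    public favoriteLanguage: string,
    public yearStarted: number
  ) {}
}

// Inside bio.component.ts the constructor could assign an instance
// to a `dev` property; shown here as a plain variable.
const dev = new Developer("Ada", "Lovelace", "TypeScript", 2018);
console.log(dev.favoriteLanguage); // TypeScript
```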

##  Component Lifecycle

• The lifecycle of a component is managed by Angular itself.

• It manages creation, rendering, binding data-bound properties etc. and also offers "hooks" that allow responding to key lifecycle events.

• Here is the complete lifecycle hook interface inventory:

• ngOnChanges - called when an input binding value changes.

• ngOnInit - called after the first ngOnChanges.

• ngDoCheck - called after every run of change detection.

• ngAfterContentInit - called after the component content is initialized.

• ngAfterContentChecked - called after every check of component content.

• ngAfterViewInit - called after the component's view(s) are initialized.

• ngAfterViewChecked - called after every check of a component's view(s).

• ngOnDestroy - called just before the component is destroyed.
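The hook order can be simulated in plain TypeScript — this sketch only mimics the calling sequence; in a real app Angular itself invokes these hooks on a @Component class:

```typescript
// Plain-TS simulation of the main lifecycle hook order.
class BioComponent {
  log: string[] = [];

  ngOnChanges()     { this.log.push("ngOnChanges"); }
  ngOnInit()        { this.log.push("ngOnInit"); }
  ngAfterViewInit() { this.log.push("ngAfterViewInit"); }
  ngOnDestroy()     { this.log.push("ngOnDestroy"); }
}

// Angular would drive these calls itself; here we do it by hand to
// show the order: changes -> init -> view init -> destroy.
const c = new BioComponent();
c.ngOnChanges();
c.ngOnInit();
c.ngAfterViewInit();
c.ngOnDestroy();
console.log(c.log);
// ["ngOnChanges", "ngOnInit", "ngAfterViewInit", "ngOnDestroy"]
```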

##  Elvis Operator:

• If a property that does not exist is referenced in a template, an exception is thrown.

• The "Elvis Operator" (safe navigation operator) is a simple and easy way to guard against null and undefined properties.

• It is denoted by a question mark immediately followed by a period "?.".

<md-input-container>

<label>Type to see the value</label>

<input md-input type="text" #input />

</md-input-container>

<strong>{{input?.value}}</strong>
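TypeScript itself (3.7+) has the same guard as optional chaining (`?.`), which the template operator mirrors; `InputBox` here is a made-up stand-in for the input element:

```typescript
interface InputBox { value?: string; }

// The element reference may be undefined while the view initializes.
let input: InputBox | undefined = undefined;

// Without ?. this would throw; with it the expression is just undefined.
const beforeInit = input?.value;

input = { value: "hello" };
const afterInit = input?.value;

// ?? supplies a fallback when the chain yields undefined/null.
const shown = beforeInit ?? "(nothing yet)";

console.log(beforeInit); // undefined
console.log(afterInit);  // hello
console.log(shown);      // (nothing yet)
```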

##  Structural Directives

•  ngIf adds and removes elements in the DOM based on the result of an expression.

•  ngFor is a repeater directive that outputs a list of elements by iterating over an array.

<div *ngIf="emps">

<div *ngFor="let emp of emps">

<p><span class="bold">First Name:</span> {{emp.firstName}}</p>

<p><span class="bold">Last Name:</span> {{emp.lastName}}</p>

<p><span class="bold">Department:</span> {{emp.department}}</p>

<hr />

</div>

</div>

##  Introduction to Routes

• Routing allows you to:

– Recover browser history functionality, which is otherwise lost with an SPA

– Maintain the state of the application.

– Implement modular applications.

– Implement role-based access (certain roles have access to certain URLs).

• Routes are injected into <router-outlet></router-outlet>. This is most commonly placed in app.component.html, below any navigation bars or content that you want to appear on every page


You can add routing after the fact

• Angular best practice is to create a separate, top-level module dedicated to routing

ng g module app-routing --flat --module=app


• --flat places the module at the top level (no dedicated subfolder)

• --module=app adds the import for app-routing inside of AppModule


## Route Configuration

Routes are configured using the Routes type, which is an array of route objects.

The route object is composed of the following attributes:

– Path: URL to be shown in the browser when the application is on the specific route.

– Component: Component to be rendered when the application is on the specific route. This is the output of the router link.

##app-routing.module.ts## sample 

import { NgModule } from '@angular/core';

import { Routes, RouterModule } from '@angular/router';

import { HomeComponent } from './home/home.component';

import { EmpComponent } from './emp/emp.component';

const routes: Routes = [

{path:'', component: HomeComponent},

{path:'bio',component:EmpComponent}];


@NgModule({

  imports: [RouterModule.forRoot(routes)],

  exports: [RouterModule]

})

export class AppRoutingModule { }

## redirectTo ## Routes may redirect to other routes, using the redirectTo attribute.

const routes: Routes = [

{ path: '', redirectTo: 'emp', pathMatch: 'full' },

{ path: 'emp', component: EmpComponent }];

##Route Navigation##

• In the view template, the routerLink directive may be used inside an anchor tag to add links that point to the defined routes.

<a routerLink="/component-one"> Component One </a>

#Programmatically do it this way

import {Router} from '@angular/router';

constructor (private router: Router){}


this.router.navigate(['/emp']);

##  Bootstrap Stylesheet - industry-standard styling, commonly used with Angular

<!-- google for bootstrap style sheet link to get below link -->

<!-- Add below link under head section of index.html -->

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css"

integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">


##  Services in Angular ##

• Create a Service using the command:

ng g service <<ServiceName>>

• Services expose methods (typically business logic) as an API, along with optional public properties.

• Services are dependencies injected by Angular, which maintains each service as a singleton.

• The @Injectable decorator is used to mark a TypeScript class as a Service.

• providedIn: determines which injectors will provide the injectable, by either associating it with an @NgModule or other InjectorType, or by specifying that this injectable should be provided in the 'root' injector, which will be the application-level injector in most apps


## Injectable Decorator

## Dependency Injection

• Import the service class

• Define it within the constructor parameters

• It should have the ‘private’ access modifier
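The pattern can be sketched in plain TypeScript (Angular's injector does the wiring via @Injectable; here the singleton is passed in by hand, and the names EmployeeService/EmpComponent are made up):

```typescript
// A service exposes business-logic methods as its API.
class EmployeeService {
  private emps = ["Ada", "Grace"];
  getEmployees(): string[] {
    return this.emps;
  }
}

// The component declares the dependency as a private constructor
// parameter; Angular would supply the shared singleton instance.
class EmpComponent {
  constructor(private empService: EmployeeService) {}
  names(): string[] {
    return this.empService.getEmployees();
  }
}

const service = new EmployeeService(); // Angular keeps one instance
const comp = new EmpComponent(service);
console.log(comp.names()); // ["Ada", "Grace"]
```

The private constructor parameter both declares and assigns the field, which is why the bullet above insists on the access modifier.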

## Route Parameters


Parameters are sent using :paramName embedded in the router paths:

const routes: Routes = [

{ path: '', redirectTo: 'product-list', pathMatch: 'full' },

{ path: 'product-details/:id', component: ProductDetails }];

The parameter can then be sent using routerLink and interpolation:

<a routerLink="product-details/{{product.id}}"> {{product.name}} </a>

OR the routerLink directive may be used and supplied with parameters to be automatically sent, while invoking a router link.

<a *ngFor="let product of products"

[routerLink]="['/product-details', product.id]">

{{product.name}}

</a>


## Reading Routing Parameters using ActivatedRoute ##

– Angular provides the ActivatedRoute service, which in turn supplies a paramMap property that contains the parameters.

import {ActivatedRoute} from '@angular/router';

@Component.....

export class TestComponent implements OnInit{

id;

constructor(private route: ActivatedRoute){}

ngOnInit(){

this.id = this.route.snapshot.paramMap.get('id')

}

}

## Navigate to parameterized routes via code

Navigating to parameterized routes internally:

– Create a Router object and use the navigate method.

– Pass an array containing the main route and the parameter (just like the routerLink directive on the previous page)

@Component ....

export class TestComponent{

constructor(private router: Router){}

goToProductDetails(id){

this.router.navigate(['/product-details', id]);

}

}



## Creating in app-routing.module.ts

## Linking

## Retrieving parameters in component class

## Child Routes

================================================








Thursday, January 14, 2021

Google Cloud Data Engineer - My Quick Notes 2 (ML)

 


https://developers.google.com/machine-learning/glossary

------------------------------------------------------------------------------

ML Categories

Unsupervised Learning

Supervised Learning

Reinforcement Learning

------------------------------------------------------------------------------

** Unsupervised Learning

Draw inference from data

Previously undetected patterns

Example - 

Clustering (Finding groups of similar entities in a data set)

Anomaly Detection

Principal component analysis - get the most important attributes

** Supervised Learning

Learn from examples

Goal is to predict category or value

Example

Classifying tumors from images - (Classification)

Predicting housing prices - (Regression)

Identify fraudulent credit card transactions 

**  Reinforcement Learning (not explored in DE exam much)

Learn from environment

Maximize reward

Does not require examples

Instead it uses exploration of the environment and exploitation of data points

Example

Agent taking actions in environment and receiving rewards

------------------------------------------------------------------------------

2 approaches to ML

Symbolic Artificial Intelligence (2006-2009)

Neural networks and deep learning (built on neural networks)

------------------------------------------------------------------------------

**  Symbolic Artificial Intelligence

Symbols represent entities and attributes

Manipulate symbols to make inferences

Models of Reasoning

Logic

Cognitive science

Features

Say, to predict re-admittance of a patient to hospital

Use, length of stay, type of operation, Age etc

Symbolic ML Algorithms

Decision Trees 

Ask questions --> dig further based on answers, with more questions

Set of decision points , and Terminal node is the answer/Classification

Random Forest

An ensemble of multiple decision trees built with different features - popular

Naive Bayes

Conditional probability

Support Vector Machines (SVMs)

represent entities as points in space

Similar entities are close in space

Dissimilar entities separated by a gap - this algo finds the gap

K Nearest Neighbors  

- To Categorize

- Find ways to measure distance b/w objects; closer ones are in the same category

**  Neural networks and deep learning

Neuron-like abstraction

Inputs are numbers (x) - features or the output of another neuron

Weights assign importance to inputs (W)

x1*W1 + x2*W2 + x3*W3 --(non-linear function aka Neuron)--> Output

The non-linear function is called the Activation Function

Sigmoid

TanH

ReLU

**** We train the model to adjust the weight to get the desired output ****

Layers - can be any number of (simple one has 3)

Input Layer

Hidden Layer

Output Layer

Deep Learning (more than 3 layers)

Challenging to learn weights

Backpropagation algo is used to adjust the weights

- takes into account the size of the error & the slope towards the right/correct answer (the ideal point)


==================================================================

Entity & Attributes

Features

Label

ML uses features to predict the Label


Feature Engineering

Manipulate features to improve the quality of the ML model

Identify useful features (original or transformed value)

Derived features


** Ways to do feature engineering

Transform existing features (cleanup etc)

Map numeric values to a scale of 0 to 1

Bucketing - to reduce # of values (say 1-100 to 10 buckets)

Feature-cross - Cartesian product of 2 or more features

say weights (light, medium, heavy) x color (blue, green, red) - 9 combos come out

helps capture non-linear relationships
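A feature cross is just the Cartesian product of the value sets; a minimal sketch (the `_x_` separator is an arbitrary choice):

```typescript
// Cartesian product of two categorical features.
function featureCross(a: string[], b: string[]): string[] {
  const crossed: string[] = [];
  for (const x of a) {
    for (const y of b) {
      crossed.push(`${x}_x_${y}`);
    }
  }
  return crossed;
}

const weights = ["light", "medium", "heavy"];
const colors = ["blue", "green", "red"];
const combos = featureCross(weights, colors);
console.log(combos.length); // 9
console.log(combos[0]);     // light_x_blue
```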

Binary features

is_red, is_blue, like that

Decompose value parts

From date - extract day, month, year

From Address - extract street etc

One-Hot Encoding

Map value to a single bit in a binary array

each position represents a possible value (like Red - 100, Green - 010, Blue - 001)

used to represent categorical features in deep learning models.

Normalization

Convert numeric values to a standard range (0 to 1 or -1 to +1)

0 to 1 is called Scaling (divide the feature value by the max value)
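Both transformations are short to write out (a minimal sketch; the category order passed to the encoder is an assumption):

```typescript
// One-hot: map a categorical value to a binary array with a single 1.
function oneHot(value: string, categories: string[]): number[] {
  return categories.map((c) => (c === value ? 1 : 0));
}

// Scaling: divide by the max to land in [0, 1].
function scale(values: number[]): number[] {
  const max = Math.max(...values);
  return values.map((v) => v / max);
}

console.log(oneHot("red", ["red", "green", "blue"])); // [1, 0, 0]
console.log(scale([10, 20, 40]));                     // [0.25, 0.5, 1]
```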


Model Building

Define problem

Collect Data

Define Evaluation method

Prepare the data (iteratively)

Split the data into Training, Validation & Test

Execute the Algorithm on data to build the model

Validate the model (tune the model)

adjust the hyperparameters (not learned from the data)

# of layers in NN, decision tree depth allowed, max trees in Random Forest etc

[params are learned by algo from data]

Test model

[Training -> Model -> Validation -> Tune model -> Training; then test once all done]


Evaluating Model

Commonly used metrics

Accuracy (classification problems)

Precision  (classification problems)

Recall  (classification problems)

Mean Squared Error (regression problems)

*** Never test with training data

Confusion Matrix - Actual x Predicted

Accuracy - % of correctly predicted data points - (TP+TN)/(TP+FP+TN+FN)

Precision - % of predicted positives that are truly positive - TP/(TP+FP)

Recall - % of actual positives correctly identified - TP/(TP+FN)
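The three formulas as code (the counts below are made up for illustration):

```typescript
// Counts from a confusion matrix.
interface Confusion { tp: number; fp: number; tn: number; fn: number; }

const accuracy  = (m: Confusion) => (m.tp + m.tn) / (m.tp + m.fp + m.tn + m.fn);
const precision = (m: Confusion) => m.tp / (m.tp + m.fp);
const recall    = (m: Confusion) => m.tp / (m.tp + m.fn);

// Example: 80 true positives, 20 false positives,
// 90 true negatives, 10 false negatives.
const m: Confusion = { tp: 80, fp: 20, tn: 90, fn: 10 };
console.log(accuracy(m));  // 170/200 = 0.85
console.log(precision(m)); // 80/100 = 0.8
console.log(recall(m));    // 80/90 ≈ 0.889
```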


===============================================================

Deep Learning

Gradient Descent 

- U shaped graph in first quadrant.

- x-axis Weight

- y-axis Loss

- AIM:  minimize the total loss

- Train the model to make initial weight to Optimal weight

- Gradient (slope) - which dir to go, how fast to go

- "Learning rate"(hyper param) determines the incremental step size

- here the weight is the parameter the model learns

- "Hyper parameters" we adjust to get the optimal "parameter" which is weight
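A 1-D sketch of the idea: minimize loss(w) = (w - 3)^2 by repeatedly stepping against the gradient (3 is an arbitrary optimum picked for this example):

```typescript
// loss(w) = (w - 3)^2, so gradient = 2 * (w - 3); optimum at w = 3.
const gradient = (w: number) => 2 * (w - 3);

let w = 0;                // initial weight
const learningRate = 0.1; // hyperparameter: incremental step size

for (let step = 0; step < 100; step++) {
  // Move against the slope, scaled by the learning rate.
  w -= learningRate * gradient(w);
}

console.log(w.toFixed(4)); // ≈ 3.0000
```

With a learning rate that is too large the updates overshoot and diverge; too small and convergence is slow — which is why it is tuned as a hyperparameter.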

Types:

Batch gradient descent

Loss is calculated over entire data set

Slow on large data sets

Stochastic Gradient Descent

For large datasets (so in Deep Learning)

Weights are updated after each instance (not after entire dataset)

Can adjust the weight with each example

Training instances are randomly sorted (Stochastic)

Random walks avoids getting stuck

Mini-batch gradient descent

B/w batch and stochastic

How to calculate the gradient? Solution is BackPropagation

BackPropagation

Compute the gradient of the mapping function over an input-output pair

Calculate the partial derivative of the loss function relative to each weight

More efficient than naive calculation

.. add more notes


------------------------------------------------

Model Troubleshooting

------------------------------------------------

Underfitting

Model performs poorly on training and validation data

Ways to correct underfitting

Increase the complexity of the model

add additional layers in NN

increase # of decision trees allowed in Random Forest

increase the max depth in decision trees

Increase the Training Time or epochs

#epoch - the number of passes over the entire training dataset the ML algo has completed

Overfitting

Model performs well on training data but poorly on validation data

Correction options

Regularization - which limits the info captured

To avoid outliers in the data over-influencing the model

Bias - Variance Tradeoff

        https://towardsdatascience.com/bias-and-variance-in-linear-models-e772546e0c30

These are the natural characteristics of model, but need trade-offs

Bias Error

Result of missing relationships b/w features & target outputs

means, we missed some important info as a feature

Because we did not sufficiently generalize from the training data

Variance Error: 

Due to sensitivity in the small fluctuations in the training data

Small changes in the input can cause large changes in the output

variance is the difference among a set of predictions

Bias and Unfairness issue:

Fairness

Anti-classification: protected attributes are not used in the model (e.g., Gender)

Classification parity:

Predictive performance is equal across groups

Calibration:

Outcomes are independent of protected attributes

==============================================

quick additional notes

Vision AI - Transfer Learning (reuse a model trained for one problem on another set of problems)

Collaborative filtering - recommendations

Cloud Run - if the model is stateless (to deploy models)


GPU - highly parallel processing, ALUs, matrix multiplication (needs NVIDIA drivers)

TPU - Application-Specific Integrated Circuit (ASIC) - for TensorFlow models

Costs less than GPU

 

https://docs.google.com/forms/d/e/1FAIpQLSfkWEzBCP0wQ09ZuFm7G2_4qtkYbfmk_0getojdnPdCYmq37Q/viewform


https://cognizant.udemy.com/course/google-cloud-professional-data-engineer-get-certified/learn/quiz/4945080#overview

 



Google Cloud Data Engineer - My Quick Notes 1 (few services, IAM, Security)

 


MemoryStore - Redis and MemCached

App Engine - Standard and Flexible environment

Cloud Composer - AirFlow - Architecture

Tenant Project - (AirFlow Database (Cloud SQL) to store metadata, Web Service (App Engine), 

Customer Project - GKE  with AirFlow Worker, 

Redis (persist message across container restarts), 

AirFlow Scheduler and Cloud SQL Proxy, 

Cloud Storage (staging the DAG, logs etc) 

Cloud Data Fusion (ETL tool) - based on the CDAP data analytics platform

Execution environment - instance

Basic - visual designer, transformations, SDK and 

Enterprise: Basic + Streaming , metadata repo, HA, triggers, schedulers


Pub/Sub - Subscription - push / pull - for Async Integrations

Multi-Cloud environment - avoid single-vendor lock-in

Anthos 

    - Run workloads in Kubernetes cluster

- Multi-Cloud Application Modernization platform

- Anthos is a managed application platform that extends Google Cloud services and engineering practices to your environments so you can modernize apps faster and establish operational consistency across them.

- Build, deploy, and optimize applications anywhere—simply, flexibly, and securely

- Consistent development and operations experience for hybrid and multi-cloud environments

- Achieve up to 4.8x ROI within 3 years

Cloud Code

Cloud Build

Dashboards & Visualizations, Metrics explorer, Uptime checks, Alerts, Resource Usage page.

- Alerting - Create Policy - add conditions, Notification channels - can be Email, Pub/Sub, Pager, Slack, SMS, Console (mobile), Campfire

- Cloud Logging - using the FluentD agent - fully managed service - 30-day retention - or export it (using the Log Router)

- Log Router - create a Sink for the logs to flow into - to BigQuery, Cloud Storage, Pub/Sub, or a custom destination

- Installing Monitoring Agent on VMs --> download & install the stackdriver-agent (apt-get) and start the service (sudo service stackdriver-agent start)


IAM - principle of least privilege.

G Suite Account

- Domain registration - Hosting company provides an email address in that domain

- Google provides such an option - can use existing domain or create new domain name

- domain name as ‘yoursitename.com’. Now, the emails will look like ‘john@yoursitename.com’. 

Google Groups - easier to provide/remove access to users

Members - Users (will have one or more roles attached) - can be individual ids, service accounts, GSuite or google groups

Roles - Predefined, Custom and Primitive - attached to identities

Pre-defined Roles (with many pre-defined permissions)

There is a big list of pre-defined roles like <service> Admin, Viewer, Manager, Reader etc

Sample 

role: <serviceName>.<genericRoleName> --> Big Query Data Owner

BigQuery Data Owner --> has many permissions attached - like bigquery.dataset.create, bigquery.models.create, bigquery.table.delete .... etc

BigQuery Data Viewer --> many permissions - all are like .get, .list, .export etc - basically read-only ones

Custom Roles:

Example: Big Query Data Owner but with no model (ML model) access

--> the "CREATE FROM ROLE" option is the best way; name it "BigQuery Data Owner - No Model"

Primitive Roles

Owner (can set up billing for a project), Editor and Viewer

Service Account - for Apps and Servers


Policies are attached to resources (Policy is collection of statements)

Resource Hierarchy - Organization - folders - projects - resources


Data Loss prevention (DLP Service) - Security Practice

PII protection etc

Helps to classify data

Automatically mask data

measure re-identification risk

*InfoTypes --> Pattern detector - identify sensitive info (PII)

*Inspection jobs - applies InfoTypes to a dataset 

--> API returns InfoType, Likelihood score and Location

* Risk Analysis job - find the probability that data can be reIdentified

Legal compliance - GCP is HIPAA compliant


HIPAA, HITECH (Health Information Technology for Economic and Clinical Health)

GDPR - EU regulation


Encryption @ GCP

At-Rest and in-transit

Hardware level - AES256 or AES128 algo (practically uncrackable)

Data (say @ Colossus FS ) AES256

Encryption Key and Key-Encryption-Key (double protection)

Transit - Encrypt & Authentication

Internal GCP traffic - not TLS-encrypted - uses ALTS (Application Layer Transport Security)

Internet - uses TLS or QUIC(Google developed protocol)

Key Management (KMS)

Google Managed

Customer Managed (key created by customer, managed by Google) - app-level encryption

Customer Supplied (CSEK) - customer wants complete control over the keys


Thursday, January 7, 2021

B2B, B2C, B2B2C (B2X) business models

 

B2B, B2C (B2B2C - aka - B2X) applications


B2B - Tech Team, fewer users 

- B2B business model, your focus is on professionals in third-party commercial organizations

- Thus the number of potential users of your application is limited

- UX less critical.

B2C - End Users, high concurrent usage, emotional factors (speed/feel/entertainment/content)

- Even millions of users can use the app simultaneously.

B2X - Engage both Providers (B2B) & End Users (B2C via B2B or directly) [booking site -Providers, end Users]

- B2B2C are modern complex platforms offering a horizontal solution to another business’s problem. In other words, your client is a third-party business that brings you access to end customers that you can also serve and engage in.

- In this case, you not only think about the wishes of your user (a company) but also about the preferences of their customers (the end-consumer).

B2B and B2C use different business models

- B2B apps often offer a subscription service to the organization. 

- B2C apps are usually free of charge, but still profitable because they provide space to advertisers, sell extras in the app, or are a subscription service.


Courtesy/Reference: https://medium.com/@moqod_development/4-differences-in-b2b-and-b2c-applications-development-ed33ba025f2c


BigQuery My Reference notes


dw-bq-migration-overview MUST Read

https://cloud.google.com/solutions/migration/dw2bq/dw-bq-migration-overview

ARRAY_AGG, ARRAY_LENGTH 

You can do some pretty useful things with arrays like:

  • finding the number of elements with ARRAY_LENGTH(<array>)

  • deduplicating elements with ARRAY_AGG(DISTINCT <field>)

  • ordering elements with ARRAY_AGG(<field> ORDER BY <field>)

  • limiting ARRAY_AGG(<field> LIMIT 5)

SELECT
  fullVisitorId,
  date,
  ARRAY_AGG(DISTINCT v2ProductName) AS products_viewed,
  ARRAY_LENGTH(ARRAY_AGG(DISTINCT v2ProductName)) AS distinct_products_viewed,
  ARRAY_AGG(DISTINCT pageTitle) AS pages_viewed,
  ARRAY_LENGTH(ARRAY_AGG(DISTINCT pageTitle)) AS distinct_pages_viewed
FROM `data-to-insights.ecommerce.all_sessions`
WHERE visitId = 1501570398
GROUP BY fullVisitorId, date
ORDER BY date


Querying datasets that already have ARRAYs

In a BigQuery schema, an ARRAY field is noted as a REPEATED Mode.
SELECT visitId,  hits.page.pageTitle
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE visitId = 1501570398

You will get an error: Cannot access field page on a value with type ARRAY<STRUCT<hitNumber INT64, time INT64, hour INT64, ...>> at [3:8]

Before you can query REPEATED fields (arrays) normally, you must first break the arrays back into rows.

How do you do that with SQL?

Answer: Use the UNNEST() function on your array field:

SELECT DISTINCT
  visitId,
  h.page.pageTitle
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`,
UNNEST(hits) AS h
WHERE visitId = 1501570398
LIMIT 10

  • You need to UNNEST() arrays to bring the array elements back into rows
  • UNNEST() always follows the table name in your FROM clause (think of it conceptually like a pre-joined table)
STRUCTs

The easiest way to think about a STRUCT is to consider it conceptually like a separate table that is already pre-joined into your main table. A STRUCT can have another STRUCT as one of its fields (you can nest STRUCTs)

A STRUCT can have:

    • one or many fields in it
    • the same or different data types for each field
    • its own alias

SELECT visitId, totals.*, device.*
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE visitId = 1501570398
LIMIT 10



#standardSQL

SELECT race, participants.name

FROM racing.race_results

CROSS JOIN race_results.participants  # this is the STRUCT (it is like a table within a table)

The query below will give the same result:

#standardSQL
SELECT race, participants.name
FROM racing.race_results AS r, r.participants

If you have more than one race type (800M, 100M, 200M), wouldn't a CROSS JOIN just associate every racer name with every possible race, like a cartesian product?

Answer: No. This is a correlated CROSS JOIN, which only unpacks the elements associated with a single row. For a deeper discussion, see Working with ARRAYs and STRUCTs.

#standardSQL
SELECT COUNT(p.name) AS racer_count
FROM racing.race_results AS r, UNNEST(r.participants) AS p


QUANTILES & APPROX_QUANTILES

What are quantiles (fractiles)? A 100-quantile is a percentile.

Percentiles divide the data set into 100 equal parts.

SELECT APPROX_QUANTILES(x, 2) AS approx_quantiles
FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;

--> [1, 5, 10]


SELECT APPROX_QUANTILES(x, 4) AS approx_quantiles
FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;

--> [1, 1, 5, 8, 10]
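As a rough local illustration (plain Python, not BigQuery's actual sketch algorithm), APPROX_QUANTILES(x, n) returns n+1 boundary values: the minimum, the n-1 intermediate cut points, and the maximum. A minimal nearest-rank sketch, which happens to reproduce the two outputs above:

```python
def quantile_boundaries(values, n):
    """Return n+1 quantile boundaries (min, cut points, max) by nearest rank.

    A simplified stand-in for BigQuery's APPROX_QUANTILES, which uses an
    approximate sketch; boundary picking may differ on other inputs.
    """
    s = sorted(values)
    return [s[round(i * (len(s) - 1) / n)] for i in range(n + 1)]

data = [1, 1, 1, 4, 5, 6, 7, 8, 9, 10]
print(quantile_boundaries(data, 2))  # [1, 5, 10], as in the first query
print(quantile_boundaries(data, 4))  # [1, 1, 5, 8, 10], as in the second
```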

Approx Functions in Big Query (link)

SELECT FORMAT("%T", APPROX_QUANTILES(DISTINCT x, 2 RESPECT NULLS)) AS approx_quantiles
FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;

+------------------+
| approx_quantiles |
+------------------+
| [NULL, 6, 10]    |
+------------------+

SELECT APPROX_QUANTILES(x, 4) AS output
FROM UNNEST(GENERATE_ARRAY(1, 100)) AS x;

--> [1, 25, 50, 75, 100]

SELECT APPROX_QUANTILES(x, 100) AS output
FROM UNNEST(GENERATE_ARRAY(1, 200)) AS x;
--> [1, 2, 4, 6, 8, 10, 12, ....., 198, 200]

SELECT APPROX_QUANTILES(x, 100)[OFFSET(5)] AS output
FROM UNNEST(GENERATE_ARRAY(1, 200)) AS x;
 --> 10

Approximate aggregate functions are scalable in terms of memory usage and time, but produce approximate results instead of exact results. These functions typically require less memory than exact aggregation functions like COUNT(DISTINCT ...)

----------------------------------------------------------------------------------------------

Read more here --> BigQuery Functions and Operators

SAFE. prefix

If you begin a function with the SAFE. prefix, it will return NULL instead of an error. The SAFE. prefix only prevents errors from the prefixed function itself

SELECT SAFE.SUBSTR('foo', 0, -2) AS safe_output UNION ALL
SELECT SAFE.SUBSTR('bar', 0, 2) AS safe_output;

+-------------+
| safe_output |
+-------------+
| NULL        |
| ba          |
+-------------+
 
If no SAFE. prefix is used --> the query fails with the error "Third argument in SUBSTR() cannot be negative"
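The behaviour is analogous to wrapping a call so that failures yield NULL instead of raising. A hypothetical Python equivalent (illustration only, not BigQuery itself):

```python
def safe(fn):
    """Mimic BigQuery's SAFE. prefix: return None instead of raising."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            return None
    return wrapper

safe_int = safe(int)
print(safe_int("42"))    # 42
print(safe_int("oops"))  # None -- plain int("oops") would raise ValueError
```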


BigQuery- Query Cache

Cache results of previous queries

The BigQuery service automatically caches query results in a temporary table. If the identical query is submitted within approximately 24 hours, the results are served from this temporary table without any recomputation. Cached results are extremely fast and do not incur charges.

There are, however, a few caveats to be aware of. Query caching is based on exact string comparison, so even whitespace differences can cause a cache miss.
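Conceptually, the cache behaves like a dictionary keyed on the exact query text, so "SELECT 1" and "SELECT  1" (two spaces) are different keys. A toy sketch (run_query and its result string are hypothetical stand-ins, not the BigQuery API):

```python
cache = {}

def run_query(sql):
    """Return (result, cache_hit). Results are keyed on the exact SQL text."""
    if sql in cache:
        return cache[sql], True
    result = f"rows for: {sql}"  # stand-in for actually executing the query
    cache[sql] = result
    return result, False

_, hit1 = run_query("SELECT 1")   # first run: cache miss
_, hit2 = run_query("SELECT 1")   # identical text: cache hit
_, hit3 = run_query("SELECT  1")  # extra space: cache miss
print(hit1, hit2, hit3)  # False True False
```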

Queries are never cached if they exhibit non-deterministic behavior (for example, they use CURRENT_TIMESTAMP or RAND), if the table or view being queried has changed (even if the columns/rows of interest to the query are unchanged), if the table is associated with a streaming buffer (even if there are no new rows), if the query uses DML statements, or queries external data sources.

WITH CLAUSE (Common Table Expression)

The WITH clause (also called a Common Table Expression) improves readability but does not improve query speed or cost, since its results are not cached.

The same holds for views and subqueries.

If a query is used frequently, one way to potentially improve performance is to store the result in a table (or materialized view).

BIG QUERY - VERY RELEVANT INFO

https://googlecourses.qwiklabs.com/course_sessions/107473/labs/25818 

_TABLE_SUFFIX (wildcard table reference)

#standardSQL
 CREATE OR REPLACE TABLE ecommerce.days_with_rain
 PARTITION BY date
 OPTIONS (
   partition_expiration_days=60,
   description="weather stations with precipitation, partitioned by day"
 ) AS
 SELECT
   DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
   (SELECT ANY_VALUE(name) FROM `bigquery-public-data.noaa_gsod.stations` AS stations
    WHERE stations.usaf = stn) AS station_name,  -- Stations may have multiple names
   prcp
 FROM `bigquery-public-data.noaa_gsod.gsod*` AS weather
 WHERE prcp < 99.9  -- Filter unknown values
   AND prcp > 0      -- Filter days with no precipitation
   AND CAST(_TABLE_SUFFIX AS int64) >= 2017
   AND CAST(_TABLE_SUFFIX AS int64) <= 2019


BQML - Big Query Machine Learning 

CREATE OR REPLACE MODEL bike_model.model_bucketized
TRANSFORM(
  * EXCEPT(start_date),
  IF(EXTRACT(DAYOFWEEK FROM start_date) BETWEEN 2 AND 6,
     'weekday', 'weekend') AS dayofweek,
  ML.BUCKETIZE(EXTRACT(HOUR FROM start_date), [5, 10, 17]) AS hourofday
)
OPTIONS(input_label_cols=['duration'],
        model_type='linear_reg') AS
SELECT
  duration,
  start_station_name,
  start_date
FROM
  `bigquery-public-data`.london_bicycles.cycle_hire

EVALUATE

SELECT * FROM ML.EVALUATE(MODEL <dataset>.<model_name>), or
SELECT * FROM ML.EVALUATE(MODEL <dataset>.<model_name>, (<SQL query providing input sample data>))

SELECT * FROM
ML.PREDICT(MODEL bike_model.model_bucketized,
  (SELECT 'Park Lane , Hyde Park' AS start_station_name,
          CURRENT_TIMESTAMP() AS start_date))


MODEL WEIGHTS

A linear regression model predicts the output as a weighted sum of its inputs. Often, the weights of the model need to be used in a production environment.

SELECT * FROM ML.WEIGHTS(MODEL bike_model.model_bucketized)
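Applying retrieved weights outside BigQuery is then a plain weighted sum plus intercept. A sketch in Python with made-up weight values (not real model output):

```python
def predict(weights, intercept, features):
    """Linear regression prediction: intercept + sum of weight * input."""
    return intercept + sum(weights[name] * value
                           for name, value in features.items())

# Hypothetical weights, as if read from the ML.WEIGHTS output
weights = {"hourofday_bucket": 2.0, "is_weekday": 0.5}
print(predict(weights, 1.0, {"hourofday_bucket": 3, "is_weekday": 1}))  # 7.5
```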


BQ Create, Load commands

bq --location=EU mk --dataset movies

bq load --source_format=CSV \
  --location=EU \
  --autodetect \
  movies.movielens_movies_raw gs://dataeng-movielens/movies.csv

        
To replace a formatted string (e.g., split a |-separated string into an array):

SELECT * REPLACE(SPLIT(genres, "|") AS genres)
FROM movies.movielens_movies_raw
WHERE movieId < 5

 

Collaborative Filtering

Matrix factorization is a collaborative filtering technique that relies on two vectors called the user factors and the item factors. The user factors are a low-dimensional representation of a user_id, and the item factors similarly represent an item_id.
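A predicted rating is then simply the dot product of a user's factor vector and an item's factor vector. A toy illustration (the factor values are made up, not model output):

```python
def predicted_rating(user_factors, item_factors):
    """Matrix factorization prediction: dot product of the two factor vectors."""
    return sum(u * i for u, i in zip(user_factors, item_factors))

user_903 = [0.2, 0.8, -0.1]  # hypothetical user factors
movie_42 = [1.0, 0.5, 0.3]   # hypothetical item factors
print(predicted_rating(user_903, movie_42))  # 0.2 + 0.4 - 0.03, i.e. ~0.57
```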

To perform a matrix factorization of our data, we use the typical BigQuery ML syntax except that the model_type is matrix_factorization and we have to identify which columns play what roles in the collaborative filtering setup.

CREATE OR REPLACE MODEL movies.movie_recommender

OPTIONS (model_type='matrix_factorization', user_col='userId', item_col='movieId', rating_col='rating', l2_reg=0.2, num_factors=16) AS

SELECT userId, movieId, rating

FROM movies.movielens_ratings 

      

ML.PREDICT

SELECT * FROM
ML.PREDICT(MODEL `cloud-training-prod-bucket.movies.movie_recommender`,
( SELECT movieId, title, 903 AS userId
FROM
`movies.movielens_movies`, UNNEST(genres) g
WHERE g = 'Comedy' ))
ORDER BY predicted_rating DESC
LIMIT 5


Ref: https://googlecourses.qwiklabs.com/course_sessions/109507/labs/12073


Materialized VIEW - Automatic refresh (can turn on/off)

By default, materialized views are automatically refreshed within 5 minutes of a change to the base table. Examples of changes include row insertions or row deletions.

Automatic refresh can be enabled or disabled at any time.

To turn automatic refresh off when you create a materialized view, set enable_refresh to false.

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY RANGE_BUCKET(column, buckets)
OPTIONS (enable_refresh = false)
AS SELECT ...

For an existing materialized view, you can modify the enable_refresh value using ALTER MATERIALIZED VIEW:

ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (enable_refresh = true)


Query Data - options, Federated Query...

  • SQL GUI (BigQuery console)
  • bq command-line tool
  • Storage API --> Spark, Tensorflow, Dataflow, Pandas, Scikit-learn


Controlling access to DataSets

You can apply access controls during dataset creation by calling the datasets.insert API method.

Access controls can't be applied during dataset creation in the Cloud Console or the bq command-line tool.

You can apply access controls to a dataset after it is created in the following ways:

  • Using the Cloud Console.
  • Using the bq update command in the bq command-line tool.
  • Calling the datasets.patch API method.
  • Using the client libraries.

Dataset Sharing - option in UI [share dataset]


Partitioning

https://cloud.google.com/bigquery/docs/querying-partitioned-tables

By ingestion time

When you create an ingestion-time partitioned table, two pseudo columns are added to the table: a _PARTITIONTIME pseudo column and a _PARTITIONDATE pseudo column.

When you query data in ingestion-time partitioned tables, you reference specific partitions by specifying the values in the _PARTITIONTIME or _PARTITIONDATE pseudo columns. For example:

  • _PARTITIONTIME >= "2018-01-29 00:00:00" AND _PARTITIONTIME < "2018-01-30 00:00:00"
  • _PARTITIONTIME BETWEEN TIMESTAMP('2016-01-01') AND TIMESTAMP('2016-01-02')

Limiting partitions queried using pseudo columns:

SELECT
  column
FROM
  dataset.table
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP('2016-01-01')
  AND TIMESTAMP('2016-01-02')


Partition by Date/Time

No pseudo columns here.

Special partitions:
  __NULL__ (if the partitioning value is NULL)
  __UNPARTITIONED__ (if the value is outside the allowed range)


Integer Range Partitioning
  • Using the Cloud Console
  • Using a DDL CREATE TABLE statement with a PARTITION BY RANGE_BUCKET clause that contains a partition expression
RANGE_BUCKET(
  <integer_column>,
  GENERATE_ARRAY(<beginning>, <end + 1>, <interval_length>)
)

* or in a bq command use --range_partitioning=pickup_location_id,0,300,5

Argument      Value
column name   customer_id
start         0
end           100
interval      10

The table will be partitioned on the customer_id column into ranges of interval 10. The values 0 to 9 will be in one partition, values 10 to 19 in another partition, ..., and finally values 90 to 99 will be in another partition. Values outside of 0 to 99 (such as -1 or 100) will be in the __UNPARTITIONED__ partition. NULL values will be in the __NULL__ partition.
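The partition assignment described above can be sketched locally (a simplified model of integer-range partitioning, not BigQuery internals):

```python
def assign_partition(value, start=0, end=100, interval=10):
    """Map an integer to its range partition (customer_id example: 0-100 step 10)."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        return "__UNPARTITIONED__"
    return (value - start) // interval  # 0-based partition index

print(assign_partition(7))     # 0  (values 0 to 9)
print(assign_partition(95))    # 9  (values 90 to 99)
print(assign_partition(100))   # __UNPARTITIONED__
print(assign_partition(None))  # __NULL__
```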



Sharding & partitioning

Partitioning is preferred - less metadata, better performance, fewer permission checks.

Sharding - create a separate table for each day/hour, named as below:
<TABLE_NAME_PREFIX>_<YYMMDD>


When querying a date-sharded table, you only include the table(s) that you need. You can use either a UNION ALL, or a wildcard table format.

SELECT *
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
WHERE _TABLE_SUFFIX BETWEEN '01' AND '14'
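Locally, the wildcard plus _TABLE_SUFFIX filter amounts to a lexical range test on each shard name's suffix. A sketch (the table names are illustrative):

```python
# Hypothetical daily shards for July 2017
tables = [f"ga_sessions_201707{day:02d}" for day in range(1, 32)]

prefix = "ga_sessions_201707"
# Equivalent of: FROM `...201707*` WHERE _TABLE_SUFFIX BETWEEN '01' AND '14'
selected = [t for t in tables if "01" <= t[len(prefix):] <= "14"]

print(len(selected))              # 14
print(selected[0], selected[-1])  # ga_sessions_20170701 ga_sessions_20170714
```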


Use UNION ALL in queries to scan multiple tables to get the result.

Clustering - to keep like data together
- e.g., if you want to filter on a text column

CREATE TABLE <my_partitioned_table>
PARTITION BY DATE(creation_date)
CLUSTER BY deviceName
AS
SELECT * FROM <non_partitioned_table>

Data Analytics - Descriptive, Predictive, Prescriptive