Sunday, January 17, 2021

Angular - notes

 

####### Introduction to Angular ######

##Angular (ng) High-level info (Angular 7)

Angular 2+ ('Angular') is a TypeScript-based open-source front-end web application platform (originally from Google)

TypeScript (.ts) => Static Typing, Decorators (Annotations)

--> ES6 (ECMAScript) => Classes, Modules, Arrow Functions

TypeScript compiler generates JavaScript (human-readable) --> compiles to ES5 or ES6

SPA - Single Page Application 

- AJAX & HTML5 - HTML fragments (mini-views)

Clean separation of the code that renders the UI from the code that implements the application logic

Dependency Injection - loose coupling b/w components and services

Automatic change detection (UI gets reflected automatically)

Comes with RxJS - subscription-based processing of asynchronous data - eliminates callback hell
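The subscription idea can be sketched in plain TypeScript — a minimal illustration of the pattern, not the real RxJS API:

```typescript
// Minimal observable-like pattern: a producer pushes values,
// subscribers react as each value arrives (no callback nesting).
type Observer<T> = (value: T) => void;

class SimpleObservable<T> {
  private observers: Observer<T>[] = [];

  subscribe(observer: Observer<T>): void {
    this.observers.push(observer);
  }

  next(value: T): void {
    // Push the value to every subscriber.
    this.observers.forEach((obs) => obs(value));
  }
}

const ticks = new SimpleObservable<number>();
const received: number[] = [];
ticks.subscribe((n) => received.push(n));
ticks.next(1);
ticks.next(2);
console.log(received); // [1, 2]
```

The real RxJS Observable adds error/complete channels, unsubscription, and operators on top of this basic push model.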

Angular CLI - The scaffolding and deployment tool - spares developers from writing the boilerplate code and configuration scripts.

##Responsive Web Design (RWD)

- Web design approach - optimal viewing experience on a wide range of devices

- A responsive theme's layout adjusts to your device's screen size

##WorkSpace Setup


Node-v10.x https://nodejs.org/en/download/

Visual Studio Code

Angular CLI - npm install -g @angular/cli


##First Project (ng way)


#1 Open Terminal window --> go to your folder where you want to create the project

--> ng new <<your-project-name-here>>

#2 Launch the Angular application by running the command

--> ng serve --open

#3 Launches the index.html in a browser @ http://localhost:4200 (port might vary)

##Deployment and other info - https://angular.io/guide/deployment


##Build and Deploy as a JS

ng serve

• In-memory compilation running on local server

• File changes trigger reload for iterative development

ng build

• Creates output folder ‘dist’

• Consolidates app into a few js files

• Assets folder replicated

• Must change base href for deployment environment


##First Project (npm way)

Go to desired folder in Terminal and run  --> npm init -y

look at files, especially package.json

Add dependencies – Angular, SystemJS, Live-server, TypeScript compiler

Run the install command to install the above dependencies --> npm install

Copy the following files 

                    - SystemJS Config file, index.html, main.ts, app.module.ts and app.component.ts

Launch the Angular application by running the command

Launch the index.html in a browser @ http://localhost:4200/index.html (port might vary)

## Angular Project Structure 

#angular.json


"outputPath" is the location where the ng build command places the condensed js files for the entire app

index.html is the single page in which the Angular App is injected

main.ts is the “main” Angular entry point

polyfills.ts contains browser compatibility polyfills and application polyfills.

styles.css is where global styles can be placed. Any CSS rules you place here are injected into the DOM in a <style></style> tag

"assets" array describes the location of static file assets

#index.html

#main.ts --> calls bootstrap function

#app.module.ts --> the module being bootstrapped - it in turn bootstraps the app component

#app.component.ts --> @Component decorator

#app.component.html --> 

################ Basic Concepts and Binding ##########


• TS / JS Primer
• Types
• Shapes
• Spread and Rest Operators
• Classes and Interfaces
• Decorators
• Arrow Functions

The Data Binding process in Angular 7
• Interpolation
• Property
• Event
• Two-Way
## Types (Strict Type checking)

function add(a: number, b: number) {
  return a + b;
}
add('5', 6); // Compilation error
add(1, 3);   // works
#Types: boolean, number, string, arrays [], object literals {}, undefined, null, enum, any and void
let hostname: string = "Sree";
let list: number[] = [1, 2, 3];
enum Color {Red, Blue, Green, Black};
let c: Color = Color.Black;
#Types are inferred if no type is given
let a = 123;
## Spread Operators (spreading array into positional arguments)
var list = [1, 2, 3];
    list = [...list, 4, 5, 6];
    console.log(list); // [1,2,3,4,5,6]
## Destructuring
var [x, y, ...remaining] = [1, 2, 3, 4];
console.log(x, y, remaining); // 1,2,[3,4]
## Rest Operators (like varargs in java) - accept multiple arguments in your function and get them as an array.
function fnTakeItAll(first, second, ...allOthers) {
console.log(allOthers);
}
fnTakeItAll('cat', 'mat'); // []
fnTakeItAll('cat', 'mat', 'bat', 'fat'); // ['bat','fat']
## var vs let

#var - Variables in JavaScript are function scoped
var foo = 123;
if (true) {
  var foo = 456;
}
console.log(foo); // 456
#let - variables with true block scope
let foo = 123;
if (true) {
  let foo = 456;
}
console.log(foo); // 123
 
## Union of types (can be an alternative to Inheritance)
export enum Color {RED = 'red', BLUE = 'blue', WHITE = 'white'}
we could use union types and get similar benefits in a much shorter fashion:
export type Color = 'red' | 'white' | 'blue';
export type Optional<T> = T | undefined;
let user: Optional<User>;
export type AuthAction = LoginAction
                          | LoginSuccessfulAction
                          | LoginErrorAction
                          | LogoutAction
                          | LogoutSuccessfulAction;
type Age = number | string;
let numAge: Age = 50;
let strAge: Age = "Fifty";
function printAge(age: Age): string {
   return `Is your age, ${age}?`;
}
## Classes & Interfaces
#Interfaces - are contracts - can't be instantiated
interface Person {
  firstName: string;
  lastName: string;
}
# Classes - can be instantiated
class Employee implements Person {
  firstName: string;
  lastName: string;
}
## Shapes
Any two TypeScript objects (even from different classes) are considered equivalent if they are composed of the same types of attributes (structural typing)
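A quick illustration of structural typing (the Named/Dog names are made up for this sketch):

```typescript
// TypeScript compares shapes, not class names (structural typing).
interface Named {
  name: string;
}

class Dog {
  name: string;
  constructor(name: string) { this.name = name; }
}

// A plain object literal and a Dog instance both satisfy Named,
// because both have a string `name` attribute.
const a: Named = { name: "Rex" };
const b: Named = new Dog("Fido");

function greet(n: Named): string {
  return `Hello, ${n.name}`;
}

console.log(greet(a)); // Hello, Rex
console.log(greet(b)); // Hello, Fido
```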
## Decorators
https://www.typescriptlang.org/docs/handbook/decorators.html
https://www.youtube.com/watch?v=3Rgv2UWK2Bo
Decorators are functions that return functions and can attach metadata
Auxiliary functions that can be applied to Classes, Methods, Properties, Parameters, or Accessors
##Types of Decorators##
Class Decorators: @NgModule, @Component
Property decorators: @Input, @Output
Method decorators: @HostListener (event decorator)
Parameter decorators: @Inject
Any function can be used as a decorator
** Decorator - executed at the time of class evaluation (not instantiation) **
function myDecorator(prefix?: string) {
  return (constructor: any) => {
    console.log(constructor);
    console.log("decorator evaluated");
    constructor.prototype.message = prefix + constructor.name;
  };
}
@myDecorator("Hello ")
class World{
  message: string;
}
let w = new World();
console.log("Class Decorator ");
console.log(w.message);
console.log("-------------");
#Decorator Factory - We can write a decorator factory in the following fashion:

function color(value: string) {
  // this is the decorator factory
  return function (target) {
    // this is the decorator
    // do something with 'target' and 'value'...
  };
}
## Data Binding - Data binding signifies how and what kind of data is bound between a component and its template.

#1# Interpolation – binds component properties in output template. It uses {{}}.
Syntax : {{interpolation}}  means --> {{valueToBind}}
<span>{{title}} App is running !!! </span>
#2# Property Binding – flows data from the component to the element. Uses []
<span [style.color]="componentStyle"> Some colored text!!</span>
#3# Event Binding – flows data from an element to the component. Uses ()
<button (click)="alertTheWorld()">Click Me</button>
#4# Two-Way Binding – is a combination of the Event and Property Bindings. 
Used along with the ngModel object. 
*Must Import FormsModule
          <input [(ngModel)]="dynamicValue" 
             placeholder="Watch the text update !" 
             type="text">
         <span>{{dynamicValue}}</span>

## Component

Components form the building blocks of an Angular application.

To create a Component, issue the following command

ng g component <component name>

#Practice

1. Make Developer Class (ng g class Developer)
   1. firstName: string
   2. lastName: string
   3. favoriteLanguage: string
   4. yearStarted: number

2. Make a new component called 'bio' (ng g component bio)
   1. Import the Developer Class
   2. Create an instance of a Developer inside the constructor and assign it to a property called 'dev'

3. Display dev in bio.component.html

4. Add <app-bio></app-bio> to app.component.html

5. Create a toggle switch using *ngIf to only display the bio component if a link is clicked

Use the "selector" value of the new component in the app.component.html to import it there
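The Developer class from step 1 might look like this (the field names come from the steps above; the constructor style and the sample values are assumptions):

```typescript
// ng g class Developer generates an exported class like this;
// parameter properties declare and assign the fields in one place.
export class Developer {
  constructor(
    public firstName: string,
    public lastName: string,
    public favoriteLanguage: string,
    public yearStarted: number
  ) {}
}

// Inside bio.component.ts the constructor could assign an instance
// to a `dev` property; shown here as a plain variable.
const dev = new Developer("Ada", "Lovelace", "TypeScript", 2018);
console.log(dev.favoriteLanguage); // TypeScript
```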

##  Component Lifecycle

• The lifecycle of a component is managed by Angular itself.

• It manages creation, rendering, binding data-bound properties etc. and also offers "hooks" that allow responding to key lifecycle events.

• Here is the complete lifecycle hook interface inventory:

• ngOnChanges - called when an input binding value changes.

• ngOnInit - called after the first ngOnChanges.

• ngDoCheck - called after every run of change detection.

• ngAfterContentInit - called after the component content is initialized.

• ngAfterContentChecked - called after every check of component content.

• ngAfterViewInit - called after the component's view(s) are initialized.

• ngAfterViewChecked - called after every check of a component's view(s).

• ngOnDestroy - called just before the component is destroyed.
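The hook order can be simulated in plain TypeScript — this sketch only mimics the calling sequence; in a real app Angular itself invokes these hooks on a @Component class:

```typescript
// Plain-TS simulation of the main lifecycle hook order.
class BioComponent {
  log: string[] = [];

  ngOnChanges()     { this.log.push("ngOnChanges"); }
  ngOnInit()        { this.log.push("ngOnInit"); }
  ngAfterViewInit() { this.log.push("ngAfterViewInit"); }
  ngOnDestroy()     { this.log.push("ngOnDestroy"); }
}

// Angular would drive these calls itself; here we do it by hand to
// show the order: changes -> init -> view init -> destroy.
const c = new BioComponent();
c.ngOnChanges();
c.ngOnInit();
c.ngAfterViewInit();
c.ngOnDestroy();
console.log(c.log);
// ["ngOnChanges", "ngOnInit", "ngAfterViewInit", "ngOnDestroy"]
```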

##  Elvis Operator:

• If a property that does not exist is referenced in a template, an exception is thrown.

• The "Elvis Operator" (safe navigation operator) is a simple and easy way to guard against null and undefined properties.

• It is denoted by a question mark immediately followed by a period "?.".

<md-input-container>

<label>Type to see the value</label>

<input md-input type="text" #input />

</md-input-container>

<strong>{{input?.value}}</strong>
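TypeScript itself (3.7+) has the same guard as optional chaining (`?.`), which the template operator mirrors; `InputBox` here is a made-up stand-in for the input element:

```typescript
interface InputBox { value?: string; }

// The element reference may be undefined while the view initializes.
let input: InputBox | undefined = undefined;

// Without ?. this would throw; with it the expression is just undefined.
const beforeInit = input?.value;

input = { value: "hello" };
const afterInit = input?.value;

// ?? supplies a fallback when the chain yields undefined/null.
const shown = beforeInit ?? "(nothing yet)";

console.log(beforeInit); // undefined
console.log(afterInit);  // hello
console.log(shown);      // (nothing yet)
```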

##  Structural Directives

•  ngIf adds and removes elements in the DOM based on the result of an expression.

•  ngFor is a repeater directive that outputs a list of elements by iterating over an array.

<div *ngIf="emps">

<div *ngFor="let emp of emps">

<p><span class="bold">First Name:</span> {{emp.firstName}}</p>

<p><span class="bold">Last Name:</span> {{emp.lastName}}</p>

<p><span class="bold">Department:</span> {{emp.department}}</p>

<hr />

</div>

</div>

##  Introduction to Routes

• Routing allows you to:

– Recover browser history functionality, which is otherwise lost with an SPA

– Maintain the state of the application.

– Implement modular applications.

– Implement role-based access (certain roles have access to certain URLs).

• Routes are injected into <router-outlet></router-outlet>. This is most commonly placed in app.component.html, below any navigation bars or content that you want to appear on every page


You can add routing after the fact

• Angular best practice is to create a separate, top-level module dedicated to routing

ng g module app-routing --flat --module=app


• --flat places the module at the top level (no dedicated subfolder)

• --module=app adds the import for app-routing inside of AppModule


## Route Configuration

Routes are configured using the Routes type, which is an array of route objects.

The route object is composed of the following attributes:

– Path: URL to be shown in the browser when the application is on the specific route.

– Component: Component to be rendered when the application is on the specific route. This is the output of the router link.

##app-routing.module.ts## sample 

import { NgModule } from '@angular/core';

import { Routes, RouterModule } from '@angular/router';

import { HomeComponent } from './home/home.component';

import { EmpComponent } from './emp/emp.component';

const routes: Routes = [

{path:'', component: HomeComponent},

{path:'bio',component:EmpComponent}];


@NgModule({

  imports: [RouterModule.forRoot(routes)],

  exports: [RouterModule]

})

export class AppRoutingModule { }

## redirectTo ## Routes may redirect to other routes, using the redirectTo attribute.

const routes: Routes = [

{ path: '', redirectTo: 'emp', pathMatch: 'full' },

{ path: 'emp', component: EmpComponent }];

##Route Navigation##

• In the view template, the routerLink directive may be used inside an anchor tag to add links that point to the defined routes.

<a routerLink="/component-one"> Component One </a>

#Programmatically do it this way

import {Router} from '@angular/router';

constructor (private router: Router){}


this.router.navigate(['/emp']);

##  Bootstrap Stylesheet - industry-standard styling, commonly used with Angular

<!-- google for bootstrap style sheet link to get below link -->

<!-- Add below link under head section of index.html -->

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css"

integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">


##  Services in Angular ##

• Create a Service using the command:

ng g service <<ServiceName>>

• Services expose methods (typically business logic) as an API, along with optional public properties.

• Services are dependencies injected by Angular, which maintains each service as a singleton.

• The @Injectable decorator is used to mark a TypeScript class as a Service.

• providedIn: determines which injectors will provide the injectable, by either associating it with an @NgModule or other InjectorType, or by specifying that this injectable should be provided in the 'root' injector, which will be the application-level injector in most apps


## Injectable Decorator

## Dependency Injection

• Import the service class

• Define it within the constructor parameters

• It should have the ‘private’ access modifier
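The pattern can be sketched in plain TypeScript (Angular's injector does the wiring via @Injectable; here the singleton is passed in by hand, and the names EmployeeService/EmpComponent are made up):

```typescript
// A service exposes business-logic methods as its API.
class EmployeeService {
  private emps = ["Ada", "Grace"];
  getEmployees(): string[] {
    return this.emps;
  }
}

// The component declares the dependency as a private constructor
// parameter; Angular would supply the shared singleton instance.
class EmpComponent {
  constructor(private empService: EmployeeService) {}
  names(): string[] {
    return this.empService.getEmployees();
  }
}

const service = new EmployeeService(); // Angular keeps one instance
const comp = new EmpComponent(service);
console.log(comp.names()); // ["Ada", "Grace"]
```

The private constructor parameter both declares and assigns the field, which is why the bullet above insists on the access modifier.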

## Route Parameters


Parameters are sent using :paramName embedded in the router paths:

const routes: Routes = [

{ path: '', redirectTo: 'product-list', pathMatch: 'full' },

{ path: 'product-details/:id', component: ProductDetails }];

The parameter can then be sent using routerLink and interpolation:

<a routerLink="product-details/{{product.id}}"> {{product.name}} </a>

OR the routerLink directive may be used and supplied with parameters to be automatically sent, while invoking a router link.

<a *ngFor="let product of products"

[routerLink]="['/product-details', product.id]">

{{product.name}}

</a>


## Reading Routing Parameters using ActivatedRoute ##

– Angular provides the ActivatedRoute service, which in turn supplies a paramMap property that contains the parameters.

import {ActivatedRoute} from '@angular/router';

@Component.....

export class TestComponent implements OnInit{

id;

constructor(private route: ActivatedRoute){}

ngOnInit(){

this.id = this.route.snapshot.paramMap.get('id')

}

}

## Navigate to parameterized routes via code

Navigating to parameterized routes internally:

– Create a Router object and use the navigate method.

– Pass an array containing the main route and the parameter (just like the routerLink directive on the previous page)

@Component ....

export class TestComponent{

constructor(private router: Router){}

goToProductDetails(id){

this.router.navigate(['/product-details', id]);

}

}



## Creating in app-routing.module.ts

## Linking

## Retrieving parameters in component class

## Child Routes

================================================








Thursday, January 14, 2021

Google Cloud Data Engineer - My Quick Notes 2 (ML)

 


https://developers.google.com/machine-learning/glossary

------------------------------------------------------------------------------

ML Categories

Unsupervised Learning

Supervised Learning

Reinforcement Learning

------------------------------------------------------------------------------

** Unsupervised Learning

Draw inference from data

Previously undetected patterns

Example - 

Clustering (Finding groups of similar entities in a data set)

Anomaly Detection

Principal component analysis - get the most important attributes

** Supervised Learning

Learn from examples

Goal is to predict category or value

Example

Classifying tumors from images - (Classification)

Predicting housing prices - (Regression)

Identify fraudulent credit card transactions 

**  Reinforcement Learning (not explored in DE exam much)

Learn from environment

Maximize reward

Does not require examples

Instead it uses exploration of the environment and exploitation of data points

Example

Agent taking actions in environment and receiving rewards

------------------------------------------------------------------------------

2 approaches to ML

Symbolic Artificial Intelligence (2006-2009)

Neural networks and deep learning (built on neural networks)

------------------------------------------------------------------------------

**  Symbolic Artificial Intelligence

Symbols represent entities and attributes

Manipulate symbols to make inferences

Models of Reasoning

Logic

Cognitive science

Features

Say, to predict re-admittance of a patient to hospital

Use, length of stay, type of operation, Age etc

Symbolic ML Algorithms

Decision Trees 

Ask questions --> dig further based on answers, with more questions

Set of decision points , and Terminal node is the answer/Classification

Random Forest

An ensemble of multiple decision trees built with different features - popular

Naive Bayes

Conditional probability

Support Vector Machines (SVMs)

represent entities as points in space

Similar entities are close in space

Dissimilar entities separated by a gap - this algo finds the gap

K Nearest Neighbors  

- To Categorize

- Find ways to measure distance b/w objects; closer ones are in the same category

**  Neural networks and deep learning

Neuron-like abstraction

Inputs are numbers (x) - features or the output of another neuron

Weights assign importance to inputs (W)

x1*W1 + x2*W2 + x3*W3 --(non-linear function aka Neuron)--> Output

The non-linear function is called the Activation Function

Sigmoid

TanH

ReLU

**** We train the model to adjust the weight to get the desired output ****

Layers - can be any number of (simple one has 3)

Input Layer

Hidden Layer

Output Layer

Deep Learning (more than 3 layers)

Challenging to learn weights

Backpropagation algo is used to adjust the weights

- takes into account the size of the error & the slope towards the right/correct answer (the ideal point)


==================================================================

Entity & Attributes

Features

Label

ML uses features to predict the Label


Feature Engineering

Manipulate features to improve the quality of the ML model

Identify useful features (original or transformed value)

Derived features


** Ways to do feature engineering

Transform existing features (cleanup etc)

Map numeric values to a scale of 0 to 1

Bucketing - to reduce # of values (say 1-100 to 10 buckets)

Feature-cross - Cartesian product of 2 or more features

say weights (light, medium, heavy) x color (blue, green, red) - 9 combos come out

helps capture non-linear relationships
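A feature cross is just the Cartesian product of the value sets; a minimal sketch (the `_x_` separator is an arbitrary choice):

```typescript
// Cartesian product of two categorical features.
function featureCross(a: string[], b: string[]): string[] {
  const crossed: string[] = [];
  for (const x of a) {
    for (const y of b) {
      crossed.push(`${x}_x_${y}`);
    }
  }
  return crossed;
}

const weights = ["light", "medium", "heavy"];
const colors = ["blue", "green", "red"];
const combos = featureCross(weights, colors);
console.log(combos.length); // 9
console.log(combos[0]);     // light_x_blue
```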

Binary features

is_red, is_blue, like that

Decompose value parts

From date - extract day, month, year

From Address - extract street etc

One-Hot Encoding

Map value to a single bit in a binary array

each position represents a possible value (like Red - 100, Green - 010, Blue - 001)

used to represent categorical features in deep learning models.

Normalization

Convert numeric values to a standard range (0 to 1 or -1 to +1)

0 to 1 is called Scaling (divide the feature value by the max value)
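Both transformations are short to write out (a minimal sketch; the category order passed to the encoder is an assumption):

```typescript
// One-hot: map a categorical value to a binary array with a single 1.
function oneHot(value: string, categories: string[]): number[] {
  return categories.map((c) => (c === value ? 1 : 0));
}

// Scaling: divide by the max to land in [0, 1].
function scale(values: number[]): number[] {
  const max = Math.max(...values);
  return values.map((v) => v / max);
}

console.log(oneHot("red", ["red", "green", "blue"])); // [1, 0, 0]
console.log(scale([10, 20, 40]));                     // [0.25, 0.5, 1]
```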


Model Building

Define problem

Collect Data

Define Evaluation method

Prepare the data (iteratively)

Split the data into Training, Validation & Test

Execute the Algorithm on data to build the model

Validate the model (tune the model)

adjust the hyperparameters (not learned from the data)

# of layers in NN, decision tree depth allowed, max trees in Random Forest etc

[params are learned by algo from data]

Test model

[Training -> Model -> Validation -> Tune model -> Training; then test once all done]


Evaluating Model

Commonly used metrics

Accuracy (classification problems)

Precision  (classification problems)

Recall  (classification problems)

Mean Squared Error (regression problems)

*** Never test with training data

Confusion Matrix - Actual x Predicted

Accuracy - % of correctly predicted data points - (TP+TN)/(TP+FP+TN+FN)

Precision - % of predicted positives that are truly positive - TP/(TP+FP)

Recall - % of actual positives correctly identified - TP/(TP+FN)
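The three formulas as code (the counts below are made up for illustration):

```typescript
// Counts from a confusion matrix.
interface Confusion { tp: number; fp: number; tn: number; fn: number; }

const accuracy  = (m: Confusion) => (m.tp + m.tn) / (m.tp + m.fp + m.tn + m.fn);
const precision = (m: Confusion) => m.tp / (m.tp + m.fp);
const recall    = (m: Confusion) => m.tp / (m.tp + m.fn);

// Example: 80 true positives, 20 false positives,
// 90 true negatives, 10 false negatives.
const m: Confusion = { tp: 80, fp: 20, tn: 90, fn: 10 };
console.log(accuracy(m));  // 170/200 = 0.85
console.log(precision(m)); // 80/100 = 0.8
console.log(recall(m));    // 80/90 ≈ 0.889
```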


===============================================================

Deep Learning

Gradient Descent 

- U shaped graph in first quadrant.

- x-axis Weight

- y-axis Loss

- AIM:  minimize the total loss

- Train the model to make initial weight to Optimal weight

- Gradient (slope) - which dir to go, how fast to go

- "Learning rate"(hyper param) determines the incremental step size

- here the weight is the parameter the model learns

- "Hyper parameters" we adjust to get the optimal "parameter" which is weight
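A 1-D sketch of the idea: minimize loss(w) = (w - 3)^2 by repeatedly stepping against the gradient (3 is an arbitrary optimum picked for this example):

```typescript
// loss(w) = (w - 3)^2, so gradient = 2 * (w - 3); optimum at w = 3.
const gradient = (w: number) => 2 * (w - 3);

let w = 0;                // initial weight
const learningRate = 0.1; // hyperparameter: incremental step size

for (let step = 0; step < 100; step++) {
  // Move against the slope, scaled by the learning rate.
  w -= learningRate * gradient(w);
}

console.log(w.toFixed(4)); // ≈ 3.0000
```

With a learning rate that is too large the updates overshoot and diverge; too small and convergence is slow — which is why it is tuned as a hyperparameter.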

Types:

Batch gradient descent

Loss is calculated over entire data set

Slow on large data sets

Stochastic Gradient Descent

For large datasets (so in Deep Learning)

Weights are updated after each instance (not after entire dataset)

Can adjust the weight with each example

Training instances are randomly sorted (Stochastic)

Random walks avoids getting stuck

Mini-batch gradient descent

B/w batch and stochastic

How to calculate the gradient? Solution is BackPropagation

BackPropagation

Compute the gradient of the mapping function over an input-output pair

Calculate the partial derivative of the loss function relative to each weight

More efficient than naive calculation

.. add more notes


------------------------------------------------

Model Troubleshooting

------------------------------------------------

Underfitting

Model performs poorly on training and validation data

Ways to correct underfitting

Increase the complexity of the model

add additional layers in NN

increase # of decision trees allowed in Random Forest

increase the max depth in decision trees

Increase the Training Time or epochs

#epoch - the number of passes over the entire training dataset the ML algo has completed

Overfitting

Model performs well on training data but poorly on validation data

Correction options

Regularization - which limits the info captured

To avoid outliers in the data over-influencing the model

Bias - Variance Tradeoff

        https://towardsdatascience.com/bias-and-variance-in-linear-models-e772546e0c30

These are the natural characteristics of model, but need trade-offs

Bias Error

Result of missing relationships b/w features & target outputs

means, we missed some important info as a feature

Because we did not sufficiently generalize from the training data

Variance Error: 

Due to sensitivity in the small fluctuations in the training data

Small changes in the input can cause large changes in the output

variance is the difference among a set of predictions

Bias and Unfairness issue:

Fairness

Anti-classification: protected attributes are not used in the model (e.g., Gender)

Classification parity:

Predictive performance is equal across groups

Calibration:

Outcomes are independent of protected attributes

==============================================

quick additional notes

Vision AI - Transfer Learning (reuse a model trained for one problem on another set of problems)

Collaborative filtering - recommendations

Cloud Run - if the model is stateless (to deploy models)


GPU - highly parallel processing, ALUs, matrix multiplication (needs NVIDIA drivers)

TPU - Application-Specific Integrated Circuit (ASIC) - for TensorFlow models

Costs less than GPU

 

https://docs.google.com/forms/d/e/1FAIpQLSfkWEzBCP0wQ09ZuFm7G2_4qtkYbfmk_0getojdnPdCYmq37Q/viewform


https://cognizant.udemy.com/course/google-cloud-professional-data-engineer-get-certified/learn/quiz/4945080#overview

 



Google Cloud Data Engineer - My Quick Notes 1 (few services, IAM, Security)

 


MemoryStore - Redis and MemCached

App Engine - Standard and Flexible environment

Cloud Composer - AirFlow - Architecture

Tenant Project - (AirFlow Database (Cloud SQL) to store metadata, Web Service (App Engine), 

Customer Project - GKE  with AirFlow Worker, 

Redis (persist message across container restarts), 

AirFlow Scheduler and Cloud SQL Proxy, 

Cloud Storage (staging the DAG, logs etc) 

Cloud Data Fusion (ETL tool) - based on the CDAP data analytics platform

Execution environment - instance

Basic - visual designer, transformations, SDK and 

Enterprise: Basic + Streaming , metadata repo, HA, triggers, schedulers


Pub/Sub - Subscription - push / pull - for Async Integrations

Multi-Cloud environment - avoid single-vendor lock-in

Anthos 

    - Run workloads in Kubernetes cluster

- Multi-Cloud Application Modernization platform

- Anthos is a managed application platform that extends Google Cloud services and engineering practices to your environments so you can modernize apps faster and establish operational consistency across them.

- Build, deploy, and optimize applications anywhere—simply, flexibly, and securely

- Consistent development and operations experience for hybrid and multi-cloud environments

- Achieve up to 4.8x ROI within 3 years

Cloud Code

Cloud Build

Dashboards & Visualizations, Metrics explorer, Uptime checks, Alerts, Resource Usage page.

- Alerting - Create Policy - add conditions, Notification channels - can be Email, Pub/Sub, Pager, Slack, SMS, Console (mobile), Campfire

- Cloud Logging - using the FluentD agent - fully managed service - 30-day retention - or export it (using the Log Router)

- Log Router - create a Sink for the logs to flow into - to BigQuery, Cloud Storage, Pub/Sub, or a custom destination

- Installing Monitoring Agent on VMs --> download & install the stackdriver-agent (apt-get) and start the service (sudo service stackdriver-agent start)


IAM - principle of least privilege.

G Suite Account

- Domain registration - Hosting company provides an email address in that domain

- Google provides such an option - can use existing domain or create new domain name

- domain name as ‘yoursitename.com’. Now, the emails will look like ‘john@yoursitename.com’. 

Google Groups - easier to provide/remove access to users

Members - Users (will have one or more roles attached) - can be individual ids, service accounts, GSuite or google groups

Roles - Predefined, Custom and Primitive - attached to identities

Pre-defined Roles (with many pre-defined permissions)

There is a big list of pre-defined roles like <service> Admin, Viewer, Manager, Reader etc

Sample 

role: <serviceName>.<genericRoleName> --> Big Query Data Owner

BigQuery Data Owner --> has many permissions attached - like bigquery.dataset.create, bigquery.models.create, bigquery.table.delete .... etc

BigQuery Data Viewer --> many permissions - all are like .get, .list, .export etc - basically read-only ones

Custom Roles:

Example: Big Query Data Owner but with no model (ML model) access

--> the "CREATE FROM ROLE" option is the best way; name it "BigQuery Data Owner - No Model"

Primitive Roles

Owner (can set up billing for a project), Editor and Viewer

Service Account - for Apps and Servers


Policies are attached to resources (Policy is collection of statements)

Resource Hierarchy - Organization - folders - projects - resources


Data Loss prevention (DLP Service) - Security Practice

PII protection etc

Helps to classify data

Automatically mask data

measure re-identification risk

*InfoTypes --> Pattern detector - identify sensitive info (PII)

*Inspection jobs - applies InfoTypes to a dataset 

--> API returns InfoType, Likelihood score and Location

* Risk Analysis job - find the probability that data can be reIdentified

Legal compliance - GCP is HIPAA compliant


HIPAA, HITECH (Health Information Technology for Economic and Clinical Health)

GDPR - EU regulation


Encryption @ GCP

At-Rest and in-transit

Hardware level - AES256 or AES128 algo (practically uncrackable)

Data (say @ Colossus FS ) AES256

Encryption Key and Key-Encryption-Key (double protection)

Transit - Encrypt & Authentication

Internal GCP traffic - not TLS-encrypted - uses ALTS (Application Layer Transport Security)

Internet - uses TLS or QUIC(Google developed protocol)

Key Management (KMS)

Google Managed

Customer Managed (key created by customer, managed by Google) - app-level encryption

Customer Supplied (CSEK) - customer wants complete control over the keys


Thursday, January 7, 2021

B2B, B2C, B2B2C (B2X) business models

 

B2B, B2C (B2B2C - aka - B2X) applications


B2B - Tech Team, fewer users 

- B2B business model, your focus is on professionals in third-party commercial organizations

- Thus the number of potential users of your application is limited

- UX less critical.

B2C - End Users, high concurrent usage, emotional factors (speed/feel/entertainment/content)

- Even millions of users can use the app simultaneously.

B2X - Engage both Providers (B2B) & End Users (B2C via B2B or directly) [booking site -Providers, end Users]

- B2B2C are modern complex platforms offering a horizontal solution to another business’s problem. In other words, your client is a third-party business that brings you access to end customers that you can also serve and engage in.

- In this case, you not only think about the wishes of your user (a company) but also about the preferences of their customers (the end-consumer).

B2B and B2C use different business models

- B2B apps often offer a subscription service to the organization. 

- B2C apps are usually free of charge, but still profitable because they provide space to advertisers, sell extras in the app, or are a subscription service.


Courtesy/Reference: https://medium.com/@moqod_development/4-differences-in-b2b-and-b2c-applications-development-ed33ba025f2c


BigQuery My Reference notes


dw-bq-migration-overview MUST Read

https://cloud.google.com/solutions/migration/dw2bq/dw-bq-migration-overview

ARRAY_AGG, ARRAY_LENGTH 

You can do some pretty useful things with arrays like:

  • finding the number of elements with ARRAY_LENGTH(<array>)

  • deduplicating elements with ARRAY_AGG(DISTINCT <field>)

  • ordering elements with ARRAY_AGG(<field> ORDER BY <field>)

  • limiting ARRAY_AGG(<field> LIMIT 5)

SELECT
  fullVisitorId,
  date,
  ARRAY_AGG(DISTINCT v2ProductName) AS products_viewed,
  ARRAY_LENGTH(ARRAY_AGG(DISTINCT v2ProductName)) AS distinct_products_viewed,
  ARRAY_AGG(DISTINCT pageTitle) AS pages_viewed,
  ARRAY_LENGTH(ARRAY_AGG(DISTINCT pageTitle)) AS distinct_pages_viewed
FROM `data-to-insights.ecommerce.all_sessions`
WHERE visitId = 1501570398
GROUP BY fullVisitorId, date
ORDER BY date


Querying datasets that already have ARRAYs

In a BigQuery schema, an ARRAY field is noted as a REPEATED Mode.
SELECT visitId,  hits.page.pageTitle
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE visitId = 1501570398

You will get an error: Cannot access field page on a value with type ARRAY<STRUCT<hitNumber INT64, time INT64, hour INT64, ...>> at [3:8]

Before you can query REPEATED fields (arrays) normally, you must first break the arrays back into rows.

How do you do that with SQL?

Answer: Use the UNNEST() function on your array field:

SELECT DISTINCT
  visitId,
  h.page.pageTitle
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`,
UNNEST(hits) AS h
WHERE visitId = 1501570398
LIMIT 10

  • You need to UNNEST() arrays to bring the array elements back into rows
  • UNNEST() always follows the table name in your FROM clause (think of it conceptually like a pre-joined table)
STRUCTs

The easiest way to think about a STRUCT is to consider it conceptually like a separate table that is already pre-joined into your main table. A STRUCT can have another STRUCT as one of its fields (you can nest STRUCTs)

A STRUCT can have:

    • one or many fields in it
    • the same or different data types for each field
    • its own alias

SELECT visitId, totals.*, device.*
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE visitId = 1501570398
LIMIT 10



#standardSQL

SELECT race, participants.name

FROM racing.race_results

CROSS JOIN race_results.participants  # this is the STRUCT (it is like a table within a table)

The query below will give the same result:

#standardSQL
SELECT race, participants.name
FROM racing.race_results AS r, r.participants

If you have more than one race type (800M, 100M, 200M), wouldn't a CROSS JOIN just associate every racer name with every possible race, like a cartesian product?

Answer: No. This is a correlated CROSS JOIN, which only unpacks the elements associated with a single row. For a deeper discussion, see Working with ARRAYs and STRUCTs.

#standardSQL
SELECT COUNT(p.name) AS racer_count
FROM racing.race_results AS r, UNNEST(r.participants) AS p


QUANTILES & APPROX_QUANTILES

What are quantiles (fractiles)? A 100-quantile is a percentile.

Percentiles divide the data set into 100 equal parts.

SELECT APPROX_QUANTILES(x, 2) AS approx_quantiles
FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;

--> [1, 5, 10]


SELECT APPROX_QUANTILES(x, 4) AS approx_quantiles
FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;

--> [1, 1, 5, 8, 10]
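As a rough local illustration (plain Python, not BigQuery's actual sketch algorithm), APPROX_QUANTILES(x, n) returns n+1 boundary values: the minimum, the n-1 intermediate cut points, and the maximum. A minimal nearest-rank sketch, which happens to reproduce the two outputs above:

```python
def quantile_boundaries(values, n):
    """Return n+1 quantile boundaries (min, cut points, max) by nearest rank.

    A simplified stand-in for BigQuery's APPROX_QUANTILES, which uses an
    approximate sketch; boundary picking may differ on other inputs.
    """
    s = sorted(values)
    return [s[round(i * (len(s) - 1) / n)] for i in range(n + 1)]

data = [1, 1, 1, 4, 5, 6, 7, 8, 9, 10]
print(quantile_boundaries(data, 2))  # [1, 5, 10], as in the first query
print(quantile_boundaries(data, 4))  # [1, 1, 5, 8, 10], as in the second
```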

Approx Functions in Big Query (link)

SELECT FORMAT("%T", APPROX_QUANTILES(DISTINCT x, 2 RESPECT NULLS)) AS approx_quantiles
FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;

+------------------+
| approx_quantiles |
+------------------+
| [NULL, 6, 10]    |
+------------------+

SELECT APPROX_QUANTILES(x, 4) AS output
FROM UNNEST(GENERATE_ARRAY(1, 100)) AS x;

--> [1, 25, 50, 75, 100]

SELECT APPROX_QUANTILES(x, 100) AS output
FROM UNNEST(GENERATE_ARRAY(1, 200)) AS x;
--> [1, 2, 4, 6, 8, 10, 12, ....., 198, 200]

SELECT APPROX_QUANTILES(x, 100)[OFFSET(5)] AS output
FROM UNNEST(GENERATE_ARRAY(1, 200)) AS x;
 --> 10

Approximate aggregate functions are scalable in terms of memory usage and time, but produce approximate results instead of exact results. These functions typically require less memory than exact aggregation functions like COUNT(DISTINCT ...)

----------------------------------------------------------------------------------------------

Read more here --> BigQuery Functions and Operators

SAFE. prefix

If you begin a function with the SAFE. prefix, it will return NULL instead of an error. The SAFE. prefix only prevents errors from the prefixed function itself

SELECT SAFE.SUBSTR('foo', 0, -2) AS safe_output UNION ALL
SELECT SAFE.SUBSTR('bar', 0, 2) AS safe_output;

+-------------+
| safe_output |
+-------------+
| NULL        |
| ba          |
+-------------+
 
If no SAFE. prefix is used --> the query fails with the error "Third argument in SUBSTR() cannot be negative"
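The behaviour is analogous to wrapping a call so that failures yield NULL instead of raising. A hypothetical Python equivalent (illustration only, not BigQuery itself):

```python
def safe(fn):
    """Mimic BigQuery's SAFE. prefix: return None instead of raising."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            return None
    return wrapper

safe_int = safe(int)
print(safe_int("42"))    # 42
print(safe_int("oops"))  # None -- plain int("oops") would raise ValueError
```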


BigQuery- Query Cache

Cache results of previous queries

The BigQuery service automatically caches query results in a temporary table. If the identical query is submitted within approximately 24 hours, the results are served from this temporary table without any recomputation. Cached results are extremely fast and do not incur charges.

There are, however, a few caveats to be aware of. Query caching is based on exact string comparison, so even whitespace differences can cause a cache miss.
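Conceptually, the cache behaves like a dictionary keyed on the exact query text, so "SELECT 1" and "SELECT  1" (two spaces) are different keys. A toy sketch (run_query and its result string are hypothetical stand-ins, not the BigQuery API):

```python
cache = {}

def run_query(sql):
    """Return (result, cache_hit). Results are keyed on the exact SQL text."""
    if sql in cache:
        return cache[sql], True
    result = f"rows for: {sql}"  # stand-in for actually executing the query
    cache[sql] = result
    return result, False

_, hit1 = run_query("SELECT 1")   # first run: cache miss
_, hit2 = run_query("SELECT 1")   # identical text: cache hit
_, hit3 = run_query("SELECT  1")  # extra space: cache miss
print(hit1, hit2, hit3)  # False True False
```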

Queries are never cached if they exhibit non-deterministic behavior (for example, they use CURRENT_TIMESTAMP or RAND), if the table or view being queried has changed (even if the columns/rows of interest to the query are unchanged), if the table is associated with a streaming buffer (even if there are no new rows), if the query uses DML statements, or queries external data sources.

WITH CLAUSE (Common Table Expression)

The WITH clause (also called a Common Table Expression) improves readability but does not improve query speed or cost, since its results are not cached.

The same holds for views and subqueries.

If a query is used frequently, one way to potentially improve performance is to store the result in a table (or materialized view).

BIG QUERY - VERY RELEVANT INFO

https://googlecourses.qwiklabs.com/course_sessions/107473/labs/25818 

_TABLE_SUFFIX (wildcard table reference)

#standardSQL
 CREATE OR REPLACE TABLE ecommerce.days_with_rain
 PARTITION BY date
 OPTIONS (
   partition_expiration_days=60,
   description="weather stations with precipitation, partitioned by day"
 ) AS
 SELECT
   DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
   (SELECT ANY_VALUE(name) FROM `bigquery-public-data.noaa_gsod.stations` AS stations
    WHERE stations.usaf = stn) AS station_name,  -- Stations may have multiple names
   prcp
 FROM `bigquery-public-data.noaa_gsod.gsod*` AS weather
 WHERE prcp < 99.9  -- Filter unknown values
   AND prcp > 0      -- Filter days with no precipitation
   AND CAST(_TABLE_SUFFIX AS int64) >= 2017
   AND CAST(_TABLE_SUFFIX AS int64) <= 2019


BQML - Big Query Machine Learning 

CREATE OR REPLACE MODEL bike_model.model_bucketized
TRANSFORM(
  * EXCEPT(start_date),
  IF(EXTRACT(DAYOFWEEK FROM start_date) BETWEEN 2 AND 6,
     'weekday', 'weekend') AS dayofweek,
  ML.BUCKETIZE(EXTRACT(HOUR FROM start_date), [5, 10, 17]) AS hourofday
)
OPTIONS(input_label_cols=['duration'],
        model_type='linear_reg') AS
SELECT
  duration,
  start_station_name,
  start_date
FROM
  `bigquery-public-data`.london_bicycles.cycle_hire

EVALUATE

SELECT * FROM ML.EVALUATE(MODEL <dataset>.<model_name>), or
SELECT * FROM ML.EVALUATE(MODEL <dataset>.<model_name>, (<SQL query providing input sample data>))

SELECT * FROM
ML.PREDICT(MODEL bike_model.model_bucketized,
  (SELECT 'Park Lane , Hyde Park' AS start_station_name,
          CURRENT_TIMESTAMP() AS start_date))


MODEL WEIGHTS

A linear regression model predicts the output as a weighted sum of its inputs. Often, the weights of the model need to be used in a production environment.

SELECT * FROM ML.WEIGHTS(MODEL bike_model.model_bucketized)
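Applying retrieved weights outside BigQuery is then a plain weighted sum plus intercept. A sketch in Python with made-up weight values (not real model output):

```python
def predict(weights, intercept, features):
    """Linear regression prediction: intercept + sum of weight * input."""
    return intercept + sum(weights[name] * value
                           for name, value in features.items())

# Hypothetical weights, as if read from the ML.WEIGHTS output
weights = {"hourofday_bucket": 2.0, "is_weekday": 0.5}
print(predict(weights, 1.0, {"hourofday_bucket": 3, "is_weekday": 1}))  # 7.5
```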


BQ Create, Load commands

bq --location=EU mk --dataset movies

bq load --source_format=CSV \
  --location=EU \
  --autodetect \
  movies.movielens_movies_raw gs://dataeng-movielens/movies.csv

        
To replace a formatted string (e.g., split a |-separated string into an array):

SELECT * REPLACE(SPLIT(genres, "|") AS genres)
FROM movies.movielens_movies_raw
WHERE movieId < 5

 

Collaborative Filtering

Matrix factorization is a collaborative filtering technique that relies on two vectors called the user factors and the item factors. The user factors are a low-dimensional representation of a user_id, and the item factors similarly represent an item_id.
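A predicted rating is then simply the dot product of a user's factor vector and an item's factor vector. A toy illustration (the factor values are made up, not model output):

```python
def predicted_rating(user_factors, item_factors):
    """Matrix factorization prediction: dot product of the two factor vectors."""
    return sum(u * i for u, i in zip(user_factors, item_factors))

user_903 = [0.2, 0.8, -0.1]  # hypothetical user factors
movie_42 = [1.0, 0.5, 0.3]   # hypothetical item factors
print(predicted_rating(user_903, movie_42))  # 0.2 + 0.4 - 0.03, i.e. ~0.57
```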

To perform a matrix factorization of our data, we use the typical BigQuery ML syntax except that the model_type is matrix_factorization and we have to identify which columns play what roles in the collaborative filtering setup.

CREATE OR REPLACE MODEL movies.movie_recommender

OPTIONS (model_type='matrix_factorization', user_col='userId', item_col='movieId', rating_col='rating', l2_reg=0.2, num_factors=16) AS

SELECT userId, movieId, rating

FROM movies.movielens_ratings 

      

ML.PREDICT

SELECT * FROM
ML.PREDICT(MODEL `cloud-training-prod-bucket.movies.movie_recommender`,
( SELECT movieId, title, 903 AS userId
FROM
`movies.movielens_movies`, UNNEST(genres) g
WHERE g = 'Comedy' ))
ORDER BY predicted_rating DESC
LIMIT 5


Ref: https://googlecourses.qwiklabs.com/course_sessions/109507/labs/12073


Materialized VIEW - Automatic refresh (can turn on/off)

By default, materialized views are automatically refreshed within 5 minutes of a change to the base table. Examples of changes include row insertions or row deletions.

Automatic refresh can be enabled or disabled at any time.

To turn automatic refresh off when you create a materialized view, set enable_refresh to false.

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY RANGE_BUCKET(column, buckets)
OPTIONS (enable_refresh = false)
AS SELECT ...

For an existing materialized view, you can modify the enable_refresh value using ALTER MATERIALIZED VIEW:

ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (enable_refresh = true)


Query Data - options, Federated Query...

  • SQL GUI (BigQuery console)
  • bq command-line tool
  • Storage API --> Spark, Tensorflow, Dataflow, Pandas, Scikit-learn


Controlling access to DataSets

You can apply access controls during dataset creation by calling the datasets.insert API method.

Access controls can't be applied during dataset creation in the Cloud Console or the bq command-line tool.

You can apply access controls to a dataset after it is created in the following ways:

  • Using the Cloud Console.
  • Using the bq update command in the bq command-line tool.
  • Calling the datasets.patch API method.
  • Using the client libraries.

Dataset Sharing - option in UI [share dataset]


Partitioning

https://cloud.google.com/bigquery/docs/querying-partitioned-tables

By ingestion time

When you create an ingestion-time partitioned table, two pseudo columns are added to the table: a _PARTITIONTIME pseudo column and a _PARTITIONDATE pseudo column.

When you query data in ingestion-time partitioned tables, you reference specific partitions by specifying the values in the _PARTITIONTIME or _PARTITIONDATE pseudo columns. For example:

  • _PARTITIONTIME >= "2018-01-29 00:00:00" AND _PARTITIONTIME < "2018-01-30 00:00:00"
  • _PARTITIONTIME BETWEEN TIMESTAMP('2016-01-01') AND TIMESTAMP('2016-01-02')

Limiting partitions queried using pseudo columns:

SELECT
  column
FROM
  dataset.table
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP('2016-01-01')
  AND TIMESTAMP('2016-01-02')


Partition by Date/Time

No pseudo columns here.

Special partitions:
  __NULL__ (if the partitioning value is NULL)
  __UNPARTITIONED__ (if the value is outside the allowed range)


Integer Range Partitioning
  • Using the Cloud Console
  • Using a DDL CREATE TABLE statement with a PARTITION BY RANGE_BUCKET clause that contains a partition expression
RANGE_BUCKET(
  <integer_column>,
  GENERATE_ARRAY(<beginning>, <end + 1>, <interval_length>)
)

* or in a bq command use --range_partitioning=pickup_location_id,0,300,5

Argument      Value
column name   customer_id
start         0
end           100
interval      10

The table will be partitioned on the customer_id column into ranges of interval 10. The values 0 to 9 will be in one partition, values 10 to 19 in another partition, ..., and finally values 90 to 99 will be in another partition. Values outside of 0 to 99 (such as -1 or 100) will be in the __UNPARTITIONED__ partition. NULL values will be in the __NULL__ partition.
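The partition assignment described above can be sketched locally (a simplified model of integer-range partitioning, not BigQuery internals):

```python
def assign_partition(value, start=0, end=100, interval=10):
    """Map an integer to its range partition (customer_id example: 0-100 step 10)."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        return "__UNPARTITIONED__"
    return (value - start) // interval  # 0-based partition index

print(assign_partition(7))     # 0  (values 0 to 9)
print(assign_partition(95))    # 9  (values 90 to 99)
print(assign_partition(100))   # __UNPARTITIONED__
print(assign_partition(None))  # __NULL__
```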



Sharding & partitioning

Partitioning is preferred - less metadata, better performance, fewer permission checks.

Sharding - create a separate table for each day/hour, named as below:
<TABLE_NAME_PREFIX>_<YYMMDD>


When querying a date-sharded table, you only include the table(s) that you need. You can use either a UNION ALL, or a wildcard table format.

SELECT *
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
WHERE _TABLE_SUFFIX BETWEEN '01' AND '14'
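Locally, the wildcard plus _TABLE_SUFFIX filter amounts to a lexical range test on each shard name's suffix. A sketch (the table names are illustrative):

```python
# Hypothetical daily shards for July 2017
tables = [f"ga_sessions_201707{day:02d}" for day in range(1, 32)]

prefix = "ga_sessions_201707"
# Equivalent of: FROM `...201707*` WHERE _TABLE_SUFFIX BETWEEN '01' AND '14'
selected = [t for t in tables if "01" <= t[len(prefix):] <= "14"]

print(len(selected))              # 14
print(selected[0], selected[-1])  # ga_sessions_20170701 ga_sessions_20170714
```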


Use UNION ALL in queries to scan multiple tables to get the result.

Clustering - to keep like data together
- e.g., if you want to filter on a text column

CREATE TABLE <my_partitioned_table>
PARTITION BY DATE(creation_date)
CLUSTER BY deviceName
AS
SELECT * FROM <non_partitioned_table>

Data Analytics - Descriptive, Predictive, Prescriptive