Thursday, January 14, 2021

Google Cloud Data Engineer - My Quick Notes 1 (few services, IAM, Security)

 


Memorystore - Redis and Memcached

App Engine - Standard and Flexible environment

Cloud Composer - Airflow - Architecture

Tenant Project - Airflow Database (Cloud SQL) to store metadata, Web Server (App Engine)

Customer Project - GKE cluster with Airflow Workers,

Redis (persists queued messages across container restarts),

Airflow Scheduler and Cloud SQL Proxy,

Cloud Storage (staging the DAGs, logs etc.)

Cloud Data Fusion (ETL tool) - based on the CDAP data analytics platform

Execution environment - instance

Basic: visual designer, transformations, SDK

Enterprise: Basic + streaming, metadata repository, HA, triggers, schedules


Pub/Sub - Subscription - push / pull - for async integrations
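To make the two subscription delivery styles concrete, here is a minimal in-memory sketch. It is a toy model only, not the Pub/Sub client library: the class `ToySubscription`, its methods, and the message names are all hypothetical. With pull, the subscriber asks for messages and acknowledges them; with push, the service calls the subscriber's endpoint and treats a success response as the ack.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class ToySubscription:
    """In-memory stand-in for a subscription (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, str]] = deque()
        self._unacked: Dict[int, str] = {}
        self._next_id = 0

    def publish(self, data: str) -> None:
        self._pending.append((self._next_id, data))
        self._next_id += 1

    def pull(self, max_messages: int = 10) -> List[Tuple[int, str]]:
        """Pull delivery: the subscriber requests messages when it is ready."""
        batch = []
        while self._pending and len(batch) < max_messages:
            ack_id, data = self._pending.popleft()
            self._unacked[ack_id] = data  # held until the subscriber acks
            batch.append((ack_id, data))
        return batch

    def ack(self, ack_id: int) -> None:
        self._unacked.pop(ack_id, None)

    def push_all(self, endpoint: Callable[[str], bool]) -> None:
        """Push delivery: the service calls the endpoint for each message.
        Returning True plays the role of an HTTP success response (ack)."""
        for _ in range(len(self._pending)):
            ack_id, data = self._pending.popleft()
            if not endpoint(data):
                self._pending.append((ack_id, data))  # keep for redelivery


received: List[str] = []


def endpoint(data: str) -> bool:
    received.append(data)
    return True


sub = ToySubscription()
sub.publish("order-1")
sub.publish("order-2")

pulled = sub.pull(max_messages=1)   # subscriber-driven
for ack_id, _ in pulled:
    sub.ack(ack_id)

sub.push_all(endpoint)              # service-driven
```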

Multi-Cloud environment - avoid single vendor lock-in

Anthos 

- Run workloads in Kubernetes clusters

- Multi-Cloud Application Modernization platform

- Anthos is a managed application platform that extends Google Cloud services and engineering practices to your environments so you can modernize apps faster and establish operational consistency across them.

- Build, deploy, and optimize applications anywhere—simply, flexibly, and securely

- Consistent development and operations experience for hybrid and multi-cloud environments

- Achieve up to 4.8x ROI within 3 years according

Cloud Code

Cloud Build

Cloud Monitoring - Dashboards & Visualizations, Metrics explorer, Uptime checks, Alerts, Resource Usage page.

- Alerting - Create Policy - add conditions, Notification channels - can be Email, Pub/Sub, PagerDuty, Slack, SMS, Cloud Console mobile app, Campfire

- Cloud Logging - uses a Fluentd-based agent - fully managed service - logs kept for 30 days by default - or export them (using the Log Router)

- Log Router - create a sink for the logs to flow into - BigQuery, Cloud Storage, Pub/Sub, or a custom destination
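As a rough illustration of what a log sink looks like as data, the sketch below builds the destination string for each of the three Google-hosted targets and wraps it in a sink definition with `name`, `destination` and `filter` fields. The helper names (`sink_destination`, `build_sink`), the project, dataset and sink names are all made up for the example; it only builds a dictionary and does not call any API.

```python
from typing import Dict


def sink_destination(kind: str, project: str, name: str) -> str:
    """Return the destination string for a log sink (helper name is hypothetical)."""
    if kind == "bigquery":
        return f"bigquery.googleapis.com/projects/{project}/datasets/{name}"
    if kind == "storage":
        # Buckets are globally named, so the project is not part of the path.
        return f"storage.googleapis.com/{name}"
    if kind == "pubsub":
        return f"pubsub.googleapis.com/projects/{project}/topics/{name}"
    raise ValueError(f"unsupported sink kind: {kind}")


def build_sink(name: str, destination: str, log_filter: str) -> Dict[str, str]:
    """Assemble a sink definition; only logs matching the filter are exported."""
    return {"name": name, "destination": destination, "filter": log_filter}


# Hypothetical example: send ERROR-and-above logs to a BigQuery dataset.
example_sink = build_sink(
    "errors-to-bq",
    sink_destination("bigquery", "my-project", "app_logs"),
    "severity>=ERROR",
)
```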

- Installing the Monitoring agent on VMs --> download & install the stackdriver-agent package (apt-get) and start the service (sudo service stackdriver-agent start)


IAM - principle of least privilege.

G Suite Account

- Domain registration - Hosting company provides an email address in that domain

- Google provides such an option - can use existing domain or create new domain name

- domain name as ‘yoursitename.com’. Now, the emails will look like ‘john@yoursitename.com’. 

Google Groups - easier to provide/remove access to users

Members - Users (will have one or more roles attached) - can be individual ids, service accounts, G Suite domains or Google Groups

Roles - Predefined, Custom and Primitive - attached to identities

Pre-defined Roles (with many pre-defined permissions)

There is a big list of predefined roles, like <service> admin, viewer, manager, reader etc

Sample 

role: roles/<serviceName>.<roleName> --> e.g. roles/bigquery.dataOwner (BigQuery Data Owner)

BigQuery Data Owner --> has many permissions attached - like bigquery.datasets.create, bigquery.models.create, bigquery.tables.delete .... etc

BigQuery Data Viewer --> many permissions - all are like .get, .list, .export etc - basically read only ones

Custom Roles:

Example: BigQuery Data Owner but with no model (ML model) access

--> The "CREATE FROM ROLE" option is the easiest way: start from BigQuery Data Owner, remove the model permissions, and name it "BigQuery Data Owner - No Model"
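The "create from role" idea can be sketched as plain set filtering: start from the predefined role's permissions and drop everything under `bigquery.models.`. The permission list below is only a small illustrative subset of the real role, written in the plural form IAM uses, and `derive_custom_role` is a hypothetical helper, not a Google API. The output fields mirror a custom role definition (`title`, `description`, `stage`, `includedPermissions`).

```python
from typing import Dict, FrozenSet, List, Union

# Illustrative subset only; the real predefined role has many more permissions.
DATA_OWNER_SUBSET: FrozenSet[str] = frozenset({
    "bigquery.datasets.create",
    "bigquery.datasets.get",
    "bigquery.models.create",
    "bigquery.models.delete",
    "bigquery.tables.delete",
    "bigquery.tables.get",
})


def derive_custom_role(
    title: str, base_permissions: FrozenSet[str], excluded_prefix: str
) -> Dict[str, Union[str, List[str]]]:
    """Copy a base role's permissions, minus those starting with excluded_prefix."""
    included = sorted(
        p for p in base_permissions if not p.startswith(excluded_prefix)
    )
    return {
        "title": title,
        "description": f"Derived role without {excluded_prefix}* permissions",
        "stage": "GA",
        "includedPermissions": included,
    }


no_model_role = derive_custom_role(
    "BigQuery Data Owner - No Model", DATA_OWNER_SUBSET, "bigquery.models."
)
```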

Primitive Roles

Owner (can set up billing for a project), Editor and Viewer

Service Account - for Apps and Servers


Policies are attached to resources (Policy is collection of statements)

Resource Hierarchy - Organization - folders - projects - resources
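A policy is essentially a list of bindings (role + members), and a resource also inherits the policies set on its ancestors. The sketch below models that with plain dictionaries. The organization, folder, project and account names are invented, `effective_roles` is a hypothetical helper, and group membership is not expanded (a group only matches if it is passed in explicitly).

```python
from typing import Dict, List, Optional, Set

# child -> parent (hypothetical resource names)
HIERARCHY: Dict[str, str] = {
    "projects/analytics-prod": "folders/data-team",
    "folders/data-team": "organizations/example-org",
}

# Policies attached to individual resources, in the bindings shape IAM uses.
POLICIES: Dict[str, Dict[str, List[Dict]]] = {
    "organizations/example-org": {
        "bindings": [
            {"role": "roles/viewer", "members": ["group:auditors@yoursitename.com"]},
        ]
    },
    "projects/analytics-prod": {
        "bindings": [
            {
                "role": "roles/bigquery.dataViewer",
                "members": [
                    "user:john@yoursitename.com",
                    "serviceAccount:etl@analytics-prod.iam.gserviceaccount.com",
                ],
            },
        ]
    },
}


def effective_roles(resource: str, member: str) -> Set[str]:
    """Union of roles granted on the resource and on every ancestor above it."""
    roles: Set[str] = set()
    node: Optional[str] = resource
    while node is not None:
        for binding in POLICIES.get(node, {}).get("bindings", []):
            if member in binding["members"]:
                roles.add(binding["role"])
        node = HIERARCHY.get(node)  # walk up: project -> folder -> organization
    return roles


john_roles = effective_roles("projects/analytics-prod", "user:john@yoursitename.com")
auditor_roles = effective_roles("projects/analytics-prod", "group:auditors@yoursitename.com")
```

Because access granted higher up flows down, the auditors group ends up with Viewer on the project even though the project's own policy never mentions it.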


Data Loss prevention (DLP Service) - Security Practice

PII protection etc

Helps to classify data

Automatically mask data

measure re-identification risk

* InfoTypes --> pattern detectors - identify sensitive info (PII)

* Inspection jobs - apply InfoTypes to a dataset

--> API returns InfoType, likelihood score and location

* Risk analysis jobs - find the probability that data can be re-identified
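The three ideas above (detect, mask, measure risk) can be imitated with a few lines of standard-library Python. This is a toy sketch, not the DLP API: the regexes are simplistic, every finding gets a fixed `LIKELY` score, and the function names are hypothetical. The risk measure shown is k-anonymity, i.e. the size of the smallest group of rows sharing the same quasi-identifier values (a lower k means easier re-identification).

```python
import re
from collections import Counter
from typing import Dict, List, Sequence

# Toy pattern detectors keyed by infoType name (regexes are simplified).
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def inspect(text: str) -> List[Dict]:
    """Return findings with infoType, a (fixed) likelihood, and location."""
    findings = []
    for info_type, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            findings.append({
                "info_type": info_type,
                "likelihood": "LIKELY",  # assumption: no real scoring here
                "location": (match.start(), match.end()),
            })
    return sorted(findings, key=lambda f: f["location"])


def mask(text: str, findings: List[Dict], char: str = "#") -> str:
    """Replace every finding with the masking character, keeping length."""
    chars = list(text)
    for finding in findings:
        start, end = finding["location"]
        chars[start:end] = char * (end - start)
    return "".join(chars)


def k_anonymity(rows: Sequence[Dict], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


sample = "Contact john@yoursitename.com, SSN 123-45-6789."
found = inspect(sample)
masked = mask(sample, found)

rows = [
    {"zip": "94103", "age": 34},
    {"zip": "94103", "age": 34},
    {"zip": "10001", "age": 51},
]
k = k_anonymity(rows, ["zip", "age"])
```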

Legal compliance - GCP supports HIPAA compliance


HIPAA, HITECH (Health Information Technology for Economic and Clinical Health Act)

GDPR - EU regulation


Encryption @ GCP

At-Rest and in-transit

Hardware level - AES-256 or AES-128 algorithm (infeasible to brute-force in practice)

Data (say @ Colossus FS) - AES-256

Data Encryption Key (DEK) and Key Encryption Key (KEK) - the KEK encrypts the DEK (double protection)
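The two-key scheme is usually called envelope encryption: data is encrypted with a data key, and that data key is itself encrypted ("wrapped") with a key-encryption key. Below is a structural sketch using only the standard library. Important assumption: `_toy_cipher` is a throwaway XOR keystream standing in for AES-256 so the example runs without extra packages. It is NOT secure and is not what GCP uses; only the wrap/unwrap flow is the point.

```python
import hashlib
import os
from typing import Tuple


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. Stand-in for AES-256; NOT secure.
    Applying it twice with the same key returns the original bytes."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(plaintext: bytes, kek: bytes) -> Tuple[bytes, bytes]:
    """Encrypt data with a fresh data key, then wrap that key with the KEK."""
    dek = os.urandom(32)                   # new data encryption key per object
    ciphertext = _toy_cipher(dek, plaintext)
    wrapped_dek = _toy_cipher(kek, dek)    # only the wrapped key is stored
    return ciphertext, wrapped_dek


def envelope_decrypt(ciphertext: bytes, wrapped_dek: bytes, kek: bytes) -> bytes:
    """Unwrap the data key with the KEK, then decrypt the data."""
    dek = _toy_cipher(kek, wrapped_dek)
    return _toy_cipher(dek, ciphertext)


kek = os.urandom(32)  # in practice this key would live in a key management service
message = b"patient record 42"
ciphertext, wrapped_dek = envelope_encrypt(message, kek)
recovered = envelope_decrypt(ciphertext, wrapped_dek, kek)
```

The design benefit is that rotating or revoking the KEK only requires re-wrapping small data keys, not re-encrypting all the stored data.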

Transit - encryption & authentication

Internal GCP - service-to-service traffic uses ALTS (Application Layer Transport Security) instead of TLS

Internet - uses TLS or QUIC (Google-developed protocol)

Key Management (KMS)

Google Managed

Customer managed (CMEK) - key created by the customer, hosted in Cloud KMS - app level encryption

Customer supplied (CSEK) - customer wants complete control over the keys
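For the customer-supplied case, Cloud Storage expects the caller to send its own AES-256 key with each request, base64-encoded, together with the base64-encoded SHA-256 hash of the key. A minimal sketch of building those request headers follows. The function name `csek_headers` is hypothetical and the key is generated on the spot purely for illustration (a real key would come from the customer's own key store); nothing is sent over the network.

```python
import base64
import hashlib
import os
from typing import Dict


def csek_headers(raw_key: bytes) -> Dict[str, str]:
    """Build the Cloud Storage request headers for a customer-supplied key."""
    if len(raw_key) != 32:
        raise ValueError("a customer-supplied key must be 32 bytes (AES-256)")
    key_hash = hashlib.sha256(raw_key).digest()
    return {
        "x-goog-encryption-algorithm": "AES256",
        "x-goog-encryption-key": base64.b64encode(raw_key).decode("ascii"),
        "x-goog-encryption-key-sha256": base64.b64encode(key_hash).decode("ascii"),
    }


customer_key = os.urandom(32)  # illustration only; keep real keys outside the code
headers = csek_headers(customer_key)
```

Since Google does not keep the key, losing it means the object can no longer be read, which is the trade-off for having complete control.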

