Thursday, February 4, 2021

Solutions Architect notes

 

Database migration

AWS DMS (Database Migration Service) - homogeneous and heterogeneous source/target

Can migrate from RDBMS to DynamoDB, or MongoDB to DynamoDB etc.

CDC - Change Data Capture (called ongoing replication in AWS DMS)

 SCT - Schema Conversion Tool (for heterogeneous migration)

RDBMS to DynamoDB migration approaches (AWS doc)

 1) Using AWS DMS 

2) Use EMR, Amazon Kinesis, and Lambda with custom scripts

 Can possibly use DataSync agent to copy data from onPrem to S3

 MySQL binlog (what DMS reads for CDC from a MySQL source)

Create a DMS replication instance (runs on EC2), define source and target endpoints, then create migration tasks.

To map data to a DynamoDB target, you use a type of table-mapping rule called object-mapping.
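The code below is a sketch of what a table-mapping document with an object-mapping rule can look like. It builds the document as a Python dict and prints it as JSON. The layout (a selection rule plus an object-mapping rule using map-record-to-record) follows the AWS DMS documentation as I understand it. The schema name shop, table customer, target table customer_t and the column names are all hypothetical, so verify the keys against the DMS docs before using it.

```python
import json

# Hypothetical names: source schema "shop", table "customer", target table "customer_t".
table_mappings = {
    "rules": [
        {
            # Selection rule: which source tables the migration task includes.
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {"schema-name": "shop", "table-name": "%"},
            "rule-action": "include",
        },
        {
            # Object-mapping rule: how rows of shop.customer become DynamoDB items.
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "CustomerToDynamoDB",
            "rule-action": "map-record-to-record",
            "object-locator": {"schema-name": "shop", "table-name": "customer"},
            "target-table-name": "customer_t",
            "mapping-parameters": {
                # Partition key attribute of the target DynamoDB table.
                "partition-key-name": "customer_id",
                # Hypothetical source column left out of the items.
                "exclude-columns": ["internal_notes"],
                "attribute-mappings": [
                    {
                        "target-attribute-name": "customer_id",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${customer_id}",
                    }
                ],
            },
        },
    ]
}

print(json.dumps(table_mappings, indent=2))
```

The printed JSON is what you would paste into the task's table mappings.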

Caching on AWS           


 EMR

Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. Amazon EMR makes it easy to set up, operate, and scale your big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters.

AWS states that with EMR you can run petabyte-scale analysis at less than half the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark.

You can run workloads on Amazon EC2 instances, on Amazon Elastic Kubernetes Service (EKS) clusters, or on-premises using EMR on AWS Outposts.

           Node types: master node, core nodes (run tasks and store HDFS data), task nodes (optional, run tasks only, no HDFS data)

Master node - single point of failure in a single-master cluster (at cluster setup you can configure logs to be archived to S3)

AWS Directory Service (like Active Directory)

Connects AWS resources with onPrem AD  (AD info below)

        ARN - Amazon Resource Name
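ARNs follow the layout arn:partition:service:region:account-id:resource. Some fields are empty for certain services; for example, an S3 bucket ARN has no region or account. The sketch below splits an ARN into those fields. The bucket name, account ID, user name and log group are made-up examples.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str  # e.g. "aws"
    service: str    # e.g. "s3", "iam", "ec2"
    region: str     # empty for global resources such as IAM users and S3 buckets
    account: str    # 12-digit account ID; empty in S3 bucket ARNs
    resource: str   # may itself contain ":" or "/", e.g. "user/alice"


def parse_arn(arn: str) -> Arn:
    # Split at most 5 times so any ":" inside the resource part is kept.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


print(parse_arn("arn:aws:s3:::my-bucket"))
print(parse_arn("arn:aws:iam::123456789012:user/alice"))
print(parse_arn("arn:aws:logs:us-east-1:123456789012:log-group:my-group"))
```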



IAM policy JSON structure (attach the policy to a user, group, or role; a role is then assumed by a user, an AWS service, or a resource such as an EC2 instance)
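A policy document has a Version and a list of Statements, and each statement has an Effect, an Action and a Resource. The sketch below shows that structure and a deliberately simplified evaluator built on it. In the evaluator, an explicit Deny wins, otherwise any matching Allow grants access, and anything else is implicitly denied. It ignores Conditions, Principals, NotAction/NotResource and all other policy types, and the bucket name is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: read one bucket, never delete anything in S3.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"],
        },
        {
            "Sid": "NoDeletes",
            "Effect": "Deny",
            "Action": "s3:Delete*",
            "Resource": "*",
        },
    ],
}


def _as_list(value):
    # Action/Resource may be a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified IAM evaluation: explicit deny > allow > implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        # Action names are case-insensitive in IAM; resource ARNs are compared as given.
        action_match = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
        resource_match = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_match and resource_match:
            if stmt["Effect"] == "Deny":
                return False  # an explicit deny always overrides any allow
            allowed = True
    return allowed


obj = "arn:aws:s3:::example-bucket/report.csv"
print(is_allowed(policy, "s3:GetObject", obj))     # True
print(is_allowed(policy, "s3:DeleteObject", obj))  # False (explicit deny)
print(is_allowed(policy, "s3:PutObject", obj))     # False (implicit deny)
```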




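IAM permissions boundary - sets the maximum permissions that identity-based policies can grant to a user or role; it restricts access but never grants anything by itself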
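With a boundary in place, an action is allowed only if both the identity policy and the boundary allow it; the result is their intersection. The sketch below models only that rule, matching actions and ignoring resources and conditions. Both policies are hypothetical.

```python
from fnmatch import fnmatchcase


def allows(statements, action):
    """True if some Allow statement matches and no Deny statement does.
    Simplified: only actions are matched (no resources or conditions)."""
    action = action.lower()  # IAM action names are case-insensitive
    matched = [s for s in statements
               if any(fnmatchcase(action, a.lower()) for a in s["Action"])]
    if any(s["Effect"] == "Deny" for s in matched):
        return False
    return any(s["Effect"] == "Allow" for s in matched)


# Hypothetical identity policy: broad S3 and EC2 access.
identity_policy = [{"Effect": "Allow", "Action": ["s3:*", "ec2:*"]}]
# Hypothetical permissions boundary: caps the user at S3 read actions.
boundary = [{"Effect": "Allow", "Action": ["s3:Get*", "s3:List*"]}]


def effective_allow(action):
    # An action must pass both the identity policy and the boundary.
    return allows(identity_policy, action) and allows(boundary, action)


for act in ("s3:GetObject", "s3:PutObject", "ec2:RunInstances"):
    print(act, effective_allow(act))
```

The user keeps s3:GetObject, but loses s3:PutObject and ec2:RunInstances even though the identity policy grants them.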


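Resource Access Manager (RAM) - share AWS resources with other AWS accounts or within your organization

SSO - use one login (identity context) to access other applications, e.g. via SAML (Security Assertion Markup Language)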

DNS 

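Top-level and second-level domains (in example.com, .com is the top-level domain and example is the second-level domain)

Domain registrar - WHOIS database - SOA (Start of Authority) record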

NS - Name Server Records


         A record - maps a name to an IPv4 address (AAAA for IPv6)


         CNAME (Canonical Name) - resolves one domain name to another (like m.<domain>)

A Canonical Name or CNAME record is a type of DNS record that maps an alias name to a true or canonical domain name. 

CNAME records are typically used to map a subdomain such as www or mail to the domain hosting that subdomain’s content. For example, a CNAME record can map the web address www.example.com to the actual web site for the domain example.com.
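A resolver follows CNAME records until it reaches an address record. The sketch below walks that chain in a tiny in-memory zone. The zone data is made up (192.0.2.0/24 is a documentation range), and a missing name raises KeyError as a stand-in for NXDOMAIN.

```python
# Hypothetical zone data: name -> (record type, value)
ZONE = {
    "www.example.com": ("CNAME", "example.com"),
    "m.example.com": ("CNAME", "www.example.com"),
    "example.com": ("A", "192.0.2.10"),
}


def resolve(name: str, zone: dict = ZONE, max_hops: int = 8) -> str:
    """Follow CNAME records until an A record is found; return its IPv4 address."""
    for _ in range(max_hops):
        rtype, value = zone[name]  # KeyError here stands in for NXDOMAIN
        if rtype == "A":
            return value
        name = value  # CNAME: restart the lookup at the canonical name
    raise RuntimeError("too many CNAME hops (possible loop)")


print(resolve("m.example.com"))  # 192.0.2.10
```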

         Alias records - map a record set in the hosted zone to an ELB, CloudFront distribution, or S3 static website. Unlike a CNAME, an alias record can be created at the zone apex (example.com).

Routing Policies

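Simple routing - one record with one or more IP values; Route 53 returns all values in random order (no health checks)

Weighted routing - multiple records with the same name, each with a weight; each record gets weight / (sum of weights) of the traffic (health checks optional)

Latency - Route 53 routes the user to the region with the lowest latency

Failover - active/passive setup with a health check on the primary. If the health check targets an instance's public IP, that IP changes when the instance is stopped and started, so update the health check or use an Elastic IP.

          Geolocation - based on the user's location (continent, country, or US state)

            Geoproximity - based on the location of users and resources, with an adjustable bias; requires Traffic Flow (can ignore)

Multivalue answer - like simple routing, but each IP is a separate record with its own health check; Route 53 returns up to 8 healthy records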
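The code below is a rough model of two of these policies. The weighted part computes each record's share as weight / sum(weights) and picks one record per query in proportion to its weight. The multivalue part returns at most 8 records that pass their health checks. The IPs and weights are hypothetical, and this only models the selection logic, not how Route 53 is implemented.

```python
import random


def weighted_shares(weights: dict) -> dict:
    """Weighted routing: each record gets weight / sum(weights) of the traffic."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_weighted(weights: dict, rng: random.Random) -> str:
    # Choose the record returned for a single DNS query, proportionally to its weight.
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def multivalue_answer(records: list, healthy: set, limit: int = 8) -> list:
    """Multivalue answer: up to `limit` records whose health check passes
    (returned in list order here; Route 53 may order them differently)."""
    return [ip for ip in records if ip in healthy][:limit]


weights = {"192.0.2.10": 75, "192.0.2.20": 25}  # hypothetical records
print(weighted_shares(weights))  # {'192.0.2.10': 0.75, '192.0.2.20': 0.25}

rng = random.Random(42)
print([pick_weighted(weights, rng) for _ in range(5)])

print(multivalue_answer(["192.0.2.1", "192.0.2.2", "192.0.2.3"], {"192.0.2.1", "192.0.2.3"}))
```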

VPC



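Private IP address ranges (RFC 1918, reserved by IANA): 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
AWS does not allow a VPC CIDR block larger than /16 (the first 16 of 32 bits are the network part, netmask 255.255.0.0, 65,536 addresses)
and no smaller than /28 (4 host bits, 16 IP addresses)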
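Python's standard ipaddress module can do this CIDR arithmetic. The sketch below prints the netmask and address count for a few blocks. It also checks two things: whether each block lies inside an RFC 1918 range, and whether its prefix length falls in the /16 to /28 window AWS accepts for a VPC. The example blocks are arbitrary.

```python
import ipaddress

# RFC 1918 private ranges
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def valid_vpc_cidr(cidr: str) -> bool:
    """AWS accepts VPC CIDR blocks from /16 (largest) to /28 (smallest)."""
    return 16 <= ipaddress.ip_network(cidr).prefixlen <= 28


def describe(cidr: str) -> str:
    net = ipaddress.ip_network(cidr)
    private = any(net.subnet_of(r) for r in RFC1918)
    return (f"{net} netmask={net.netmask} addresses={net.num_addresses} "
            f"rfc1918={private} vpc_ok={valid_vpc_cidr(cidr)}")


for cidr in ("10.0.0.0/16", "10.0.0.0/8", "192.168.1.0/28"):
    print(describe(cidr))
# 10.0.0.0/16 netmask=255.255.0.0 addresses=65536 rfc1918=True vpc_ok=True
# 10.0.0.0/8 netmask=255.0.0.0 addresses=16777216 rfc1918=True vpc_ok=False
# 192.168.1.0/28 netmask=255.255.255.240 addresses=16 rfc1918=True vpc_ok=True
```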

With a new VPC - what is created by default and what is not:
- Created by default: main route table, NACL & SG (security group)

- Not created: no subnets, no internet gateway (IG)


Special note on security groups (SG): the default SG has an inbound rule allowing all traffic from resources in the same SG only, and an outbound rule allowing all traffic to anywhere, so a resource in a public subnet can connect to the internet. Note: SGs are stateful (NACLs are not). If inbound traffic is allowed, the reply goes back out even if the outbound rule is removed.

A newly created SG blocks all inbound traffic (it has no inbound rules; add them as needed), but outbound traffic to anywhere is allowed.
SG rules can only "Allow"; there is no "Deny" option. NACLs have both.

You can attach multiple SGs to an EC2 instance or other resource.

Create subnets (a subnet can't span multiple AZs)
 For the public one, enable auto-assign public IP

Reserved IP addresses: AWS reserves 5 addresses in every subnet (the first four and the last one)
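The five reserved addresses are the network address, base + 1 (VPC router), base + 2 (DNS), base + 3 (reserved for future use) and the last address (broadcast, which a VPC does not support). The sketch below lists them for a subnet and computes how many addresses are left for your resources. The subnet CIDRs are arbitrary examples.

```python
import ipaddress


def reserved_addresses(subnet_cidr: str) -> dict:
    """The 5 addresses AWS reserves in every subnet."""
    net = ipaddress.ip_network(subnet_cidr)
    first = net.network_address
    return {
        "network address": str(first),
        "VPC router": str(first + 1),
        "DNS (base + 2)": str(first + 2),
        "future use": str(first + 3),
        "broadcast (reserved)": str(net.broadcast_address),
    }


def usable_count(subnet_cidr: str) -> int:
    # Total addresses in the subnet minus the 5 AWS keeps for itself.
    return ipaddress.ip_network(subnet_cidr).num_addresses - 5


print(reserved_addresses("10.0.0.0/24"))
print(usable_count("10.0.0.0/24"))  # 251
print(usable_count("10.0.0.0/28"))  # 11
```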


Create an internet gateway (IG) and attach it to the VPC (it's highly available)

Configure route tables - routes and subnet associations

Default route table (main) - no public access by default, and every subnet is associated with it unless you associate it with another route table. (So don't add a public route to the main route table.)


So create a new route table and make it public by adding a route out to the internet (destination 0.0.0.0/0, target the IG), then associate the subnet that needs to be public with this route table.

<Always keep Main Route table as Private (by not adding a route out to the internet) and use separate public route table>
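When several routes match a destination, the VPC uses the most specific one (longest prefix). The sketch below models a public route table and a private main route table. The local route keeps VPC traffic inside the VPC, and 0.0.0.0/0 catches everything else. The VPC CIDR and the gateway ID igw-EXAMPLE are placeholders.

```python
import ipaddress

# Hypothetical public route table for a VPC with CIDR 10.0.0.0/16.
PUBLIC_ROUTES = {
    "10.0.0.0/16": "local",      # added automatically; traffic stays inside the VPC
    "0.0.0.0/0": "igw-EXAMPLE",  # route out to the internet gateway (placeholder ID)
}
# Main route table kept private: only the local route.
MAIN_ROUTES = {"10.0.0.0/16": "local"}


def next_hop(routes: dict, destination: str):
    """Pick the most specific (longest-prefix) route that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(cidr) for cidr in routes
               if dest in ipaddress.ip_network(cidr)]
    if not matches:
        return None  # no matching route: no path to this destination
    best = max(matches, key=lambda n: n.prefixlen)
    return routes[str(best)]


print(next_hop(PUBLIC_ROUTES, "10.0.1.5"))  # local
print(next_hop(PUBLIC_ROUTES, "8.8.8.8"))   # igw-EXAMPLE
print(next_hop(MAIN_ROUTES, "8.8.8.8"))     # None
```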

Create one instance in the public subnet and one in the private subnet; only the public-subnet instance gets a public IP.

 

NACL inbound and outbound rules (the default NACL allows all inbound and outbound traffic)



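   NACL - rule numbers are usually in increments of 100 (100, 200, ...), e.g. 101, 201 for the matching IPv6 rules
   
   New custom NACL - denies everything inbound & outbound
   Rules are evaluated in ascending order of rule number, and the first matching rule is applied.
   So give a deny rule a lower number than an overlapping allow rule for the deny to take effect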
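The sketch below models first-match evaluation by rule number and the final catch-all deny. It deliberately ignores protocols and only matches a source CIDR and a port range. The rules and addresses are hypothetical, and 203.0.113.0/24 and 198.51.100.0/24 are documentation ranges.

```python
import ipaddress
from typing import NamedTuple


class NaclRule(NamedTuple):
    number: int
    cidr: str
    port_from: int
    port_to: int
    action: str  # "allow" or "deny"


def evaluate(rules, source_ip: str, port: int) -> str:
    """Check rules in ascending rule-number order; the first match decides.
    If nothing matches, the catch-all (*) rule denies the traffic."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"


# Hypothetical inbound rules: block one address, allow HTTP from everyone else.
inbound = [
    NaclRule(200, "0.0.0.0/0", 80, 80, "allow"),
    NaclRule(100, "203.0.113.7/32", 0, 65535, "deny"),
]
print(evaluate(inbound, "203.0.113.7", 80))   # deny  (rule 100 matches first)
print(evaluate(inbound, "198.51.100.1", 80))  # allow (rule 200)
print(evaluate(inbound, "198.51.100.1", 22))  # deny  (catch-all)

# Same deny numbered after the allow: the allow matches first and the deny never applies.
misordered = [inbound[0], NaclRule(300, "203.0.113.7/32", 0, 65535, "deny")]
print(evaluate(misordered, "203.0.113.7", 80))  # allow
```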


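Load balancer (ALB) - at least 2 subnets in different AZs are required (for HA); use public subnets for an internet-facing one
VPC Flow Logs - capture information about IP traffic going in/out of the VPC; published to CloudWatch Logs or S3 (can be created at VPC, subnet, or network interface level)

Bastion host - an instance in a public subnet used to SSH/RDP into instances in private subnets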







How to communicate to a Private instance?
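NAT (Network Address Translation): NAT instances (single instance) and NAT gateways (managed, redundant within their AZ)

     Create a NAT instance (from an EC2 NAT AMI) in a public subnet and disable the source/destination check

      Then add a route in the main route table (0.0.0.0/0 to the NAT instance) so private instances get outbound internet access

       A NAT instance is a single point of failure, so use a NAT gateway instead

        Create a new NAT gateway in the public subnet with an Elastic IP (translation uses ephemeral ports), then add the route (0.0.0.0/0 to the NAT gateway)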



        


 


Elasticsearch - not just search, but also analytics - massive scale, near real time, cheap (v7.9)
    
    ELK (ES, Logstash (brings data in - pipeline), Kibana (visualize))
    
    Document storage and retrieval engine (scaled-out Apache Lucene)
   
     Documents (text, JSON) - document ID, types (schema & mapping - going away), indices (inverted indexes)

     Documents are hashed to separate shards (a shard is a self-contained Lucene index - a kind of mini search engine by itself)

      Primary and replica shards (writes go to the primary shard and are then replicated)

      Amazon Elasticsearch Service - managed service (not serverless); avoids installing and managing ES on EC2
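Elasticsearch documents its routing as shard = hash(_routing) % number_of_primary_shards, and _routing defaults to the document _id. Newer versions add a routing-shards factor, and the real hash is Murmur3. The sketch below uses zlib.crc32 only as a stand-in hash to show the idea. Because the shard depends on the primary shard count, that count is fixed when the index is created. Adding nodes moves whole shards between nodes; it does not re-hash documents.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Simplified ES routing: hash(_routing) % number of primary shards.
    zlib.crc32 stands in for the Murmur3 hash ES actually uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


docs = [f"doc-{i}" for i in range(10)]  # hypothetical document IDs
for n in (3, 4):
    # Same IDs, different shard counts: many documents land on different shards,
    # which is why changing the primary shard count needs a reindex/split.
    print(n, [shard_for(d, n) for d in docs])
```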
   
      
Instance hours are always charged (even when the cluster is idle).

    IoT --> ES for analysis - possible
    
     You choose the number of dedicated master nodes
     A domain in Amazon ES means a cluster
     Snapshots to S3 can be configured
     Log in to Kibana (onPrem -> internet -> Kibana within the VPC) - use Cognito authentication (create a Cognito user pool if needed)

          



Kinesis - processing via Lambda
Lambda has several blueprints - search for the one that converts Apache access logs
Copy its index.js code.

It converts Apache log lines to JSON format.

Kinesis Data Firehose destination
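The blueprint itself is Node.js (index.js). The code below is a Python re-implementation of the same idea, not a copy of the blueprint. It is a Firehose data-transformation Lambda that decodes each base64 record, parses an Apache Common Log Format line with a regex, and returns the line as JSON. The regex and the output field names are my assumptions. The per-record response fields (recordId, result of Ok or ProcessingFailed, base64 data) are the contract Firehose expects from transformation functions.

```python
import base64
import json
import re

# Apache Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) \[(?P<datetime>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def apache_line_to_json(line: str):
    """Return the log line as a JSON string, or None if it doesn't parse."""
    m = CLF.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["response"] = int(record["response"])
    # "-" means no response body was sent
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return json.dumps(record)


def handler(event, context):
    """Firehose transformation: one output entry per input record, same recordId."""
    output = []
    for rec in event["records"]:
        line = base64.b64decode(rec["data"]).decode("utf-8").strip()
        converted = apache_line_to_json(line)
        if converted is None:
            # Hand the original data back and mark it as failed.
            output.append({"recordId": rec["recordId"], "result": "ProcessingFailed",
                           "data": rec["data"]})
        else:
            payload = (converted + "\n").encode("utf-8")
            output.append({"recordId": rec["recordId"], "result": "Ok",
                           "data": base64.b64encode(payload).decode("ascii")})
    return {"records": output}


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(apache_line_to_json(sample))
```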

Elasticsearch - APM (Application Performance Management) - analyze application logs and system metrics
Predicting trends (number of calls etc.) via graphical representation
Anomaly detection
Data is stored as documents (like rows in an RDBMS) with fields/values
Query using the REST API
Logstash - use it to bring data into ES when the data needs enrichment first

When more nodes are added, ES redistributes the shards evenly across them.

ES type/index

Query string API
_search API

Search all fields

Search one field: field=pasta
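A URI search passes a Lucene query string in the q parameter of the _search API. A bare term searches all fields by default, and field:term restricts the search to one field. The sketch below builds both URLs and the equivalent request-body (query DSL) form. The index name recipes and the field title are hypothetical.

```python
import json
from urllib.parse import urlencode

INDEX = "recipes"  # hypothetical index name


def query_string_url(query: str) -> str:
    # URI search: GET /<index>/_search?q=<Lucene query string>
    return f"/{INDEX}/_search?{urlencode({'q': query})}"


print(query_string_url("pasta"))        # /recipes/_search?q=pasta  (all fields)
print(query_string_url("title:pasta"))  # /recipes/_search?q=title%3Apasta  (title field only)

# The same field search as a request body for GET /recipes/_search (query DSL).
body = {"query": {"match": {"title": "pasta"}}}
print(json.dumps(body))
```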
An Amazon ES domain comes with Kibana by default (if selected at setup).
Set up the proper access policy to use it.

Kibana - create an index pattern: first give the index name (pattern), then choose the timestamp field.
Discover, Visualize, Dashboard....


Notebooks - Jupyter, Zeppelin

Athena - Glue - QuickSight
Glue crawler - crawls S3/JDBC/DynamoDB sources and creates tables in a database in the Glue Data Catalog.
Schema - auto-detected (column names come from the header if there is one); otherwise edit it after the crawl.
Athena - SQL queries using the Glue Data Catalog.

Redshift
    client --> (JDBC/ODBC) --> leader node --> compute nodes (1-128), each split into multiple node slices
Compute node types -> Dense Storage / Dense Compute



DynamoDB - common use cases

SQS vs Kinesis Data Streams
SQS, Kinesis Data Streams, Kinesis Data Firehose, SQS FIFO

IoT (Internet of Things)
   Thing Registry, Device Gateway, IoT Message Broker, IoT Rules Engine, Device Shadow
   ---> Kinesis, SQS, Lambda, DynamoDB, S3, SNS, ES, ML models, ... (devices connect over MQTT)

IoT Greengrass - brings compute (Lambda functions) onto the device

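VPC peering, AWS PrivateLink
With many VPCs, peering each pair is a big task: peering is not transitive, so you must manage many peering connections. Instead use PrivateLink - the service sits behind a Network Load Balancer, and consumers reach it through an elastic network interface (ENI) in their own VPC.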
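Because peering is not transitive, connecting n VPCs so that every pair can talk needs a full mesh of n(n-1)/2 peering connections. The quick calculation below shows how fast that count grows.

```python
def full_mesh_peerings(num_vpcs: int) -> int:
    """Every pair of VPCs needs its own peering connection: n * (n - 1) / 2."""
    return num_vpcs * (num_vpcs - 1) // 2


for n in (3, 10, 50):
    print(n, full_mesh_peerings(n))
# 3 3
# 10 45
# 50 1225
```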
Direct Connect


VPC endpoints (traffic doesn't go over the internet) - gateway endpoints (only S3 & DynamoDB) / interface endpoints

CloudFormation - stack
Create from a template, or create a new template in Designer
Designer preview
Amazon Managed Service for Grafana
Powerful, interactive data visualizations for builders, operators, and business leaders