Database migration
AWS - DMS (homogeneous and heterogeneous source/sink)
Can migrate from RDBMS to DynamoDB, or MongoDB to DynamoDB etc.
CDC - Change Data Capture (AWS DMS calls this ongoing replication)
SCT - Schema Conversion Tool (for heterogeneous migration)
RDBMS to DynamoDB migration approaches (AWS doc)
1) Using AWS DMS
2) Use EMR, Amazon Kinesis, and Lambda with custom scripts
Can possibly use DataSync agent to copy data from onPrem to S3
MySQL binlog (the change log DMS reads for CDC from a MySQL source)
Create a DMS replication instance (runs on EC2), define source and target endpoints, create migration tasks
To map data to a DynamoDB target, you use a type of table-mapping rule called object-mapping
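A rough sketch of what a DMS table mapping with an object-mapping rule can look like, built as a Python dict and serialized to JSON. The schema name (shop), table name (customer), column name (customer_id) and target table name (customer_ddb) are made up for illustration; check the DMS documentation for the full set of mapping parameters before using anything like this.

```python
import json

# Hypothetical source schema/table and target names, for illustration only.
table_mappings = {
    "rules": [
        {
            # Selection rule: which source tables the task reads.
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "select-customer",
            "object-locator": {"schema-name": "shop", "table-name": "customer"},
            "rule-action": "include",
        },
        {
            # Object-mapping rule: how a source row becomes a DynamoDB item.
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "customer-to-ddb",
            "rule-action": "map-record-to-record",
            "object-locator": {"schema-name": "shop", "table-name": "customer"},
            "target-table-name": "customer_ddb",
            "mapping-parameters": {
                "partition-key-name": "CustomerId",
                "attribute-mappings": [
                    {
                        "target-attribute-name": "CustomerId",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        # ${...} refers to a source column (assumed name).
                        "value": "${customer_id}",
                    }
                ],
            },
        },
    ]
}

# This JSON string is what would be supplied as the task's table mappings.
table_mappings_json = json.dumps(table_mappings, indent=2)
print(table_mappings_json)
```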
Caching on AWS
EMR
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. Amazon EMR makes it easy to set up, operate, and scale your big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters.
With EMR you can run petabyte-scale analysis at less than half of the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark.
You can run workloads on Amazon EC2 instances, on Amazon Elastic Kubernetes Service (EKS) clusters, or on-premises using EMR on AWS Outposts.
Master Node, Core Node (Data), Task Node (No data, optional)
Master Node - single point of failure (logs can be saved to S3; configure this at cluster setup)
AWS Directory Service (like Active Directory)
Connects AWS resources with onPrem AD (AD info below)
ARN - Amazon Resource Name
IAM Policy JSON structure (attach it to a user, group, or role; a role is then assumed by a principal or attached to a resource such as an EC2 instance)
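For reference, a minimal sketch of the IAM policy JSON structure, written as a Python dict. The bucket name (example-bucket) and the Sid are placeholders, not anything from a real account.

```python
import json

# Minimal identity-based policy: Version + a list of Statements.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadOneBucket",            # optional label (placeholder)
            "Effect": "Allow",                 # "Allow" or "Deny"
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",     # the bucket (ListBucket)
                "arn:aws:s3:::example-bucket/*",   # objects in it (GetObject)
            ],
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```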
IAM Permissions Boundary - restrict access
Resource Access Manager (RAM)
SSO - Use one context to log in to another using SAML (Security Assertion Markup Language)
DNS
Top level, 2nd level domains
Domain Registrar - WHOIS Database - SOA Record
NS - Name Server Records
A Record - name to IP address
CName (Canonical name) - resolve one domain address to another (like m.<domain>)
A Canonical Name or CNAME record is a type of DNS record that maps an alias name to a true or canonical domain name.
CNAME records are typically used to map a subdomain such as www or mail to the domain hosting that subdomain’s content. For example, a CNAME record can map the web address www.example.com to the actual web site for the domain example.com.
Alias Records - map a resource record set in the hosted zone to an ELB, CloudFront distribution, or S3 static website.
Routing Policies
Simple routing policy - 1 A record with multiple IP
Weighted Routing - multiple A records (IP) with different weights (health checks optional)
Latency - latency to the region makes the routing decision
Failover - active/passive setup - add a health check. The check is based on the public IP, which changes on restart, so update the health check after a restart or use a dedicated IP
GeoLocation - based on user location
GeoProximity - complex rules (Traffic Flow only) - ignore
Multivalue Answer - like Simple Routing, but each IP is a separate record with its own health check
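To make the weighted policy concrete: each record's share of traffic is its weight divided by the sum of all weights. A small sketch of that arithmetic, plus a local simulation of picking records; the IPs are documentation addresses and the weights are made up. This only mimics the selection logic, it does not call Route 53.

```python
import random
from collections import Counter

def traffic_share(weights):
    """Expected fraction of queries per record: weight / sum(weights)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def pick_record(weights, rng):
    """Pick one record at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Hypothetical weighted record set (three A records, same name).
weights = {"192.0.2.10": 70, "192.0.2.20": 20, "192.0.2.30": 10}

shares = traffic_share(weights)

# Simulate 10,000 DNS answers with a fixed seed so runs are repeatable.
rng = random.Random(42)
counts = Counter(pick_record(weights, rng) for _ in range(10_000))
print(shares)
print(counts)
```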
VPC
Private IP address range by IANA
Amazon does not allow a VPC CIDR block larger than /16 (first 16 of 32 bits form the network prefix; netmask 255.255.0.0)
Smallest allowed block is /28 - 16 IP addresses (4 host bits)
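The block sizes can be checked with Python's ipaddress module. The 10.0.0.0 addresses are just example values taken from one of the private ranges.

```python
import ipaddress

# The three private ranges (RFC 1918).
private_ranges = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

largest = ipaddress.ip_network("10.0.0.0/16")   # biggest VPC block
smallest = ipaddress.ip_network("10.0.0.0/28")  # smallest block

print(largest.netmask, largest.num_addresses)    # 255.255.0.0 65536
print(smallest.netmask, smallest.num_addresses)  # 255.255.255.240 16

# The example VPC block sits inside one of the private ranges.
in_private = any(largest.subnet_of(r) for r in private_ranges)
print(in_private)  # True
```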
With a new VPC - what is created by default and what is not.
- by default Route Table, NACL & SG (security group) created
- by default NO Subnet, no IG.
Special note - Security Group (SG) - the default SG has an inbound rule allowing any traffic from the same SG only, and an outbound rule allowing any traffic to the Internet (outside world). So if the subnet is public, the resource can connect to the Internet. Note - SGs are stateful (NACLs are not): if inbound traffic is allowed, the reply can go back out even when the outbound rule is removed.
If you create a new SG, all inbound traffic is blocked - there won't be any inbound rules (add manually as needed), but outbound will be open to all
SG rules can only "Allow"; there is no "Deny"/block option - NACLs have one.
Can attach multiple SG to EC2/resources
Create Subnet (it can't span multiple AZs)
For one of them (the public one) - modify auto-assign public IP
Reserved IP addresses (5 are reserved in each subnet: the first four and the last)
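A quick sketch of which addresses are reserved and how many remain usable. The subnet CIDRs are example values; the helper names are made up for this note.

```python
import ipaddress

def reserved_addresses(cidr):
    """First four addresses plus the last one of the subnet."""
    net = ipaddress.ip_network(cidr)
    first_four = [net.network_address + i for i in range(4)]
    return first_four + [net.broadcast_address]

def usable_count(cidr):
    """Addresses left for your resources after the 5 reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - 5

reserved = [str(ip) for ip in reserved_addresses("10.0.1.0/24")]
print(reserved)                     # .0 network, .1 router, .2 DNS, .3 future use, .255 broadcast
print(usable_count("10.0.1.0/24"))  # 251
print(usable_count("10.0.1.0/28"))  # 11
```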
Create Internet Gateway - and attach it to the VPC (it's HA)
Configure Route table - Routes, Subnet Association
Default route table (Main) - no public access by default. All subnets are associated with it by default. (So don't add a public route to the Main route table)
So create a new route table and make it public by adding a route out to the Internet (destination 0.0.0.0/0, target IG) - and associate the subnets that need to be public with this route table
<Always keep Main Route table as Private (by not adding a route out to the internet) and use separate public route table>
Create two instances, one in the public and one in the private subnet - only the public one will have a public IP
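The Main vs public route table idea can be sketched as a lookup where the most specific matching route wins. This is a local simulation only; the target name igw-example and the CIDRs are placeholders.

```python
import ipaddress

def lookup(route_table, destination_ip):
    """Return the target of the most specific matching route, or None."""
    ip = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in route_table.items():
        net = ipaddress.ip_network(cidr)
        if ip in net:
            matches.append((net.prefixlen, target))
    if not matches:
        return None  # no route: traffic cannot leave
    return max(matches)[1]  # longest prefix wins

# Main route table: only the local route, so it stays private.
main_rt = {"10.0.0.0/16": "local"}
# Separate public route table: local route plus a default route to the IG.
public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}

print(lookup(main_rt, "203.0.113.10"))    # None
print(lookup(public_rt, "203.0.113.10"))  # igw-example
print(lookup(public_rt, "10.0.1.5"))      # local
```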
NACL inbound and outbound rules (default)
NACL - rule numbers in increments of 100 (100, 200 ... and 101, 201 for IPv6)
New custom NACL - denies everything inbound & outbound
Rules are evaluated in ascending order of rule number; the first matching rule applies.
So give a deny rule a lower number than the allow rule it should override
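A small simulation of that ordering, assuming simplified rules that match only on source CIDR and a single port. The Rule class and evaluate function are made up for this note; real NACL rules also carry protocol and port ranges.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Rule:
    number: int   # rule number; lower numbers are checked first
    cidr: str     # source CIDR the rule applies to
    port: int     # single port, to keep the sketch small
    action: str   # "allow" or "deny"

def evaluate(rules, source_ip, port):
    """Return the action of the first matching rule, by rule number."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if port == rule.port and ip in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "deny"  # nothing matched: the final catch-all rule denies

# Deny numbered below the allow: the deny takes effect.
deny_first = [
    Rule(90, "203.0.113.0/24", 80, "deny"),
    Rule(100, "0.0.0.0/0", 80, "allow"),
]
# Deny numbered above the allow: the allow matches first, deny never runs.
deny_last = [
    Rule(100, "0.0.0.0/0", 80, "allow"),
    Rule(110, "203.0.113.0/24", 80, "deny"),
]

print(evaluate(deny_first, "203.0.113.7", 80))  # deny
print(evaluate(deny_last, "203.0.113.7", 80))   # allow
print(evaluate(deny_first, "198.51.100.1", 22)) # deny (no matching rule)
```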
Load Balancer - at least 2 public subnets are required (2 AZ for HA)
VPC Flow Logs - capture traffic in/out of the VPC - stored in CloudWatch Logs (VPC level, subnet level, network interface level)
Bastion Host
How to communicate to a Private instance?
NAT instances (1) and NAT Gateways (HA) (Network Address Translation)
Create NAT instance (EC2 NAT AMI), disable source/destination check
Then add a route in Main route table to allow internet access via NAT instance
Single point of failure -- so use NAT Gateway
Create a new NAT gateway in the public subnet, allocate an Elastic IP for it (uses ephemeral ports) - then add the route
Elasticsearch - not just search, but analytics - massive scale, near-realtime, cheap (v7.9)
ELK (ES, Logstash (bring data in - pipeline), Kibana (visualize))
Document storage and retrieval engine (scaled Lucene engine)
Document (text, json)- documentId, types (schema & mapping - going away), indices (inverted indices)
Documents hashed to separate shards (shard - self contained lucene index - kind of mini search engine by itself)
Primary and replica shards (writes go to the primary and are then replicated)
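The hashing idea can be sketched as "hash of the document id, modulo the number of primary shards". Note the assumption: zlib.crc32 is only a stand-in hash for illustration, not the hash function Elasticsearch actually uses, and the document ids are made up.

```python
import zlib

def shard_for(doc_id, num_primary_shards):
    """Map a document id to a shard number (stand-in hash, not ES's own)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

doc_ids = ["order-1", "order-2", "order-3", "order-4", "order-5"]

# Same id always lands on the same shard for a fixed shard count.
placement_5 = {d: shard_for(d, 5) for d in doc_ids}
# With a different shard count the same ids can map elsewhere.
placement_3 = {d: shard_for(d, 3) for d in doc_ids}

print(placement_5)
print(placement_3)
```

Because the shard number depends on the shard count, changing the number of primary shards would move documents around, which is one way to see why that number is chosen up front.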
Elasticsearch Service - Managed Service (not serverless) (avoids installing and managing ES on EC2)
Instance hours are billed for as long as the domain runs.
IoT --> ES for analysis - possible
Need to choose # of master nodes
Domains -> in ES means Cluster
Snapshot to S3 can be set
Log in to Kibana (onPrem - internet - Kibana within the VPC) - use Cognito (create a Cognito user pool if needed)
Kinesis - processing via Lambda
There are several blueprints @ Lambda - search for something to convert APACHE Access Log
Copy the index.js code - it converts Apache logs to JSON format
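Not the blueprint's index.js, but a Python sketch of the same idea: a Firehose transformation function that decodes each record, parses an Apache common-log line with a regex, and returns it as JSON. The regex and field names are my own assumptions; the record shape (recordId, base64 data, result) follows the Firehose data-transformation contract.

```python
import base64
import json
import re

# Apache common log format (assumed layout; field names are my own).
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

def handler(event, context):
    """Transform each Firehose record from a raw log line to JSON."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8")
        match = LOG_PATTERN.match(line)
        if match is None:
            # Hand the record back untouched and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        payload = json.dumps(match.groupdict())
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

It can be exercised locally by passing a dict shaped like {"records": [{"recordId": "1", "data": "<base64 log line>"}]} and None for the context.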
Kinesis firehose destination
Elasticsearch - (APM - Application Performance Management) - analyze application logs and system metrics
Predicting trends (# of calls etc.) - via graphical representation
Anomaly detection
Data is stored as documents (like row in RDBMS) with fields/values
Query using rest API
Logstash - use it when you want to bring data into ES and it needs enrichment first
If more NODES are added - the SHARDS are distributed evenly by ES.
ES Type/Index
Query String API
_search API,
field=pasta
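A sketch of the two usual ways to express such a search against _search: a query-string URL (q=field:value) and a JSON request body. The endpoint, the index name (recipes) and the field name (title) are placeholders; nothing here sends a request.

```python
import json
from urllib.parse import urlencode

def query_string_search_url(endpoint, index, field, term):
    """Build a URI search: <endpoint>/<index>/_search?q=<field>:<term>."""
    query = urlencode({"q": f"{field}:{term}"})
    return f"{endpoint}/{index}/_search?{query}"

def match_query_body(field, term):
    """Build the request-body equivalent using a match query."""
    return json.dumps({"query": {"match": {field: term}}})

# Placeholder endpoint, index and field names.
url = query_string_search_url(
    "https://search-example.invalid", "recipes", "title", "pasta"
)
body = match_query_body("title", "pasta")

print(url)   # https://search-example.invalid/recipes/_search?q=title%3Apasta
print(body)  # {"query": {"match": {"title": "pasta"}}}
```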
AWS - ES Domain comes with Kibana by default (if selected on setup)
set proper access to use it.
Kibana - create index pattern - first give the domain name , then timestamp.
Discover, Visualize, Dashboard....
Notebooks - Jupyter, Zeppelin
Athena - Glue - QuickSight
Glue - Crawler - from S3/JDBC/DynamoDB - will crawl and create tables in your Glue database.
Schema - auto detected if it is in header, else edit once done.
Athena - SQL like query using the Glue Catalog.
RedShift
client --> (JDBC/ODBC) - Leader Node, Compute Nodes (1-128) - multiple node slices per compute node
Compute Node -> Dense Storage/Dense Compute
DynamoDB - common usecases
SQS vs Kinesis Data Streams
SQS, Kinesis DataStream, Kinesis Firehose, SQS FIFO
IoT (internet of things)
Thing Registry, Device Gateway, IoT Message Broker, IoT Rules Engine, Device Shadow
---> Kinesis, SQS, Lambda, DynamoDB, S3, SNS, ES, MQTT to ML model.. ...
IoT Greengrass - bring compute power (lambda) on the device
VPC Peering, VPC Private Link
If there are many VPCs, peering each one to every other is a big task - you have to manage multiple peering relationships. So use PrivateLink - using a Network Load Balancer and Elastic Network Interface (ENI)
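Why peering gets out of hand: peering is not transitive, so connecting every VPC to every other needs n(n-1)/2 peering connections. A tiny calculation (the function name is made up for this note):

```python
def full_mesh_peerings(vpc_count):
    """Peering connections needed so every VPC reaches every other one."""
    return vpc_count * (vpc_count - 1) // 2

for n in (2, 5, 10, 50):
    print(n, "VPCs ->", full_mesh_peerings(n), "peering connections")
# 2 -> 1, 5 -> 10, 10 -> 45, 50 -> 1225
```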
Direct connect
VPC endpoints (traffic doesn't go through the Internet) - Gateway (only S3 & DynamoDB)/Interface
Cloud Formation - stack
Create from a template or create a new template in Designer
Designer preview
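For orientation, a minimal sketch of a template, built as a Python dict and dumped as JSON. The logical name NotesBucket and the description are placeholders; a real stack would usually have more resources and parameters.

```python
import json

# Smallest useful template: one resource and one output.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Example stack with a single S3 bucket (placeholder)",
    "Resources": {
        "NotesBucket": {            # logical name, made up for this note
            "Type": "AWS::S3::Bucket"
        }
    },
    "Outputs": {
        "BucketName": {
            # Ref on a bucket resource returns the bucket name.
            "Value": {"Ref": "NotesBucket"}
        }
    },
}

template_json = json.dumps(template, indent=2)
print(template_json)
```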
Amazon Managed Service for Grafana
Powerful, interactive data visualizations for builders, operators, and business leaders