Thursday, February 4, 2021

AWS Big Data related notes


DMS 

EndPoint - Connection error for S3 source endpoint - Test Endpoint failed: Application-Status: 1020912, Application-Message: Failed to connect to database.

Root Cause: DMS Replication Instance and the S3 bucket were not in the same region.

Migration process:

1) create source & target endpoints and test the connection

2) create migration task using the endpoints (map columns as required) and run it. 

https://www.youtube.com/watch?v=_wb-IupX9JU 

 https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.S3.html

Set ignoreHeaderRows=1 to skip the header row in the data files.

Load task completed but no data was loaded: check the "Bucket folder" value provided in the endpoint. It should contain only the part of the path above <schema>/<table>/data.csv.

So, if the subdirectories for schema and table sit directly under the bucket, leave it empty.

https://aws.amazon.com/premiumsupport/knowledge-center/dms-task-successful-no-data-s3/  
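
The "Bucket folder" rule above can be checked with a small helper. This is only a sketch: the function name and the "landing" folder are made up for illustration, and the schema/table directory names are the ones used later in these notes.

```python
def expected_object_prefix(bucket_folder: str, schema: str, table: str) -> str:
    """Return the key prefix under which the table's CSV files are expected."""
    # Drop empty parts so an empty "Bucket folder" does not produce a leading slash.
    parts = [p.strip("/") for p in (bucket_folder, schema, table) if p.strip("/")]
    return "/".join(parts) + "/"


# Schema and table directories sit directly under the bucket: folder stays empty.
print(expected_object_prefix("", "publicdata", "taxi_trips_202007"))

# A hypothetical parent directory "landing" would go into "Bucket folder".
print(expected_object_prefix("landing", "publicdata", "taxi_trips_202007"))
```

If the printed prefix does not match where the CSV files actually live in the bucket, the task can finish successfully without loading anything.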


Taxi_trips

https://www.kaggle.com/divineunited/exploring-the-chicago-taxi-trip-dataset 

https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew/data 

Data types etc.: https://docs.aws.amazon.com/dms/latest/userguide/dms-ug.pdf

CloudWatch logs are not written, and the console says the log group or log stream is not available. Why?

For DMS to write CloudWatch logs, there must be an IAM role named "dms-cloudwatch-logs-role" with the "AmazonDMSCloudWatchLogsRole" policy attached to it. If it's not there, create it manually.

Refer: https://aws.amazon.com/premiumsupport/knowledge-center/dms-cloudwatch-logs-not-appearing/ 
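
When creating the role manually, it needs a trust policy that lets the DMS service assume it. The sketch below only builds and prints that policy document. The managed policy ARN is my assumption of the usual service-role path, so verify it in the IAM console before using it.

```python
import json

ROLE_NAME = "dms-cloudwatch-logs-role"
# Assumed ARN of the managed policy named in the note above; verify before use.
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole"


def dms_trust_policy() -> dict:
    """Trust policy that allows the DMS service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "dms.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(ROLE_NAME)
print(MANAGED_POLICY_ARN)
print(json.dumps(dms_trust_policy(), indent=2))
```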

Important Note: 

"ColumnNullable" is false by default when it is not mentioned in the schema JSON. This means that if any value in such a column is empty/null, the job will fail at that point.

Data is loaded to RDS row by row (committed one by one, sequentially). So even if the job fails on a data error in a particular row, all rows up to that point will be loaded, and the remaining rows will be skipped.

So, analyze the data before the load and mark all nullable columns with the schema attribute

"ColumnNullable":"true"
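
One way to do that analysis is to scan the CSV once and list every column that has at least one empty value. This is a minimal sketch using an inline sample; for the real file you would read the text from disk or S3 instead. The function name is made up for illustration.

```python
import csv
import io


def columns_with_empty_values(csv_text: str) -> list[str]:
    """Return the header names of columns that contain at least one empty value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nullable: list[str] = []
    for row in reader:
        for name, value in row.items():
            if name is None:
                # Extra fields beyond the header; not a named column.
                continue
            if (value is None or value.strip() == "") and name not in nullable:
                nullable.append(name)
    return nullable


sample = "trip_id,taxi_id,trip_miles\nt1,x1,1.5\nt2,x2,\n"
print(columns_with_empty_values(sample))
```

Every column it reports is a candidate for "ColumnNullable":"true" in the table definition.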

Replication Instance, EndPoints & Replication Tasks


 


 

S3 Data Dir Structure 

Amazon S3 > srees-data-bucket-2 > publicdata/

S3 data to load to RDS (MySQL)

EndPoint settings

Extra connection attributes : bucketName=srees-data-bucket-2;cdcPath=undefined;compressionType=NONE;csvDelimiter=,;csvRowDelimiter=\n;datePartitionEnabled=false;ignoreHeaderRows=1; 
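
The attribute string is just key=value pairs, each terminated by a semicolon, so it can be assembled from a dict instead of being typed by hand. A small sketch; the helper name is made up, and only some of the attributes from the line above are shown.

```python
def build_extra_connection_attributes(settings: dict[str, str]) -> str:
    """Join settings into the key=value; form used by the endpoint."""
    return "".join(f"{key}={value};" for key, value in settings.items())


attrs = build_extra_connection_attributes(
    {
        "bucketName": "srees-data-bucket-2",
        "compressionType": "NONE",
        "csvDelimiter": ",",
        "csvRowDelimiter": "\\n",  # literal backslash + n, as in the console
        "ignoreHeaderRows": "1",
    }
)
print(attrs)
```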

 Table structure (JSON)

{
    "TableCount":"1",
    "Tables": [
        {
            "TableName":"taxi_trips",
            "TablePath":"publicdata/taxi_trips_202007/",
            "TableOwner":"publicdata",
            "TableColumns": [
                {
                    "ColumnName":"trip_id",
                    "ColumnType":"STRING",
                    "ColumnNullable":"false",
                    "ColumnIsPk":"true",
                    "ColumnLength":"50"
                },{
                    "ColumnName":"taxi_id",
                    "ColumnType":"STRING",
                    "ColumnNullable":"false",
                    "ColumnIsPk":"false",
                    "ColumnLength":"150"
                }, ................................................... 
                {
                    "ColumnName":"trip_miles",
                    "ColumnType":"NUMERIC",
                    "ColumnPrecision":"6", --> TOTAL SIZE
                    "ColumnScale":"2", --> #DECIMAL PLACES
                    "ColumnNullable":"true"
                }, 
 .....................................................
                {   "ColumnName":"dropoff_location",
                    "ColumnType":"STRING",
                    "ColumnLength":"30",
                    "ColumnNullable":"true"
                }
            ],
            "TableColumnsTotal":"23"
        }
    ]
}
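
Hand-editing this JSON makes it easy to drop a brace or get "TableColumnsTotal" wrong. The sketch below generates the same structure with json.dumps, so the output is always valid JSON and the total is derived from the column list. Only two of the 23 columns are shown here, and the helper name is made up for illustration.

```python
import json


def build_table_definition(name: str, path: str, owner: str, columns: list[dict]) -> dict:
    """Build the external table definition for a single table."""
    table = {
        "TableName": name,
        "TablePath": path,
        "TableOwner": owner,
        "TableColumns": columns,
        # Derived from the list so it cannot drift from the real column count.
        "TableColumnsTotal": str(len(columns)),
    }
    return {"TableCount": "1", "Tables": [table]}


columns = [
    {"ColumnName": "trip_id", "ColumnType": "STRING", "ColumnNullable": "false",
     "ColumnIsPk": "true", "ColumnLength": "50"},
    {"ColumnName": "trip_miles", "ColumnType": "NUMERIC", "ColumnPrecision": "6",
     "ColumnScale": "2", "ColumnNullable": "true"},
]
definition = build_table_definition(
    "taxi_trips", "publicdata/taxi_trips_202007/", "publicdata", columns
)
print(json.dumps(definition, indent=4))
```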

Cloud Watch Logs - sample


 

 



 

 

 

AWS API Gateway

Issue faced while invoking it via JQuery:

Access to XMLHttpRequest at 'https://dummyCode.execute-api.us-west-1.amazonaws.com/dev/contactMeToCallSES' from origin 'null' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. 

To fix it, enable CORS in API Gateway:

Choose Enable CORS from the Actions drop-down menu.

CORS is required to call your API from a webpage that isn’t hosted on the same domain. To enable CORS for a REST API, set the Access-Control-Allow-Origin header in the response object that you return from your function code.

Access to XMLHttpRequest at 'https://2491zdgh7a.execute-api.us-west-1.amazonaws.com/dev/contactMeToCallSES' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

*** Make sure to redeploy the API after this change


***** Problem I faced with API Gateway-Lambda integration - requests were failing *****
API tests from the console UI (and from Postman) kept failing.
The difference between the working version (old account) and the non-working version of my API was the value of "Endpoint request body after transformations": in the failing request it contained all the headers and other parameters instead of the actual JSON request body.

Sun Feb 14 18:31:38 UTC 2021 : Endpoint request body after transformations: {"resource":"/contactMeToCallSES","path":"/contactMeToCallSES","httpMethod":"POST","headers":null,"multiValueHeaders":null,"queryStringParameters":null,"multiValueQueryStringParameters":null,"pathParameters":null,"stageVariables":null,"requestContext":{"resourceId":"t90fph","resourcePath":"/contactMeToCallSES","httpMethod":"POST","extendedRequestId":"av3lqHDCSK4FtNA=","requestTime":"14/Feb/2021:18:31:38 +0000","path":"/contactMeToCallSES","accountId
...........
Sun Feb 14 18:31:39 UTC 2021 : Endpoint response body before transformations: null
Sun Feb 14 18:31:39 UTC 2021 : Execution failed due to configuration error: Malformed Lambda proxy response
Sun Feb 14 18:31:39 UTC 2021 : Method completed with status: 502

Proxy integrations cannot be configured to transform responses.

In the working version, where the Integration Request type is LAMBDA, the value of "Endpoint request body after transformations" was as below:

Sun Feb 14 18:57:09 UTC 2021 : Endpoint request body after transformations: {
    "firstname": "fname",
"lastname": "lasNm",
"phone": 1231231231,
"email": "email",
"desc": "testing from connect4wree"}
Sun Feb 14 18:57:09 UTC 2021 : Sending request to https://lambda.us-w

Root Cause & Solution: The Integration Type I selected was Lambda Proxy instead of Lambda
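
The two log excerpts show the two event shapes the function can receive. If the function has to work with either integration type, it can detect the shape before reading the payload. A rough sketch, assuming the non-proxy integration passes the JSON body through unchanged; base64-encoded bodies are not handled.

```python
import json


def extract_payload(event: dict) -> dict:
    """Return the caller's JSON payload for either integration type."""
    if "httpMethod" in event and "requestContext" in event:
        # Lambda Proxy: the payload arrives as a JSON string (or None) under "body".
        return json.loads(event.get("body") or "{}")
    # Lambda (non-proxy) with passthrough: the event is the payload itself.
    return event


proxy_event = {
    "httpMethod": "POST",
    "requestContext": {},
    "body": '{"firstname": "fname"}',
}
plain_event = {"firstname": "fname"}

print(extract_payload(proxy_event))
print(extract_payload(plain_event))
```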



How to enable CORS for LAMBDA_PROXY Integration Type?

Set the CORS headers in the response returned from the Lambda function. API Gateway is just a pass-through in the case of LAMBDA_PROXY.
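
A minimal sketch of such a response from a Python handler. The proxy integration expects a dict with statusCode, headers and a string body; returning anything else leads to the "Malformed Lambda proxy response" error seen in the log above. The allowed origin "*" and the message text are placeholders, so restrict the origin for a real site.

```python
import json


def lambda_handler(event, context):
    """Return a proxy-format response that carries the CORS headers."""
    body = {"message": "received"}  # placeholder payload
    return {
        "statusCode": 200,
        "headers": {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Headers": "Content-Type",
            "Access-Control-Allow-Methods": "OPTIONS,POST",
        },
        # The body must be a string, not a dict.
        "body": json.dumps(body),
    }


print(lambda_handler({}, None))
```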



Usage Plans for API Gateway - Throttle the requests


 

 


Lambda (glue between services; serverless means you don't manage the server)

Pay for processing time.

Chat History Pull

Order history App



 

 Transaction Rate Alarm



Uses

Can trigger based on time/fixed schedule - for periodic batch runs

Lambda Triggers - Many services can generate triggers to Lambda.


 

Language Support 

 


 

S3 Event Notification settings - for any event, with destination as Lambda, SQS or SNS topic
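
When the destination is Lambda, the function receives a list of records naming the bucket and object. A small sketch of pulling those out. Object keys arrive URL-encoded in the event, so they are decoded first. The sample event is trimmed to the fields used here, and the file name in it is made up.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key))
    return results


sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "srees-data-bucket-2"},
                "object": {"key": "publicdata/taxi_trips_202007/data+1.csv"},
            }
        }
    ]
}
print(objects_from_s3_event(sample_event))
```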


Lambda & Data Pipeline

Lambda & Redshift (DynamoDB can store the state info and helps to batch data into Redshift)


Lambda & Kinesis (Lambda will get a batch of stream records)
In Kinesis Data Streams, records with the same partition key go to the same shard
Kinesis Data Limitations
Produce using SDK, KPL or  Kinesis Agent
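
Why the same partition key always lands on the same shard: Kinesis hashes the key with MD5 into a 128-bit integer and picks the shard whose hash key range contains it. The sketch below imitates that, assuming the shards split the hash space evenly, which is not guaranteed after resharding. The key "taxi-42" is made up.

```python
import hashlib


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    range_size = 2**128 // shard_count
    # Clamp so the leftover values at the top of the range fall in the last shard.
    return min(hash_value // range_size, shard_count - 1)


# The same key maps to the same shard every time.
print(shard_for_key("taxi-42", 4), shard_for_key("taxi-42", 4))
```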

Java SDK - PutRecords (up to 500 records per call), GetRecords
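
Because of the 500-records-per-call limit, a producer has to split larger lists before sending. A sketch of the chunking only; the actual send call is left out, and the names are made up for illustration.

```python
from typing import Iterator

MAX_RECORDS_PER_CALL = 500  # per-call limit noted above


def batches(records: list, size: int = MAX_RECORDS_PER_CALL) -> Iterator[list]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


sizes = [len(batch) for batch in batches(list(range(1200)))]
print(sizes)
```

Each yielded batch would then be passed to one PutRecords call.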

Consumer - SDK, KCL, Firehose, Lambda

Kinesis Analytics - Serverless

Lambda has a special behaviour when processing Kinesis event records. When the function throws an error while processing a batch of records, Lambda automatically retries the same batch. By default, no further records from that shard are processed until the batch succeeds or its records expire.
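
A sketch of a Python handler for such a batch. Record data arrives base64-encoded, and any exception that escapes the handler makes Lambda retry the whole batch as described above. The sample event is trimmed to the fields used, and the JSON payload shape is an assumption for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode every record in the batch; an uncaught error retries the batch."""
    processed = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        # json.loads raises on bad data, which fails (and retries) the batch.
        processed.append(json.loads(payload))
    return {"processed": len(processed)}


def make_record(payload: dict) -> dict:
    """Build a trimmed Kinesis record for local testing."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"kinesis": {"partitionKey": "taxi-1", "data": data}}


sample_event = {"Records": [make_record({"trip_id": "t1"}), make_record({"trip_id": "t2"})]}
print(lambda_handler(sample_event, None))
```

If one bad record should not block the shard, catch the error inside the loop and park the record somewhere else instead of letting the exception escape.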

https://www.youtube.com/watch?v=G9nSwSd64RU - Kinesis-Lambda (java8)
The Lambda code editor does not support Java 8, so build and upload the JAR: https://maven.apache.org/plugins/maven-assembly-plugin/usage.html

                                      

Memory, Timeout, IAM Role


Code model






 


 













 

 
