Wednesday, January 6, 2021

HBase/NoSQL/BigTable Tips


Keep Latest Data on top by using REVERSE TIMESTAMP in ROWKEY  

    reverseTs = <Long.MAX_VALUE - ts.getMillis()>    

    Rows are stored sorted by row key, so with this, retrieving the first N rows returns the latest N rows
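
A minimal sketch of the idea in plain Java, with no HBase client involved. The class name `ReverseTsKey`, the `#` separator and the zero-padding to 19 digits are assumptions for illustration; a `TreeMap` stands in for the sorted table. The sample timestamps are the ones from the scan output further down, converted to milliseconds.

```java
import java.util.TreeMap;

class ReverseTsKey {
    // Newer timestamps produce smaller values, so they sort first.
    public static long reverseTs(long epochMillis) {
        return Long.MAX_VALUE - epochMillis;
    }

    // Zero-pad to 19 digits (assumption) so string order matches numeric order.
    public static String rowKey(String prefix, long epochMillis) {
        return String.format("%s#%019d", prefix, reverseTs(epochMillis));
    }

    public static void main(String[] args) {
        // TreeMap keeps keys in lexicographic order, like rows in the table.
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("15#S#1", 1225566600000L), "72.1");
        table.put(rowKey("15#S#1", 1225567200000L), "71.1");
        table.put(rowKey("15#S#1", 1225566900000L), "70.7");

        // The first entry is the most recent reading.
        System.out.println(table.firstKey() + " -> " + table.firstEntry().getValue());
        // prints: 15#S#1#9223370811287575807 -> 71.1
    }
}
```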


Deletion - a delete only marks the data for deletion; the data is physically removed later, during compaction



BigTable Self-Improves by learning Access patterns (Rebalancing/Pointers)



Optimize performance

1) Proper Schema based on access patterns (to make use of all nodes in the cluster)
2) Storage type: HDD (5000 QPS) vs SSD (10000 QPS)
3) Throughput scales linearly with # of nodes (up to a limit)
4) Data is stored in lexicographic order of rowKey - avoid hotspots, which would trigger tablet splits (tablets are stored in Colossus, the file system)

Replication (the replica cluster can be used only for reading)
Failover can be manual or automatic (based on the app profile the application is using)

 
SCAN with Conditions

    hbase(main):002:0> scan 'current_conditions', {'LIMIT' => 10, STARTROW => '15#S#1', ENDROW => '15#S#999', COLUMN => 'lane:speed'}
    ROW                          COLUMN+CELL
     15#S#1#9223370811287575807  column=lane:speed, timestamp=1225567200, value=71.1
     15#S#1#9223370811287875807  column=lane:speed, timestamp=1225566900, value=70.7
     15#S#1#9223370811288175807  column=lane:speed, timestamp=1225566600, value=72.1
     15#S#1#9223370811288475807  column=lane:speed, timestamp=1225566300, value=73.0
     15#S#1#9223370811288775807  column=lane:speed, timestamp=1225566000, value=69.3
     15#S#1#9223370811289075807  column=lane:speed, timestamp=1225565700, value=71.7
     15#S#1#9223370811289375807  column=lane:speed, timestamp=1225565400, value=69.0
     15#S#1#9223370811289675807  column=lane:speed, timestamp=1225565100, value=68.9
     15#S#1#9223370811289975807  column=lane:speed, timestamp=1225564800, value=70.1
     15#S#1#9223370811290275807  column=lane:speed, timestamp=1225564500, value=69.6
    10 row(s) in 0.1960 seconds
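
To see why a start row, an end row and a limit are enough here, the sketch below imitates the scan over an in-memory sorted map. It is only a stand-in for the table, not the HBase client API; `ScanSketch`, `scan` and the `16#N#1` row are hypothetical names and data added for illustration. The start row is inclusive and the end row exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ScanSketch {
    // Returns up to `limit` row keys in [startRow, stopRow), in sorted key order.
    public static List<String> scan(TreeMap<String, String> table,
                                    String startRow, String stopRow, int limit) {
        List<String> rows = new ArrayList<>();
        for (String key : table.subMap(startRow, true, stopRow, false).keySet()) {
            if (rows.size() == limit) break;
            rows.add(key);
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("15#S#1#9223370811287875807", "70.7");
        table.put("15#S#1#9223370811287575807", "71.1");
        table.put("15#S#1#9223370811288175807", "72.1");
        table.put("16#N#1#9223370811287575807", "65.0"); // hypothetical row outside the range

        // The two newest readings for prefix 15#S#1 come back first.
        System.out.println(scan(table, "15#S#1", "15#S#999", 2));
    }
}
```

Because the reverse timestamp is the last part of the key, the range scan returns the newest rows for the prefix without any sorting on the client.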

Design RowKeys to Support Query Patterns

De-Normalize with an additional table with different keys when needed to support different query patterns

Design Key to Avoid HotSpotting >> Field Promotion (high cardinality field as rowKey prefix) and Salting (add another attribute to the key as a prefix, e.g. the timestamp modulo the number of nodes)
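
A rough sketch of salting, assuming the salt is the timestamp modulo the number of nodes, as suggested above. `SaltedKey`, the `#` separator and the node count of 4 are illustrative assumptions, not a fixed convention.

```java
class SaltedKey {
    // Salt = timestamp modulo node count, so consecutive writes spread across prefixes.
    public static long salt(long epochMillis, int nodes) {
        return Math.floorMod(epochMillis, (long) nodes);
    }

    // The salt goes first so that it decides where the row lands in the key space.
    public static String rowKey(long epochMillis, int nodes) {
        return salt(epochMillis, nodes) + "#" + epochMillis;
    }

    public static void main(String[] args) {
        int nodes = 4; // hypothetical cluster size
        for (long ts = 1225567200000L; ts < 1225567200004L; ts++) {
            System.out.println(rowKey(ts, nodes));
        }
    }
}
```

The trade-off is on the read side: a time-range query now has to scan every salt prefix and merge the results.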

Limits: keep # of column families below 100, row size below 100 MB, column (cell) size below 10 MB



