Keep the latest data on top by using a REVERSE TIMESTAMP in the ROWKEY
reverseTs = <Long.MAX_VALUE - ts.getMillis()>
With this, reading the first N rows for a key prefix returns the latest N rows
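A minimal sketch of the reverse timestamp idea in Python, assuming the key layout `<prefix>#<reverseTs>` seen in the scan output in these notes. The helper names `reverse_ts` and `row_key` are hypothetical, and `LONG_MAX` stands in for Java's `Long.MAX_VALUE`.

```python
LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def reverse_ts(ts_millis: int) -> int:
    """Newer timestamps give smaller numbers, so they sort first."""
    return LONG_MAX - ts_millis


def row_key(prefix: str, ts_millis: int) -> str:
    # Zero-pad to 19 digits so string (lexicographic) order matches numeric order.
    return f"{prefix}#{reverse_ts(ts_millis):019d}"


# Three readings written out of order (epoch milliseconds).
readings = [1225566600000, 1225567200000, 1225566900000]

# Sorting the keys as strings mimics how rows are stored.
keys = sorted(row_key("15#S#1", ts) for ts in readings)
latest_key = keys[0]  # the key built from the newest reading
print(latest_key)
```

Because the newest reading always produces the smallest key, a scan limited to N rows never has to read past the older data.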
Deletion - rows are first marked for deletion, then physically removed during compaction
Bigtable self-improves by learning access patterns (rebalancing tablets across nodes by moving pointers, not the data)
Optimize performance
1) Proper Schema based on access patterns (to make use of all nodes in the cluster)
2) Storage type: HDD (5,000 QPS) vs SSD (10,000 QPS)
3) Throughput scales linearly with # of nodes (up to a limit)
4) Data is stored in lexicographic order of rowKey - avoid hotspots, which would trigger tablet splits (tablets are stored in Colossus, the file system)
Replication (replica cluster can be used only for reading)
Failover can be manual or automatic (based on the app profile the application is using)
SCAN with Conditions
hbase(main):002:0> scan 'current_conditions', {'LIMIT' => 10, STARTROW => '15#S#1', ENDROW => '15#S#999', COLUMN => 'lane:speed'}
ROW                          COLUMN+CELL
 15#S#1#9223370811287575807  column=lane:speed, timestamp=1225567200, value=71.1
 15#S#1#9223370811287875807  column=lane:speed, timestamp=1225566900, value=70.7
 15#S#1#9223370811288175807  column=lane:speed, timestamp=1225566600, value=72.1
 15#S#1#9223370811288475807  column=lane:speed, timestamp=1225566300, value=73.0
 15#S#1#9223370811288775807  column=lane:speed, timestamp=1225566000, value=69.3
 15#S#1#9223370811289075807  column=lane:speed, timestamp=1225565700, value=71.7
 15#S#1#9223370811289375807  column=lane:speed, timestamp=1225565400, value=69.0
 15#S#1#9223370811289675807  column=lane:speed, timestamp=1225565100, value=68.9
 15#S#1#9223370811289975807  column=lane:speed, timestamp=1225564800, value=70.1
 15#S#1#9223370811290275807  column=lane:speed, timestamp=1225564500, value=69.6
10 row(s) in 0.1960 seconds
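To see why the start row, end row and limit work the way they do, here is a rough in-memory simulation of a range scan over lexicographically sorted keys. It is only an illustration, not the HBase or Bigtable client API: the `scan` function is hypothetical, and the `14#N#1` and `16#S#1` keys are made up to show rows that fall outside the range.

```python
from bisect import bisect_left


def scan(sorted_keys, start_row, end_row, limit):
    """Return up to `limit` keys k with start_row <= k < end_row."""
    lo = bisect_left(sorted_keys, start_row)  # first key >= start_row
    hi = bisect_left(sorted_keys, end_row)    # first key >= end_row
    return sorted_keys[lo:hi][:limit]


# Keys are kept sorted as plain strings, like row keys in a table.
table = sorted([
    "14#N#1#9223370811287575807",  # made-up key, sorts before the range
    "15#S#1#9223370811287575807",
    "15#S#1#9223370811287875807",
    "15#S#1#9223370811288175807",
    "16#S#1#9223370811287575807",  # made-up key, sorts after the range
])

result = scan(table, "15#S#1", "15#S#999", limit=2)
for key in result:
    print(key)
```

The scan only touches the contiguous slice between the two boundaries, which is why a key design that puts related rows next to each other keeps reads cheap.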
Design RowKeys to Support Query Patterns
Denormalize: add another table with a different key when needed to support a different query pattern
Design the key to avoid hotspotting >> Field Promotion (a high-cardinality field as rowKey prefix) and Salting (add a computed attribute to the key as prefix, e.g. timestamp mod number of nodes)
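A small sketch of both key designs, assuming a 4-node cluster and a sensor id as the high-cardinality field. `NUM_NODES`, `promoted_key` and `salted_key` are hypothetical names chosen for illustration.

```python
from collections import Counter

NUM_NODES = 4  # assumed cluster size, used as the number of salt buckets


def promoted_key(sensor_id: str, ts_millis: int) -> str:
    """Field promotion: the high-cardinality field leads the key."""
    return f"{sensor_id}#{ts_millis}"


def salted_key(sensor_id: str, ts_millis: int, buckets: int = NUM_NODES) -> str:
    """Salting: prefix the key with timestamp mod number of nodes."""
    salt = ts_millis % buckets
    return f"{salt}#{sensor_id}#{ts_millis}"


# Eight consecutive writes (one millisecond apart) from the same sensor.
base = 1225567200000
keys = [salted_key("15#S#1", base + i) for i in range(8)]

# Count how many writes land under each salt prefix.
per_bucket = Counter(key.split("#", 1)[0] for key in keys)
print(per_bucket)
```

Salting spreads sequential writes across key ranges, but a read for one sensor then has to query every salt bucket, so it suits write-heavy tables better than scan-heavy ones.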
Keep # of column families below 100, row size below 100MB, single column value below 10MB