Friday, August 19, 2022

KafkaProducer and Spark Streaming - things to remember - auth keystore load & file ulimit - too many open files error

 

When a Spark Streaming or batch process writes to a Kafka topic

Be aware of the authentication triggered on the org.apache.kafka.clients.producer.KafkaProducer send/doSend path - establishing the authenticated connection loads the keystore/keytab, so if a new producer is created for every message, that load happens for every message published

This means the keytab is read from the filesystem over and over. As the message count increases, the number of open file handles grows with it and can exceed the ulimit (the maximum number of open file handles), at which point a "Too many open files" error is thrown and the job fails
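To make the failure mode concrete, here is an illustrative sketch of the anti-pattern (broker address, topic name, serializers, and keystore path are placeholders, not from this post): a producer is built per record, so every record pays the authentication and keystore load.

import java.util.Properties
import org.apache.spark.rdd.RDD
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Anti-pattern: one KafkaProducer per record (all config values are placeholders)
def publishPerMessage(rdd: RDD[String]): Unit =
  rdd.foreach { message =>
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9093")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("security.protocol", "SASL_SSL")
    props.put("ssl.truststore.location", "/etc/security/truststore.jks") // opened on connect
    val producer = new KafkaProducer[String, String](props)   // auth + keystore load per record
    producer.send(new ProducerRecord("events", message))      // "events" is a placeholder topic
    producer.close()                                          // handles still churn at high volume
  }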

Make sure to publish the RDD in a distributed way - one producer per partition rather than per message, as in the sketch below
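A minimal sketch of the per-partition alternative (same placeholder config as above): each partition's task builds one producer, so the keystore/keytab is loaded once per task instead of once per message.

import java.util.Properties
import org.apache.spark.rdd.RDD
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

def publishPerPartition(rdd: RDD[String]): Unit =
  rdd.foreachPartition { partition =>
    val props = new Properties()                               // placeholders, as above
    props.put("bootstrap.servers", "broker1:9093")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("security.protocol", "SASL_SSL")
    props.put("ssl.truststore.location", "/etc/security/truststore.jks")
    val producer = new KafkaProducer[String, String](props)    // one auth/keystore load per task
    try partition.foreach(msg => producer.send(new ProducerRecord("events", msg)))
    finally producer.close()                                   // flushes sends, releases the handle
  }

With this shape, the number of concurrent producer handles is bounded by the number of running tasks rather than the number of messages, which stays well under a typical ulimit.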
