When a Spark Streaming or batch process writes to a Kafka topic
Be aware of how authentication is handled on the org.apache.kafka.clients.producer.KafkaProducer send/doSend path: the keystore/keytab is loaded for authentication, and in naive code that constructs a producer per record, this load happens for every single message published.
This means the keytab is read from the filesystem over and over, so as the message count increases, the number of open file handles grows with it and can exceed the ulimit (maximum number of open file handles), at which point a "Too many open files" error is thrown and the job fails.
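A minimal sketch of the problematic pattern, assuming an RDD[String] named rdd, a Properties object props carrying the broker and Kerberos/SSL settings, and a topic name "events" (all names here are illustrative, not from the original post):

    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // Antipattern: a new KafkaProducer per record means the
    // keytab/keystore is loaded once per message published.
    rdd.foreach { msg =>
      val producer = new KafkaProducer[String, String](props)
      producer.send(new ProducerRecord[String, String]("events", msg))
      producer.close()
    }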
Make sure to publish the RDD in a distributed way: open one producer per partition with foreachPartition rather than one per record, as in the sketch below.
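A sketch of the per-partition approach, under the same assumptions (rdd is an RDD[String]; the broker address, serializers, and topic name are illustrative placeholders):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    rdd.foreachPartition { records =>
      // One producer, and therefore one keytab/keystore load,
      // per partition instead of one per record.
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9092") // illustrative broker
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      // Kerberos/SSL settings (keytab, keystore) would be added here as well.
      val producer = new KafkaProducer[String, String](props)
      try {
        records.foreach(msg => producer.send(new ProducerRecord[String, String]("events", msg)))
      } finally {
        producer.close() // releases the file handles held for authentication
      }
    }

With this structure the number of keytab/keystore loads is bounded by the number of partitions rather than the number of messages, which keeps the open-file-handle count well under the ulimit.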