Issue: MessageDigest was used as a singleton for this UDF-exposed function. Since MessageDigest is not thread-safe, it produced duplicate (incorrect) hash values when a large volume of data was processed concurrently.
Solution: Changed the logic to create a new MessageDigest instance for each call.
The old and new approaches are shown below; the obsolete code can be removed.
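The original code listing is not reproduced here, so the following is a minimal sketch of the two approaches in a Spark UDF. The object name HashUdfs, the column/function names, and the choice of SHA-256 are assumptions for illustration; the key difference is the shared instance versus the per-call instance.

```scala
import java.security.MessageDigest
import org.apache.spark.sql.functions.udf

object HashUdfs {

  // Old approach (problematic): one MessageDigest shared by all tasks in the
  // executor JVM. MessageDigest keeps internal state between update()/digest()
  // calls, so concurrent invocations interleave and yield wrong/duplicate hashes.
  private val sharedDigest = MessageDigest.getInstance("SHA-256")

  val sha256Unsafe = udf { (value: String) =>
    sharedDigest.digest(value.getBytes("UTF-8")).map("%02x".format(_)).mkString
  }

  // New approach: instantiate a MessageDigest inside the UDF body, so each
  // call works on its own instance and no state is shared across threads.
  val sha256Safe = udf { (value: String) =>
    val md = MessageDigest.getInstance("SHA-256")
    md.digest(value.getBytes("UTF-8")).map("%02x".format(_)).mkString
  }
}
```

If the per-call `getInstance` cost ever became a concern, a thread-local instance (one MessageDigest per executor thread) would be a common alternative, but per-call creation is the simplest fix and is cheap in practice.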
Further reference:
how-to-solve-non-serializable-errors-when-instantiating-objects-in-spark-udfs/