Data validation and cleansing for Big Data