Explainable Improved Ensembling For Natural Language And Vision