Measuring the robustness and generalization properties of unified vision and language models