Evaluating Multimodal AI Systems: A Comparative Analysis of Large Language Model-Based Models for Text, Image, and Video Generation