Discourse-guided text generation from knowledge graphs and image scene graphs