Connecting Sight and Sound through Space, Time and Language