I haven't read the paper in much depth yet, but this looks like great work, super interesting. Thanks for sharing.
Question: in the example of prediction on untrained tasks, what exactly hasn't been trained? The paper lists video as one of the trained tasks. Did you simply retrain the model without video examples and then test performance?
The model was trained on video classification, image QA, and image captioning. It was not trained on video captioning or video QA, yet it still shows results on those tasks.