The dataset HowToVQA69M is provided in the pickle file 'howtovqa.pkl'.

'howtovqa.pkl' is a dictionary mapping each of 1,228,505 YouTube video IDs (e.g. 'nykhikt2u88') to a dictionary with 4 keys:
- 'start': list of start times, in seconds, of the clips from the video (e.g. 54.3)
- 'end': list of end times, in seconds, of the clips from the video (e.g. 61.3)
- 'question': list of questions (e.g. 'How do you make a triangle?')
- 'answer': list of answers (e.g. 'Fold them in half again')

The k-th (video clip, question, answer) triplet of a video (e.g. k = 5) consists of the clip delimited by the k-th start and the k-th end, together with the k-th question and the k-th answer.

The train and val splits are provided in the CSV files 'train_howtovqa.csv' and 'val_howtovqa.csv', which can be loaded as pandas dataframes. Both files contain 2 columns:
- 'video_id': YouTube video ID
- 'video_path': relative path to the feature file (inside the SSD_DIR/s3d_features/howto100m_s3d_features folder)
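
The layout described above can be read with a few lines of Python. The following is a minimal sketch, not code from this repository: the helper names (load_howtovqa, get_triplet, iter_triplets, split_video_ids) are hypothetical, k is treated as a 0-based list index (an assumption), and the toy dictionary and split file only mimic the documented structure, with made-up values for the second clip and for the feature path. It uses only the standard library; the split files could equally be read with pandas.

```python
import csv
import io
import pickle


def load_howtovqa(path="howtovqa.pkl"):
    # The default path is an assumption; point it at your copy of the file.
    with open(path, "rb") as f:
        return pickle.load(f)


def get_triplet(data, video_id, k):
    # k is used as a 0-based index into the four parallel lists (assumption).
    entry = data[video_id]
    return {
        "start": entry["start"][k],
        "end": entry["end"][k],
        "question": entry["question"][k],
        "answer": entry["answer"][k],
    }


def iter_triplets(data, video_id):
    # Yield every (clip, question, answer) triplet of one video, in order.
    for k in range(len(data[video_id]["question"])):
        yield get_triplet(data, video_id, k)


def split_video_ids(csv_file):
    # Collect the 'video_id' column from an already opened split file.
    return [row["video_id"] for row in csv.DictReader(csv_file)]


# Toy stand-in with the same structure as howtovqa.pkl (illustrative values).
toy_data = {
    "nykhikt2u88": {
        "start": [54.3, 70.0],
        "end": [61.3, 75.5],
        "question": ["How do you make a triangle?", "What comes next?"],
        "answer": ["Fold them in half again", "Unfold the paper"],
    }
}

# Toy stand-in for a split file; the video_path value is a placeholder.
toy_split = io.StringIO("video_id,video_path\nnykhikt2u88,placeholder/path\n")

first = get_triplet(toy_data, "nykhikt2u88", 0)
print(first["question"], "->", first["answer"])

ids = split_video_ids(toy_split)
split_triplets = [t for vid in ids for t in iter_triplets(toy_data, vid)]
print(len(split_triplets), "triplets in the toy split")
```

With the real files, load_howtovqa() would replace toy_data and open('train_howtovqa.csv', newline='') would replace toy_split.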