The dataset HowToVQA69M is provided in the pickle file 'howtovqa.pkl'.

howtovqa.pkl is a dictionary mapping each of 1,228,505 YouTube video IDs (e.g. 'nykhikt2u88') to a dictionary with 4 keys: 
'start': list of start times, in seconds, of the clips from the video (e.g. 54.3)
'end': list of end times, in seconds, of the clips from the video (e.g. 61.3)
'question': list of questions (e.g. 'How do you make a triangle?')
'answer': list of answers (e.g. 'Fold them in half again')
The k-th (video clip, question, answer) triplet of a video (e.g. k = 5) consists of the clip delimited by the k-th start and k-th end times, together with the k-th question and the k-th answer, so the four lists of a video are aligned by index.
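
The layout described above can be sketched as follows. This is only a minimal illustration, not the official loading code: the helper names load_howtovqa and get_triplet are hypothetical, the toy dictionary reuses the example values above purely as a stand-in for the real file, and treating k as a 0-based Python list index is an assumption.

```python
import os
import pickle
import tempfile


def load_howtovqa(path):
    """Load the pickle file and return the video_id -> annotations dictionary."""
    with open(path, "rb") as f:
        return pickle.load(f)


def get_triplet(data, video_id, k):
    """Return (start, end, question, answer) for the k-th clip of a video.

    k is used as a 0-based Python list index here (an assumption).
    """
    entry = data[video_id]
    return entry["start"][k], entry["end"][k], entry["question"][k], entry["answer"][k]


# Toy stand-in with the same layout as howtovqa.pkl (illustrative values only).
toy = {
    "nykhikt2u88": {
        "start": [54.3],
        "end": [61.3],
        "question": ["How do you make a triangle?"],
        "answer": ["Fold them in half again"],
    }
}

# Round-trip the toy dictionary through a temporary pickle file.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_howtovqa.pkl")
    with open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    data = load_howtovqa(toy_path)

start, end, question, answer = get_triplet(data, "nykhikt2u88", 0)
print(f"[{start}s - {end}s] Q: {question} A: {answer}")
```

With the real file, load_howtovqa('howtovqa.pkl') would replace the temporary-file round trip.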

The train and val splits are provided in the CSV files 'train_howtovqa.csv' and 'val_howtovqa.csv', which can be loaded as pandas dataframes. Both files contain 2 columns:
'video_id': YouTube video ID 
'video_path': path to the feature file, relative to the SSD_DIR/s3d_features/howto100m_s3d_features folder.
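
A split file could be read along these lines. This is a hedged sketch under several assumptions: the CSV has a header row with the two column names, 'video_path' is joined onto the features folder, the SSD_DIR value is a placeholder, and the feature file name in the toy data is made up. The helper read_split is hypothetical. The standard csv module is used to keep the example dependency-free; pandas.read_csv would serve equally well.

```python
import csv
import io
import os

# Placeholder location; replace with the actual SSD_DIR (assumption).
SSD_DIR = "/path/to/SSD_DIR"
FEATURES_ROOT = os.path.join(SSD_DIR, "s3d_features", "howto100m_s3d_features")


def read_split(fileobj):
    """Map each video_id in a split file to the full path of its feature file."""
    reader = csv.DictReader(fileobj)
    return {
        row["video_id"]: os.path.join(FEATURES_ROOT, row["video_path"])
        for row in reader
    }


# Toy stand-in for train_howtovqa.csv; the feature file name is made up.
toy_csv = io.StringIO("video_id,video_path\nnykhikt2u88,nykhikt2u88.npy\n")
split = read_split(toy_csv)
print(split["nykhikt2u88"])
```

For the real files, open('train_howtovqa.csv', newline='') can be passed to read_split in place of the in-memory toy file.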