Computer Science Seminar
PhD Soapbox
Automated Lecture Video Indexing with Text Analysis and Machine Learning
When: Friday, March 27, 2015
Where: PGH 232
Time: 11:00 AM - 1:00 PM
Speaker: Tayfun Tuna, University of Houston
Host: Prof. Jaspal Subhlok
Classroom lecture videos are commonly used to supplement in-class teaching and for distance learning. Videos recorded during in-class teaching and made accessible online are a versatile resource, on par with a textbook and the classroom itself. Nonetheless, the adoption of lecture videos has been limited, in large part due to the difficulty of quickly accessing the content of interest in a long video lecture. In this work, we present automatic video indexing using machine learning, which divides a video into meaningful segments represented by index points to facilitate easy access to video content and enhance the user experience.
Videos are composed of thousands of frames that form repeating sequences of images. Detecting unique images, the transition points, can be done by simple image analysis. But finding where the topic changes among these transition points is a challenging task, as it requires conceptual text analysis. The precise meaning of a "topic" is also subjective, and text-based approaches that analyze word frequencies to determine conceptual difference may not be sufficient. These approaches compare the similarity of the transition points and merge video segments based on text similarity. For feasibility, they are limited to a small number of features, such as frame duration and the text similarity of consecutive frames. Features such as the first appearance of a word in a video, titles, and text with larger font sizes could also provide information for segmenting a video, but they are not used in these approaches. In this work, we present a new video indexing method based on machine learning that can exploit all of the proposed features.
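As a rough illustration of the transition detection described above, the following sketch finds frames where the slide changes by comparing consecutive frames. The frame data, threshold, and function names are hypothetical, not the authors' implementation:

```python
# Sketch: detect transition points in a frame sequence by simple image
# analysis (mean absolute pixel difference between consecutive frames).
# Frames are modeled as flat lists of grayscale pixel values; the
# threshold value is an illustrative assumption.

def frame_difference(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def find_transitions(frames, threshold=30.0):
    """Return indices where a new unique image (e.g. a new slide) appears."""
    transitions = []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            transitions.append(i)
    return transitions

# Toy example: three "slides", each repeated for several frames.
frames = [[0, 0, 0, 0]] * 3 + [[100, 100, 100, 100]] * 3 + [[200, 200, 200, 200]] * 2
print(find_transitions(frames))  # transitions at indices 3 and 6
```

Which of these transition points mark an actual topic change is the harder question the talk addresses with text analysis and machine learning.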
Experiments were conducted on a set of twenty-five lecture videos from courses in the Computer Science, Biology, and Earth and Atmospheric Sciences departments. The ground truth was established by asking the lecture instructor or another topic expert to manually identify topic transitions in each video. However, determining the ground truth is not easy: even for experts, some transition points are difficult to classify as the start of a topic. For this reason, every transition point in a lecture video was rated from 0 to 3: definitely not an index (0), probably not an index (1), probably an index (2), and definitely an index (3). Text in the video frames was extracted by an OCR tool, and feature vectors were created from this text and the ground truth, resulting in a dataset of 1,628 instances and 406 features. Experiments show that lecture video indexing by machine learning is promising but has limitations, as the average number of index points must be provided to the machine learning algorithms for practical use. We found that converting the dataset from 4-level index ratings to a 2-level rating is necessary before processing it with machine learning. Selecting a desired number of index points from machine learning output is possible only with algorithms that provide class probability distributions, such as the ensemble models Random Forest and Bagging. Experimental results on this dataset show that machine learning indexing provides a 25% improvement over text-based indexing approaches.
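The final selection step above, keeping a desired number of index points from a probability-producing classifier, can be sketched as follows. The probabilities are made-up stand-ins for the class-probability output of a model like Random Forest or Bagging:

```python
# Sketch: given per-transition-point probabilities of being an index point
# (as produced by classifiers that expose class probability distributions,
# e.g. Random Forest or Bagging), keep only the k most probable points.
# The probability values below are hypothetical.

def select_top_k_index_points(probabilities, k):
    """Return the k transition-point indices with the highest index probability."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return sorted(ranked[:k])  # report the chosen points in video order

# Hypothetical P(index) for six candidate transition points.
probs = [0.10, 0.85, 0.40, 0.92, 0.05, 0.60]
print(select_top_k_index_points(probs, k=3))  # → [1, 3, 5]
```

This is why hard-label classifiers are less useful here: without per-class probabilities there is no natural way to rank candidates and cut the list at the desired number of index points.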