"Research in Industrial Projects" Students Present Findings to USC Shoah Foundation Staff

Tue, 08/22/2017

Another group of talented students has completed an applied math research project through Research in Industrial Projects for Students (RIPS), a summer program of UCLA’s Institute for Pure and Applied Mathematics (IPAM), offering USC Shoah Foundation staff options for improving the functionality of the Visual History Archive.

This year’s group – Kira Parker, Rachel Lewis, Eric Gao, Lucia Li and advisor Ehsan Ebrahimzadeh – was tasked with researching algorithms for delivering personalized results when users enter keywords in the Visual History Archive’s “Quick Search” function to search for testimonies. USC Shoah Foundation wanted to know whether search results could be personalized based on a user’s preferences, viewing history and the behavior of similar users, much as Amazon and Netflix offer suggestions.

After working on the project for a month, the students demonstrated what they came up with in a presentation to USC Shoah Foundation staff.

They began by acknowledging the problem that USC Shoah Foundation faces: the Visual History Archive uses so many keywords to tag each minute of every testimony (more than 64,000 in all) that it is very difficult for users to sift through their search results to find the testimony segments most relevant to what they’re looking for.

The group explained that there are two ways to deliver personalized results: recommend videos whose keywords are similar to those of videos the user has already watched, or recommend videos whose keywords are similar to those of videos watched by other users with similar preferences.
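To make the distinction concrete, here is a minimal Python sketch of both approaches. It is an illustration only, not the students' code: the video IDs, keywords, user names and the choice of Jaccard overlap as the similarity measure are all assumptions made for the example, and the second function simplifies the idea by recommending videos that similar users have watched.

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical data, invented for illustration.
VIDEO_KEYWORDS = {
    "v1": {"deportation", "ghettos", "family life"},
    "v2": {"deportation", "ghettos", "hiding"},
    "v3": {"liberation", "displaced persons camps"},
    "v4": {"family life", "schools"},
}
WATCHED = {
    "alice": {"v1"},
    "bob": {"v1", "v3"},
    "carol": {"v4"},
}

def content_based(user, top_n=2):
    """Rank unwatched videos by keyword overlap with the user's own history."""
    seen = WATCHED[user]
    profile = set().union(*(VIDEO_KEYWORDS[v] for v in seen))
    scores = {v: jaccard(profile, kw)
              for v, kw in VIDEO_KEYWORDS.items() if v not in seen}
    return sorted(scores, key=lambda v: (-scores[v], v))[:top_n]

def collaborative(user, top_n=2):
    """Rank unwatched videos by how similar their viewers are to this user."""
    seen = WATCHED[user]
    scores = {}
    for other, other_seen in WATCHED.items():
        if other == user:
            continue
        weight = jaccard(seen, other_seen)  # similarity of viewing histories
        if weight == 0.0:
            continue  # ignore users with nothing in common
        for v in other_seen - seen:
            scores[v] = scores.get(v, 0.0) + weight
    return sorted(scores, key=lambda v: (-scores[v], v))[:top_n]

print(content_based("alice"))   # ['v2', 'v4']
print(collaborative("alice"))   # ['v3']
```

In this toy data the first approach suggests v2 because it shares two keywords with the video Alice watched, while the second suggests v3 because Bob, who watched the same video as Alice, also watched it.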

Both rely on a process called feature selection, which automatically groups keywords into sets of related keywords – a crucial step in establishing which videos are considered similar to other videos, even if they don’t have the exact same keywords.

They used two methods to deliver results based on a user’s search history: principal component analysis (PCA) and probabilistic topic modeling, specifically probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA).

PCA suggests videos that contain keywords that are similar to, though not always exactly the same as, searches a user has already performed. PLSA and LDA take a different, probabilistic approach but also return results based on a user’s history.
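As a rough illustration of how PCA can treat videos as similar even when their keywords do not match exactly, the Python sketch below finds the first principal component of a small video-by-keyword table using power iteration. The table, the keyword names and the choice of power iteration are assumptions made for this example, not details from the students' presentation.

```python
import math

# Hypothetical video-by-keyword table: rows are videos, columns are keywords.
KEYWORDS = ["ghettos", "deportation", "liberation", "displaced persons camps"]
VIDEOS = [
    [1, 1, 0, 0],  # video 0
    [1, 1, 0, 0],  # video 1
    [0, 0, 1, 1],  # video 2
    [0, 0, 1, 1],  # video 3
    [1, 0, 0, 0],  # video 4: tagged with "ghettos" only
]

def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector) for the top principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the keyword columns.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix, then renormalize.
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return means, vec

def project(row, means, component):
    """Score a video along the component after centering it."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

means, component = first_principal_component(VIDEOS)
scores = [project(row, means, component) for row in VIDEOS]
for index, score in enumerate(scores):
    print(f"video {index}: {score:+.2f}")
```

The overall sign of a principal component is arbitrary, so only the grouping matters: video 4 carries just one of the two keywords on videos 0 and 1, yet its score falls on the same side as theirs and on the opposite side from videos 2 and 3.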

The group used a technique called non-negative matrix factorization to automatically analyze the viewing habits of a large group of users, predict their keyword preferences and use these predictions to offer suggestions to a user who has searched for similar keywords.
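The general idea behind non-negative matrix factorization can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than the group's implementation: the user-by-keyword counts are invented, the number of factors is set to two by hand, and the factorization uses the standard multiplicative update rules, which may differ from whatever variant the students chose.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(v, w, h):
    """Sum of squared differences between v and the product w * h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb))

def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h * (w^T v) / (w^T w h), element by element.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update w: w * (v h^T) / (w h h^T), element by element.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical counts: rows are users, columns are keywords they engaged with.
USER_KEYWORD = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
    [5, 0, 0, 0],  # this user has only engaged with the first keyword so far
]

w, h = nmf(USER_KEYWORD, rank=2)
predicted = matmul(w, h)  # estimated preference of every user for every keyword
for row in predicted:
    print([round(x, 2) for x in row])
```

Each row of `predicted` is an estimate of how strongly that user would engage with each keyword, including keywords the user has not yet searched for. A recommender built along these lines could rank unseen keywords or videos by those estimated values.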

Overall, the group was pleased with the project’s outcomes and noted that their methods had indeed given relevant, personalized suggestions based on their sample users’ preferences. At the presentation, staff were impressed with the project and raised thoughtful questions about how it could be implemented in the Visual History Archive.
