HSE University Researchers Propose Algorithm to Determine Preferences of Smartphone Users
Mathematicians from HSE University in Nizhny Novgorod have developed a new way to predict the preferences of mobile device users. The method, which is 2–12% more accurate than known analogues, is based on simultaneous recognition of objects, faces and scenes in a smartphone’s photo gallery and on a remote server. The algorithm can be used to personalise services and offer recommendations tailored to a particular person. The article was published in the Pattern Recognition journal.
Recommendation systems rely on algorithms that model user behaviour from the information in a person's profile. Traditional recommendation systems use only structured and textual data. Researchers from HSE in Nizhny Novgorod and the St Petersburg Branch of the Steklov Mathematical Institute of the Russian Academy of Sciences have developed a model that uses photographs for such tasks.
Andrey Savchenko, HSE University Professor, one of the article’s authors
‘Every person’s mobile device stores a large number of photographs, which can be used to determine their hobbies, as well as preferences in food, clothes, cars. The use of modern photo-recognition methods in the smartphone gallery allows you to solve the "cold start" problem that affects new users. In other words, if a person has not made any purchases or watched recommended films, the system does not know anything about them and cannot suggest anything.’
However, as the researchers note, photo processing requires protecting users' privacy. Most photos contain personal data, and the user can disable the processing of this data on a remote server. Analytical systems must therefore run on the device itself. This is technically difficult, since processing a single image with the very deep convolutional neural networks (CNNs) used for such tasks takes a lot of time and energy.
The authors proposed a new method that quickly finds objects, faces and certain scenes and recognises events in photographs with high accuracy. It analyses visual features and classifies the detected objects simultaneously, using small neural networks specially designed for mobile devices. Processing one photo takes 30–100 ms.
The object detector is responsible for recognising objects and faces, and the second neural network classifier is responsible for determining scenes. The study used the PEC (Photo Event Collection) and WIDER (Web Image Dataset for Event Recognition) data sets. PEC contains 14 classes of scenes (birthdays, weddings, holidays, etc.), while WIDER features 61 classes (meetings, dances, press conferences, etc.).
Scene detection makes it possible to extract information about a user's preferences, such as art and theatres, nightlife, and sports. The object detector can recognise food, musical instruments, vehicles, etc., and the faces it finds can be used to analyse demographics (age, family) and determine social status. All the faces found in photos are clustered: the algorithm groups the faces of each person (the user in selfies, their relatives and friends) into a separate cluster. All the photos with faces are then marked as private (containing personal information about the users and their friends), and other photos (including those without faces) are marked as potentially public.
This ensures personal data protection; all private photos and videos are processed only on the phone in offline mode. Other photos can be sent to a remote server for scene classification and object detection using high-accuracy computationally complex neural networks.
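The private/public split described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Photo` class, the `face_count` field (assumed to come from an on-device face detector) and the `allow_server` flag (standing in for the user's option to disable server processing) are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    name: str
    face_count: int  # hypothetical: number of faces found by an on-device detector


def split_by_privacy(photos, allow_server=True):
    """Split photos into (on_device, server) lists.

    Photos with at least one face are treated as private and stay on the
    device. The rest are potentially public and may go to a remote server,
    unless the user has disabled server processing.
    """
    on_device, server = [], []
    for photo in photos:
        if photo.face_count > 0 or not allow_server:
            on_device.append(photo)
        else:
            server.append(photo)
    return on_device, server


# Illustrative gallery (made-up file names)
gallery = [
    Photo("selfie.jpg", 1),
    Photo("pizza.jpg", 0),
    Photo("family.jpg", 4),
    Photo("stadium.jpg", 0),
]
on_device, server = split_by_privacy(gallery)
print([p.name for p in on_device])
print([p.name for p in server])
```

In this sketch the decision depends only on whether a face was detected, so a photo never has to leave the phone before it is classified as private.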
‘Due to the fact that we identified private as well as public photos that are processed on a remote server, we got a 2–4% more accurate result than by using only neural networks for mobile devices, and only a 0.5% less accurate result than when processing all photos using complex server models,’ explains Andrey Savchenko.
The proposed solution was implemented in a mobile application for the Android operating system. Experimental results show that images can be processed efficiently, with accuracy 2–12% higher than that of analogues, because scenes and objects are processed simultaneously.
The user’s digital profile is stored as a histogram of interests that can be used by recommendation systems. For example, the researchers have already developed a recommendation system for restaurants. Based on the location and information about food preferences, the system offers the top 10 restaurants that match the user's profile and have the maximum average rating.
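As a rough illustration of how a histogram of interests could drive such a recommendation, consider the following Python sketch. It rests on several assumptions that are not taken from the article: the recognised labels are plain strings, the restaurant list has already been filtered by location, each restaurant is a dictionary with hypothetical `name`, `cuisine` and `rating` keys, and a label counts as a preference once its share reaches a hypothetical `min_share` threshold.

```python
from collections import Counter


def interest_histogram(labels):
    """Turn recognised labels into a normalised histogram of interests."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def recommend(restaurants, profile, top_k=10, min_share=0.1):
    """Return names of the top_k best-rated restaurants matching the profile.

    `restaurants` is assumed to be pre-filtered by the user's location.
    """
    liked = {label for label, share in profile.items() if share >= min_share}
    matching = [r for r in restaurants if r["cuisine"] in liked]
    matching.sort(key=lambda r: r["rating"], reverse=True)
    return [r["name"] for r in matching[:top_k]]


# Illustrative data only (made-up labels and restaurants)
profile = interest_histogram(["pizza", "pizza", "sushi", "guitar"])
nearby = [
    {"name": "A", "cuisine": "pizza", "rating": 4.2},
    {"name": "B", "cuisine": "sushi", "rating": 4.8},
    {"name": "C", "cuisine": "burger", "rating": 4.9},
    {"name": "D", "cuisine": "pizza", "rating": 4.6},
]
print(recommend(nearby, profile))
```

Restaurant C has the highest rating but is left out because burgers do not appear in the profile, which is the sense in which the ranking is tailored to the user.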