New Clustering Method Simplifies Analysis of Large Data Sets
Researchers from HSE University and the Institute of Control Sciences of the Russian Academy of Sciences have proposed a new method of data analysis: tunnel clustering. It allows for the rapid identification of groups of similar objects and requires fewer computational resources than traditional methods. Depending on the data configuration, the algorithm can operate dozens of times faster than its counterparts. The study was published in the journal Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia.
The volume of information requiring processing grows every year. Data comes from a variety of sources: scientific research, financial reports, medical examinations, and many others. Clustering methods, which group data based on similar characteristics, are used to detect patterns and organise information within such large datasets. The resulting groups are known as clusters.
One of the most widely used clustering methods is the k-means algorithm. It divides data into a predetermined number of clusters, starting from initial cluster centres (centroids) and refining them iteratively. However, this method has a limitation: the number of clusters must be known beforehand, which is not always possible when dealing with complex data. Scientists from HSE University and the V.A. Trapeznikov Institute of Control Sciences have proposed tunnel clustering as a new approach that removes this requirement. Unlike the k-means method, the algorithm does not need the number of clusters to be set in advance; it determines the necessary number itself by analysing the data structure.
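To make the k-means limitation concrete, here is a minimal sketch of the classical algorithm (Lloyd's iteration) in plain Python. The function name, the toy points, and the fixed iteration count are illustrative assumptions, not taken from the study. Note that the caller has to pass `k`.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm. The number of clusters k must be supplied."""
    rng = random.Random(seed)
    # Initial centroids: k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        for i, group in enumerate(groups):
            if group:
                centroids[i] = tuple(sum(c) / len(group) for c in zip(*group))
    return centroids, groups


# Toy data (made up for illustration): two well-separated blobs.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, groups = kmeans(points, k=2)
```

If `k` is chosen badly (say, `k=3` for this data), the algorithm still returns three groups, which is why methods that infer the number of clusters from the data are attractive.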
‘The algorithm forms “tunnels” in the data—regions in multidimensional space where objects with similar characteristics group together,’ explained Fuad Aleskerov, Head of the Department of Mathematics at the HSE Faculty of Economic Sciences. ‘Users can choose from three modes of operation: with fixed cluster boundaries, with adaptive boundaries that adjust to the data structure, or a combined approach. This makes the method flexible and suitable for various types of tasks.’
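The article does not describe how the tunnels are constructed, so the following is only a rough sketch of the fixed-boundary idea, under the assumption that a tunnel is a region defined by one interval per characteristic. The function name, the `boundaries` input, and the sample objects are hypothetical; the adaptive and combined modes are not covered.

```python
from bisect import bisect_right
from collections import defaultdict


def fixed_boundary_clusters(objects, boundaries):
    """Group objects whose characteristics fall into the same intervals.

    boundaries[j] is a sorted list of cut points for characteristic j
    (an assumed input format). The number of clusters is not given in
    advance: it is the number of distinct interval codes found in the data.
    """
    clusters = defaultdict(list)
    for obj in objects:
        # Interval index of each characteristic; the tuple identifies the region.
        code = tuple(bisect_right(cuts, value) for value, cuts in zip(obj, boundaries))
        clusters[code].append(obj)
    return dict(clusters)


# Made-up objects with two characteristics, each split at 0.5.
objects = [(0.1, 0.2), (0.3, 0.4), (0.9, 0.8), (0.7, 0.95), (0.2, 0.9)]
clusters = fixed_boundary_clusters(objects, boundaries=[[0.5], [0.5]])
```

Under this assumption a single pass over the data is enough, with no distance computations between objects, which is one plausible reason a region-based method could be cheaper than iterative ones.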
The method was tested on a synthetic (artificially generated) dataset of 100,000 objects, as well as on real-world tasks in public administration and the banking sector.

The main advantage of the new method is its speed. Unlike classical algorithms that demand significant computational resources, tunnel clustering can, depending on the data configuration, perform the analysis dozens of times faster.
In addition, the researchers introduced the concept of the ‘transition degree’—a parameter indicating how many characteristics of an object must change for it to be classified into a different cluster. This helps assess the clarity of cluster boundaries and identify objects situated at the intersection of different groups.
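One possible reading of the transition degree can be sketched as follows, assuming that each cluster is identified by a tuple of interval indices, one per characteristic. The article gives no formula, so the function names, the `boundaries` format, and the sample cluster codes are all hypothetical.

```python
from bisect import bisect_right


def interval_code(obj, boundaries):
    """Tuple of interval indices, one per characteristic (assumed cluster id)."""
    return tuple(bisect_right(cuts, value) for value, cuts in zip(obj, boundaries))


def transition_degree(obj, cluster_codes, boundaries):
    """Smallest number of characteristics that must change interval for the
    object to land in another existing cluster; None if no other cluster exists."""
    own = interval_code(obj, boundaries)
    diffs = [
        sum(a != b for a, b in zip(own, other))
        for other in cluster_codes
        if other != own
    ]
    return min(diffs) if diffs else None


# Three characteristics, each split at 0.5; three occupied clusters (made up).
boundaries = [[0.5], [0.5], [0.5]]
cluster_codes = {(0, 0, 0), (1, 1, 1), (0, 1, 1)}

far_from_border = transition_degree((0.1, 0.2, 0.3), cluster_codes, boundaries)
near_border = transition_degree((0.2, 0.9, 0.8), cluster_codes, boundaries)
```

Here the first object would need two characteristics to change before it joined another cluster, while the second needs only one, so it sits closer to the intersection of groups.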
‘People are generating more and more data, and the pace is only accelerating. According to the latest Digital 2025: Global Overview Report, as of early 2025, there were 5.56 billion internet users—nearly 68% of the global population. Adults spend an average of 6 hours and 38 minutes online each day, communicating, working, watching videos, and consuming content,’ said Alexey Myachin, Senior Research Fellow at the HSE International Centre for Decision Choice and Analysis. ‘Companies that ignore data analysis are losing vast sums of money.’
The authors continue to refine the algorithm, including conducting research into dimensionality reduction, which will help further decrease the time required to identify patterns in data.
The study was carried out with partial support from the Russian Science Foundation.