
AI vs AI: Scientists Develop Neural Networks to Detect Generated Text Insertions

A research team, including Alexander Shirnin from HSE University, has developed two systems designed to detect AI-generated insertions in scientific texts. The AIpom system combines two types of models: a decoder and an encoder. The Papilusion system detects text that neural networks have modified through synonym substitution or summarisation, and it relies on a single type of model: encoders. In the future, these systems will assist in verifying the originality and credibility of scientific publications. Articles describing the Papilusion and AIpom systems have been published in the ACL Anthology Digital Archive.

As language models like ChatGPT and GigaChat become more popular and widely used, it is becoming increasingly difficult to distinguish original human-written text from AI-generated content. Artificial intelligence is already being used to write scientific publications and graduation papers, so it is crucial to develop tools capable of identifying AI-generated insertions in texts. A research team, including scientists from HSE University, presented their solutions at the SemEval-2024 and DAGPap24 international scientific competitions.

The AIpom model was used to identify the boundaries between original and generated fragments in scientific papers. The ratio of machine-generated to human-written text varied from paper to paper. To train the models, the organisers provided texts on the same topic. At the testing stage, however, the topics changed, making the task more challenging.

'Models perform well on familiar topics, but their performance declines when presented with new topics,' according to Alexander Shirnin, co-author of the paper and Research Assistant at the Laboratory for Models and Methods of Computational Pragmatics, HSE Faculty of Computer Science. 'It's like a student who, having learned how to solve one type of problem, struggles to solve a problem on an unfamiliar topic or from a different subject as easily or accurately.'

To improve the system's performance, the researchers combined two models: a decoder and an encoder. In the first stage, a neural network decoder received an instruction and the source text as input and returned the text fragment it presumed to be AI-generated. Next, the point in the original text where the model predicted the generated fragment to begin was marked with a special <BREAK> token. The encoder then processed the text marked up in the first stage and refined the decoder's predictions. To do this, it classified each token (the smallest unit of text, such as a word or part of a word) as either written by a human or generated by AI. This approach improved accuracy compared to systems that used only one type of model: AIpom ranked second at the SemEval-2024 competition.
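The two-stage flow described above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names `find_fragment_start`, `insert_break` and `label_tokens` are hypothetical, whitespace splitting stands in for a real tokenizer, and the decoder's output is simply supplied as a string. In the real system the second stage is a trained encoder that refines the prediction; here the labels just follow the marker, to show the shape of the data.

```python
BREAK = "<BREAK>"

def find_fragment_start(source_tokens, fragment_tokens):
    """Return the index where the predicted fragment begins in the source, or None."""
    n = len(fragment_tokens)
    if n == 0:
        return None
    for i in range(len(source_tokens) - n + 1):
        if source_tokens[i:i + n] == fragment_tokens:
            return i
    return None

def insert_break(source_tokens, start):
    """Stage 1 output: a copy of the source with BREAK placed before the fragment."""
    if start is None:
        return list(source_tokens)
    return source_tokens[:start] + [BREAK] + source_tokens[start:]

def label_tokens(marked_tokens):
    """Stand-in for stage 2: label each token 0 (human) or 1 (generated).

    A trained encoder would refine these labels; this placeholder only
    follows the marker inserted in stage 1.
    """
    labels = []
    seen_break = False
    for tok in marked_tokens:
        if tok == BREAK:
            seen_break = True
            continue
        labels.append(1 if seen_break else 0)
    return labels

source = "We measured the samples twice . The results show a clear trend".split()
predicted = "The results show a clear trend".split()  # hypothetical decoder output
marked = insert_break(source, find_fragment_start(source, predicted))
labels = label_tokens(marked)
```

In this toy input the marker lands before the seventh token, so the first six tokens are labelled as human-written and the remaining six as generated.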

The Papilusion model also distinguished between human-written and generated text. With Papilusion, sections of the text were classified into four categories: written by a human, modified with synonyms, generated, or summarised by a model. The task was to accurately identify each category. The number of categories and the length of insertions in the texts varied.

In this case, the developers used three models, all of the same type: encoders. They were trained to predict one of the four categories for each token in the text, with each model trained independently of the others. When a model made an error, a cost was applied, and the model was retrained with the lower layers frozen. 

'Each model has a different number of layers, depending on its architecture. When training a model, we can leave the first ten or so layers unchanged and adjust only the parameters in the last two layers. This is done to prevent losing important data embedded in the first layers during training,' explains Alexander Shirnin. 'It can be compared to an athlete who makes an error in the movement of their hand. We only need to explain this part to them, rather than resetting their entire learning and retraining them, as they might forget how to move correctly overall. The same logic applies here. The method is not universal and may not work with all models, but in our case, it was effective.' 
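The freezing idea in the quote can be illustrated with a toy example. This is a sketch under stated assumptions, not the training code used for Papilusion: each "layer" is reduced to a single number, the gradients are made up, and `freeze_lower_layers` and `sgd_step` are hypothetical helper names. Deep learning frameworks provide their own mechanisms for excluding parameters from updates.

```python
def freeze_lower_layers(layers, n_trainable):
    """Mark only the last n_trainable layers as trainable and freeze the rest."""
    cutoff = len(layers) - n_trainable
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= cutoff
    return layers

def sgd_step(layers, grads, lr=0.1):
    """Apply one gradient step, skipping frozen layers."""
    for layer, grad in zip(layers, grads):
        if layer["trainable"]:
            layer["weight"] -= lr * grad

# Twelve toy layers: the first ten stay fixed, the last two are adjusted.
layers = [{"weight": 1.0, "trainable": True} for _ in range(12)]
freeze_lower_layers(layers, n_trainable=2)
sgd_step(layers, grads=[0.5] * 12)  # made-up gradients, for illustration only
weights = [layer["weight"] for layer in layers]
```

After the step, the first ten weights are unchanged while the last two have moved, which is the behaviour the athlete analogy describes: only the part that needs correcting is retrained.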

The three encoders independently determined the category for each token. The system's final prediction was the category that received the most points. Papilusion ranked sixth out of 30 in the DAGPap24 competition.
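One simple way to combine per-token predictions from several models is majority voting. The sketch below assumes that each encoder contributes one point to the category it predicts for a token. The article does not say how points were assigned, so this scoring rule, the category labels and the function `vote_per_token` are illustrative assumptions.

```python
from collections import Counter

def vote_per_token(predictions):
    """Combine per-token labels from several models by majority vote.

    predictions: list of label sequences, one per model, all the same length.
    Ties go to the label that appears first among the models' votes.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same number of tokens")
    final = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        # max() returns the first label with the highest count
        final.append(max(counts, key=counts.get))
    return final

# Hypothetical labels for a four-token text, using the four categories
# named in the article.
model_a = ["human", "human", "generated", "summarised"]
model_b = ["human", "synonym", "generated", "generated"]
model_c = ["human", "synonym", "generated", "summarised"]
result = vote_per_token([model_a, model_b, model_c])
```

With three voters, a single dissenting model is outvoted on each token, so an error made by one encoder does not reach the final prediction unless another encoder makes the same error.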

According to the researchers, current AI detection models perform reasonably well but still have limitations. Primarily, they struggle to process data beyond what they were trained on, and overall, there is a lack of diverse data to train the models effectively. 

'To obtain more data, we need to focus on collecting it. Both companies and laboratories have been doing this. Specifically for this type of task, it is necessary to collect datasets that include texts modified using multiple AI models and modification methods,' the researcher comments. 'Instead of continuing a text using just one model, more realistic scenarios should be created, such as asking the model to add to the text, rewrite the beginning for better coherence, remove parts of it, or generate a portion of the text in a new style using a different prompt. Of course, it is also important to collect data in different languages and on a variety of topics.' 
