If you want to be a scientist, you’re going to have to do a lot of reading.
Science is a business focused on building and sharing knowledge. Researchers publish articles detailing their discoveries, breakthroughs and innovations in order to share these revelations with their colleagues. And millions of scientific articles are published each year.
Keeping up with the latest developments in their field is a challenge for researchers at all stages of their careers, but it particularly affects early-career scientists, as they must also read the many papers that represent the foundation of their field.
“It is impossible to read everything. Absolutely impossible,” says Ajay Satpute, director of the Affective and Brain Sciences Laboratory and assistant professor of psychology at Northeastern. “And if you don’t know everything that happened in the field, there’s a real chance of reinventing the wheel again and again and again.” The challenge, he says, is figuring out how to economically train the next generation of scientists, balancing the need to read all the seminal papers with the need to train them as researchers in their own right.
This task is becoming more and more difficult, says Alessia Iancarelli, a doctoral student in affective and social psychology in Satpute’s laboratory. “The volume of published literature continues to increase,” she says. “How can scientists develop their scholarship in a field given this huge amount of literature?” They have to choose what they want to read.
But common approaches to this prioritization, says Iancarelli, can incorporate biases and leave out crucial corners of the field. Iancarelli, Satpute and their colleagues therefore developed a machine learning approach to find a better, and less biased, way to build a reading list. Their results, which were published last week in the journal PLOS One, also help reduce gender bias.
“There’s really a problem with how we develop scholarship,” Satpute says. Today, scientists often search a topic with a tool like Google Scholar and go from there, he says. “Or, if you’re lucky, you’ll have a great instructor and a great program. But it will basically be the field through that person’s eyes. And so I think it really fills a niche that could help create balance and cross-disciplinary research without necessarily having access to a great instructor, because not everyone gets that.”
The problem with something like Google Scholar, Iancarelli explains, is that it will give you the most popular papers in a field, measured by how many other papers have cited them. If there are subsets of that area that aren’t as popular but are still relevant, important articles on those topics might be missed with such a search.
Take, for example, the topic of aggression (which is the topic the researchers focused on when developing their algorithm). Media and video games are a particularly hot topic in aggression research, Iancarelli says, and so there are far more papers on this subset of the field than on other topics, such as the role of testosterone and social aggression.
Iancarelli therefore decided to group the articles on the subject of aggression into communities. Using citation network analysis, she identified 15 research communities on aggression. Rather than looking at the raw number of times an article has been cited in other research articles, the algorithm identifies communities of articles that tend to cite each other or the same set of core articles. The largest communities revealed were media and video games, stress, traits and aggression, rumination and inappropriate aggression, the role of testosterone and social aggression. But there were also a few surprises, like a small community of research papers focused on aggression and horses.
“If you use community detection, you get a very rich and granular insight into the field of aggression,” says Satpute. “You kind of have a bird’s-eye view of the whole field rather than [it appearing that] the realm of aggression is essentially the media, video games and violence.”
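The contrast between ranking by raw citation counts and picking papers community by community can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's released code: the paper IDs and citation lists are invented, and connected components over citation and shared-reference links stand in for a real community-detection algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: paper ID -> IDs of the papers it cites.
# IDs starting with "base_" are foundational papers outside the corpus.
CITATIONS = {
    "media1": ["base_media"],
    "media2": ["base_media", "media1"],
    "media3": ["base_media", "media1", "media2"],
    "media4": ["media1", "media2"],
    "testo1": ["base_testo"],
    "testo2": ["base_testo", "testo1"],
    "horse1": ["base_horse"],
    "horse2": ["base_horse", "horse1"],
}


def build_links(citations, min_shared=1):
    """Link two papers if one cites the other or they share references."""
    links = {paper: set() for paper in citations}
    for paper, refs in citations.items():
        for ref in refs:
            if ref in links:  # direct citation inside the corpus
                links[paper].add(ref)
                links[ref].add(paper)
    for a, b in combinations(sorted(citations), 2):
        shared = set(citations[a]) & set(citations[b])
        if len(shared) >= min_shared:  # both cite the same paper(s)
            links[a].add(b)
            links[b].add(a)
    return links


def find_communities(links):
    """Simplified stand-in for community detection: connected components."""
    seen, communities = set(), []
    for start in sorted(links):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(links[node] - group)
        seen |= group
        communities.append(sorted(group))
    return communities


def citation_counts(citations):
    """Count how often each paper is cited within the corpus."""
    counts = defaultdict(int)
    for refs in citations.values():
        for ref in refs:
            counts[ref] += 1
    return counts


def top_by_citations(citations, n):
    """Search-engine-style ranking: the n most-cited papers overall."""
    counts = citation_counts(citations)
    return sorted(citations, key=lambda p: (-counts[p], p))[:n]


def top_per_community(citations, communities):
    """Pick the most-cited paper inside each community."""
    counts = citation_counts(citations)
    return [max(group, key=lambda p: (counts[p], p)) for group in communities]


communities = find_communities(build_links(CITATIONS))
print("Top 2 overall:", top_by_citations(CITATIONS, 2))
print("Top per community:", top_per_community(CITATIONS, communities))
```

On this toy data, the overall ranking returns two media papers, while the per-community pick surfaces one paper each from the media, testosterone and horse groups. A real analysis would replace the connected-components step with a proper community-detection method, since real citation networks are rarely this cleanly separated.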
In addition to diversifying the subjects, the researchers found that with this community-based approach, the percentage of articles with female first authors rated as influential by the algorithm doubled compared with when they focused on total citation counts alone. (Iancarelli adds that there could be biases in this result, as the team could not directly ask the authors about their gender identity and instead had to rely on assumptions based on each author’s name, image, and the pronouns used to refer to them.)
The team has released the code behind this algorithm so that others can use it and replicate their citation network analysis approach in other areas of research.
For Iancarelli, there is another motivation: “I would love to use this work to create a syllabus and teach my own course on human aggression. I would really like to base the program on the most relevant articles from each different community to give a real overview of the field of human aggression.”
For media inquiries, please contact Shannon Nargi at [email protected] or 617-373-5718.