Developer Tale

Aye Aye Captain

Alexandre Bonvin

Utrecht University

Published April 29, 2019

First, let’s clear the air about one thing: HADDOCK, a popular tool to model interactions between biomolecules, is not named after a saltwater cod.

“It’s not about the fish,” says Alexandre M. J. J. Bonvin.

Bonvin’s structural bioinformatics group is embedded in the Bijvoet Center and part of the nuclear magnetic resonance (NMR) spectroscopy group at Utrecht University, the Netherlands. Officially, HADDOCK stands for High Ambiguity Driven protein-protein Docking. But the NMR naming tradition demands a certain creativity.

HADDOCK is the Bonvin group’s flagship software tool.

The inspiration for HADDOCK comes from a 1940s comic series character by the Belgian cartoonist Hergé. The rum-drinking, profanity-spewing Captain Archibald Haddock was introduced as a foil to the optimistic, positive hero of the story, a journalist named Tintin, according to Wikipedia. Haddock soon evolved to become a strong and noble character. HADDOCK is the group’s flagship software, but among their other offerings is a bioinformatics prediction tool called Whisky, which complements the Captain well, Bonvin says.

HADDOCK’s data-driven approach has its origins in a group meeting presentation given by a desperate graduate student. The student was having difficulty collecting sufficient experimental information to solve the structure of an E2-E3 protein complex involved in ubiquitination, a process best known for marking proteins for cellular recycling.

Listening, Bonvin had an idea. “Let’s try to use this information in a different way,” he remembers suggesting. The student was having a hard time collecting the classical NMR distance information to solve the complex, “but we could monitor the binding, even if we could not solve the structure,” he says. “It appeared NMR was good at monitoring weak and transient interactions. And the binding information—the location on the surface where the partner protein binds—was something we could use.”

They introduced the software in a 2003 paper in the Journal of the American Chemical Society. Since then, Bonvin and his team have added more data types from NMR and broadened HADDOCK to include experimental data from other techniques, including small-angle X-ray scattering (SAXS) and cryo-electron microscopy, as well as bioinformatics predictions. Simply put, the program pulls together data from different sources to get the answer to a problem.

“These days, when you start looking at more complex systems, there’s not one experimental technique that gives you all you need,” Bonvin says, “so you need to integrate data from different sources with computations to give you a model.”

The Bonvin lab aims to develop and improve computational methods by making use of a variety of information sources in an integrative approach to predict, dissect and understand the interaction between molecules.

To understand how proteins interact—and how that can go wrong in disease—scientists need to know the three-dimensional atomic structures of the complexes they form. Structures are typically solved by experimental methods. But when experiments fall short, a computational method known as docking can help.

“The model itself is never the end of the road,” Bonvin tells his students. “We use models to generate new hypotheses. Then we can go back and test things in the lab and use that information to improve the model.”

Models are more important than ever, as scientists aim to understand larger and more complex systems. The latest version of HADDOCK can model a complex of up to 20 molecules, including proteins, nucleic acids and small molecules.

High quality data also remains important. There are two phases to the modeling process: Generate a lot of models, then select the best model based on certain filters, Bonvin says. When first published, HADDOCK was unique in using data to bias the search in the first phase.
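
The two-phase idea can be illustrated with a toy sketch. This is not HADDOCK's algorithm or API: every name below (`generate_poses`, `toy_score`, `select_best`) is hypothetical, a "pose" is reduced to a single angle around a receptor, and the score is a stand-in for a real scoring function. The only point it makes is that biasing phase one toward an experimentally indicated region leaves phase two with better candidates to choose from.

```python
import math
import random


def generate_poses(n, rng, data_angle=None, spread=0.5):
    """Phase 1 (toy): sample n candidate poses as angles in radians.

    If data_angle is given, sampling is biased toward it (a stand-in for
    experimental binding-site information); otherwise it is uniform.
    """
    poses = []
    for _ in range(n):
        if data_angle is None:
            angle = rng.uniform(-math.pi, math.pi)
        else:
            angle = rng.gauss(data_angle, spread)
            # Wrap the sampled angle back into [-pi, pi).
            angle = (angle + math.pi) % (2 * math.pi) - math.pi
        poses.append(angle)
    return poses


def toy_score(angle, site_angle):
    """Toy score: angular distance to a hypothetical binding site.

    Lower is better. A real scoring function would not know the answer;
    this only mimics a function whose minimum lies at the true site.
    """
    diff = abs(angle - site_angle)
    return min(diff, 2 * math.pi - diff)


def select_best(poses, site_angle, keep=5):
    """Phase 2 (toy): rank all poses by score and keep the top few."""
    return sorted(poses, key=lambda a: toy_score(a, site_angle))[:keep]
```

Calling `generate_poses(200, random.Random(0), data_angle=1.0)` and then `select_best(...)` on the result, versus the same call without `data_angle`, shows the effect: the biased pool clusters near the indicated region, so far fewer samples are wasted on implausible poses.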

Example applications of HADDOCK include predicting how antibodies bind to their targets, engineering them to bind more precisely, and understanding how a point mutation affects the binding between two proteins. It can be an early step in understanding underlying causes of disease and how to treat them. Pharmaceutical companies use the software to understand biomolecular recognition and design molecules to interfere with this process.

HADDOCK and other tools from the Bonvin group are available through web portals (http://haddock.science.uu.nl/), made more user-friendly with online tutorials. A number of those are also offered as standalone applications in the SBGrid software distribution. Several times a year, the sites see a surge in people signing up, a sign that the tools are also being used in classes.

Bonvin himself uses the tools in teaching at the bachelor’s and master’s degree levels. “I can give research projects to students with little experience in computing,” he says. “The next generation of scientists has started using the tools.”

In the lab, Bonvin’s team is trying to model larger and larger systems. To manage the computational burden, that may mean simplifying the model. In fact, “sense and simplicity” might be a good way to describe his philosophy of science.

“Students often think they have to go to the most complex treatment and start at the quantum level,” Bonvin says. “At the end, a simple model should be able to explain things if possible.” And a quantum look at electrons is just not that helpful for modeling large numbers of proteins, he adds.

The group is puzzling over other problems, such as understanding the nature of molecular interactions and what defines their strength, or binding affinity. They are also working on a methodology to tackle complexes involving membrane proteins, the target of about 40 percent of drugs.

Over the years, Bonvin has headed computational user communities that distribute computations on available computers in Europe and around the world. In fact, SBGrid was originally the Bonvin group’s contact point to the U.S. Open Science Grid, although his group doesn’t send jobs to the U.S. anymore for technical reasons. Sustained funding from the European Union supports a large structural biology community and ensures access to high-throughput and high-performance computing, he says.

The funding also supports professional software development practices, such as code review, continuous integration, and documenting user requests and issues. Funding from a center of excellence for computational biomolecular research, called BioExcel (bioexcel.eu), involving Bonvin’s group and 10 other research institutions in Europe, will support the next upgrade of HADDOCK.

Bonvin was born and raised in Switzerland. He earned a master’s degree in chemistry with a specialization in NMR at the University of Lausanne. He started a PhD in NMR at Utrecht University, but an available project pulled him into computations using NMR data. He learned programming on the fly, he says, “And I liked it.”

During winter and spring holidays in college, Bonvin worked as a ski instructor at Swiss resorts. He’s sampled U.S. slopes, thanks to Keystone Symposia meetings, which famously gave people the afternoons off before lectures resumed in the evening. Now, Keystone afternoons are often filled with workshops and training, infusing outdoor excursions with a tinge of guilt. Plus, the workshops and trainings are key ways to connect with experimental users of the group’s software.

Every spring, U.S. college basketball teams compete for the men’s and women’s championships. In March 2019, Bonvin was helping to manage a different kind of March Madness in Europe and around the world. About three dozen teams, including Bonvin’s group, tested their computer modeling skills in a friendly competition to see who could best predict the three-dimensional structure of a target complex. The latest contest was an 8-body problem involving a surface-layer protein complex from a deadly microbe.

The competition, called CAPRI, runs whenever a structural biology group makes a target available. Typically, that’s when a paper is in revision and the Protein Data Bank entry is on hold, Bonvin says. Importantly, there should be no published information. The data are treated confidentially by the competitors. Once the paper is published, it is cited by every participant, instantly increasing the metrics for the authors.

Bonvin invites structural biology groups to submit targets for the world’s best modelers to test their models against the most cutting edge science. Information can be found at http://www.capri-docking.org/contribute/.

True to their iconoclastic namesake, in the March 2019 competition, the HADDOCK team took the unconventional step of incorporating a Twitter picture into their data input. That will make a good story when the structure is published, if the social media trick worked.

-Carol Cruzan Morton