Developer Tale

Mathematically Minded

James Holton

University of California, Berkeley

Published April 28, 2017

James Holton first got interested in structural biology in the late 1980s at a biochemistry summer camp as a high school sophomore. He was flipping through the textbooks and ran into the protein folding problem. “Anyone with a mathematical mind will see this problem and think there’s got to be an easier solution,” says Holton, the Beamline Scientist at the Advanced Light Source at the Lawrence Berkeley National Laboratory.

Holton followed this interest through college and graduate school. He studied biology, but his mathematical bent led him to a career writing software for structural biology. That software includes Elves, a first in automated structure solution, and MLFSOM, an X-ray diffraction simulator he uses to investigate sources of error during X-ray data collection.

Though Holton now runs a beamline, it is biology that holds his interest. “It’s one of the last true sciences. You’re faced with a system you truly do not understand and have to come up with experiments clever enough to tell you something,” he says. “Some of the most beautiful experiments in science are in biology.”

As an undergraduate at Caltech, Holton worked in the lab of Steve Mayo, an expert in protein design. He also learned to code. “I learned that you only spend about a sixth of your time writing code,” he says. “The rest is maintenance.”

He chose the University of California, Berkeley, for graduate school because of its breadth in structural biology. In the lab of Tom Alber, he began solving structures and learning how to use structural biology software tools, such as the CCP4 suite.

Holton noticed that he was linking the output of one program to the input of another repeatedly. “I thought, the numbers are all in the computer, so why can’t the computer just figure it out?” he says.

Guided by the example scripts in CCP4, he created a collection of little scripts and connected them together with one big one and named the tool Elves. The idea was that, as in the fairy tale, you’d run Elves and go to bed. “They’d run overnight, and in the morning you’d have a structure,” he says.

The tool was simple and portable enough for him to bring into a synchrotron and use on the spot. He also developed a natural language interface to interpret user input rather than defining a complex set of flags and settings for users to remember.

Partially motivated by a desire to test Elves, Holton took a position running the beamline at the Berkeley Lab. “Where better to get lots and lots of real-world data than sitting in the place where it’s collected?” says Holton, who has accumulated and archived 100 TB of data that he uses for testing.

The work with Elves at the beamline led him to a startling revelation: even if Elves does everything right, it still fails 96 percent of the time. That is, the ratio of collected data sets to published structures is about 25 to 1. “It’s not that Elves couldn’t solve it, it’s that the data weren’t solvable, even if you cheat using the solved structure 10 years later,” he says.

So Holton investigated. To do so, he developed MLFSOM, which is the reverse of MOSFLM, a popular image-processing package. That is, Elves takes image files and gives you a structure, but MLFSOM takes a structure and creates diffraction image files from it. To write the program, Holton had to go back 100 years in the literature to find the right equations to use.

Working with MLFSOM has taught Holton a lot about data quality. He’s found that while there are random errors in X-ray data, which average out with repetition, there are also systematic errors in the data. Some of these are due to radiation damage. Some come from detector calibration. Others, such as the high R-factors of macromolecular structures, he’s found cannot arise from experimental error. “To date, they remain mysterious,” he says.
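The distinction between random and systematic error can be illustrated with a toy simulation. The sketch below is not how MLFSOM works; it is a minimal example that assumes Gaussian noise and a single fixed offset standing in for a systematic error. The function name, the intensity value, and the error sizes are all hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_mean(true_value, n_repeats, random_sigma, systematic_offset, seed=0):
    """Average n_repeats simulated measurements of true_value.

    Each measurement gets fresh Gaussian noise (the random error) plus the
    same fixed offset every time (a stand-in for a systematic error).
    """
    rng = random.Random(seed)
    measurements = [
        true_value + systematic_offset + rng.gauss(0.0, random_sigma)
        for _ in range(n_repeats)
    ]
    return statistics.fmean(measurements)


TRUE_INTENSITY = 100.0  # hypothetical value, arbitrary units

for n in (1, 100, 10_000):
    # Random error only: the averaged result drifts toward the true value.
    random_only = simulate_mean(TRUE_INTENSITY, n, 5.0, 0.0)
    # Random error plus a fixed offset: averaging cannot remove the offset.
    with_offset = simulate_mean(TRUE_INTENSITY, n, 5.0, 3.0)
    print(n, round(random_only - TRUE_INTENSITY, 3),
          round(with_offset - TRUE_INTENSITY, 3))
```

With more repeats, the error in the first column of results shrinks toward zero, while the second stays close to the fixed offset no matter how many measurements are averaged.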

Holton’s work on understanding the sources of error, and the statistics required to manage them appropriately, has helped him guide users at the beamline. “They ask me if the data is good enough, what data they should collect, and what exposure times they should use,” he says. “These are all very good questions.”

--Elizabeth Dougherty