Imagine that you want a map of a city. No one is willing to give you the full map, but you can get a list of all the distances between important landmarks. Not a list like “castle, cathedral, distance; townhouse, cathedral, distance, …”, just “NNW, distance; SE, distance, …”: directions and distances, without knowing which landmarks they connect. To make it worse, some of the distances are missing. Can you still redraw the map? This is basically the challenge we face going from recorded diffraction images to actual images of viruses, cells, and proteins.
To be more specific (and mathematical), a diffraction pattern is the magnitude of the Fourier transform of the scattering power of the object. Having only the magnitude means that the phases are missing, and that in turn means that essentially all positional information is missing. This problem is not unique to our field of science, but some aspects of it are particular to our setting.
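To see this in a concrete (if entirely made-up) toy example, the small NumPy sketch below builds an image with two bright “landmarks”, takes its Fourier transform, and checks that moving the object around changes only the phases, never the recorded intensities:

```python
import numpy as np

# Toy object: two "landmarks" (bright pixels) on an otherwise empty grid.
obj = np.zeros((64, 64))
obj[10, 20] = 1.0
obj[40, 45] = 2.0

F = np.fft.fft2(obj)           # complex-valued Fourier transform
intensities = np.abs(F) ** 2   # what the detector records
phases = np.angle(F)           # lost in the measurement

# Shifting the object changes only the phases; the recorded intensities stay the same.
shifted = np.roll(obj, shift=(5, -3), axis=(0, 1))
assert np.allclose(np.abs(np.fft.fft2(shifted)) ** 2, intensities)
```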
Missing data
Our images are recorded on a detector, which is not that different from what you might find in a digital camera. As in digital cameras, most detectors are CMOS or CCD chips, but tailor-made to have better noise properties and, most importantly, to withstand X-rays rather than visible light, in vacuum.
Still, the detectors we use are rarely perfect rectangles. Most of the X-ray beam does not diffract and continues straight through, and any electronics exposed to that direct beam would be destroyed. We therefore need either a hole in the detector or a piece of metal (a “beamstop”) placed in front of it. Either way, this corresponds to a hole in the recorded diffraction data: we no longer have the full image. To make things worse, the hole is in the center. By the logic of the Fourier transform, points close to the center of the transform correspond to long distances in the real space of the image. In our map analogy, we are missing all distances between landmarks that are far apart.
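In practice this missing region is usually represented as a boolean mask over the detector pixels. A minimal sketch, with a made-up detector size and beamstop radius:

```python
import numpy as np

ny, nx = 64, 64                       # hypothetical detector size in pixels
y, x = np.ogrid[:ny, :nx]

# Pixels hidden behind the beamstop (or the hole in the detector), at the pattern center.
beamstop_radius = 5
missing = (y - ny // 2) ** 2 + (x - nx // 2) ** 2 < beamstop_radius ** 2

# Boolean mask of the pixels whose intensities we can actually use as constraints.
measured = ~missing
```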
Noisy data
All experimental images have noise of some kind. In many applications, what is called thermal noise dominates: individual electrons can fire and be detected simply because of the heat of the detector. Noise is never a good thing, but when thermal noise is the dominant source, it tends to be relatively uniform. That is, the noise is independent of the signal.
We are trying to make the most of what signal we have, which means that every single photon counts. The physical process that distributes photons into pixels is itself random: a quantum process in which the recorded counts follow a Poisson distribution whose mean is the theoretical ideal diffraction pattern.
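A minimal sketch of what this means when simulating data (the ideal pattern here is just a made-up decaying curve):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free diffraction signal, in expected photons per pixel.
ideal_intensities = 100.0 * np.exp(-np.linspace(0.0, 5.0, 64))

# The detector records integer photon counts, Poisson-distributed around the ideal
# values, so the noise level scales with the signal instead of being uniform.
counts = rng.poisson(ideal_intensities)
```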
Compact support
If a particle is injected into vacuum and obliterated by an X-ray pulse, all of its electrons are concentrated within an overall void. This fact turns out to be highly useful for reconstruction. The shape of the object defines the non-zero elements of the image describing it: its mathematical support. Knowing the approximate support is highly useful for finding the true image, since any candidate solution with non-zero elements outside the support is incorrect and can be improved by pushing those parts to zero.
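In code, the support constraint is just a projection; a minimal sketch, not tied to any particular library:

```python
import numpy as np

def apply_support(image, support):
    """Real-space constraint: push everything outside the support to zero.

    image   -- current real-space guess (2D array)
    support -- boolean array, True where the particle is allowed to be non-zero
    """
    return np.where(support, image, 0.0)
```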
Iterative methods for phasing
Based on the compact support assumption, iteration schemes can be devised. Starting from an initial guess of the object and a guess of the support, the current estimate is Fourier transformed back and forth. The recorded diffraction intensities (the “Fourier constraint”) and the support (the “real-space constraint”) are enforced in their respective domains. The Fourier phases are left untouched, since they are the unknowns we are trying to find, and so are the intensities in the missing data regions.
The simplest method is to enforce the measured intensities exactly in each iteration and to set the pixels outside the support to zero: the so-called error reduction (ER) method. ER is simple to analyze and implement, but it tends to converge to a rather poor solution, a local minimum. Modified iterative schemes have been devised to explore more of the solution space, such as hybrid input-output (HIO) and relaxed averaged alternating reflections (RAAR). We have implemented all of these efficiently for the single particle X-ray diffractive imaging setting in the library spimage and the accompanying Hawk tool.
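To make the structure of these schemes concrete, here is a toy NumPy sketch of ER and HIO; it is illustrative only and does not reflect the spimage/Hawk implementation. The arrays are assumed to be laid out in the unshifted FFT convention (apply np.fft.ifftshift to detector-centered data first), and the iteration count, beta, and the random start are arbitrary choices:

```python
import numpy as np

def phase_retrieval(magnitudes, measured, support, n_iter=500, beta=0.9, method="HIO"):
    """Toy ER/HIO loop.

    magnitudes -- measured Fourier magnitudes (square root of the intensities)
    measured   -- boolean mask, False inside the beamstop hole and detector gaps
    support    -- boolean mask of real-space pixels allowed to be non-zero
    """
    rng = np.random.default_rng(0)
    g = rng.random(magnitudes.shape) * support   # random start inside the support

    for _ in range(n_iter):
        G = np.fft.fft2(g)
        # Fourier constraint: keep the phases, impose the measured magnitudes
        # where we have data, and leave the missing region untouched.
        G_new = np.where(measured, magnitudes * np.exp(1j * np.angle(G)), G)
        g_new = np.fft.ifft2(G_new).real

        if method == "ER":
            # Error reduction: set everything outside the support to zero.
            g = np.where(support, g_new, 0.0)
        else:
            # HIO: outside the support, take a damped step instead of hard zeroing,
            # which helps escape the local minima that trap ER.
            g = np.where(support, g_new, g - beta * g_new)

    return g
```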
When the support is only approximately known, the “shrinkwrap” approach is used. A loose support is assumed at first; as the iteration process starts to converge, the support is tightened to match the region that actually contains the highest intensity.
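A sketch of one such support update, where the blur width and threshold are tuning parameters chosen by the user:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def shrinkwrap_update(current_image, sigma=3.0, threshold=0.2):
    """One shrinkwrap-style support update (illustrative sketch).

    Blur the magnitude of the current reconstruction and keep only the pixels
    above a fraction of the maximum as the new, tighter support.
    """
    blurred = gaussian_filter(np.abs(current_image), sigma=sigma)
    return blurred > threshold * blurred.max()
```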
We are actively trying to understand the exact consequences of combining different phasing algorithms, support settings, and ways of handling the missing data region and non-uniform noise. The latter two are of crucial importance, since many of these iterative schemes were originally formulated for uniform noise and complete data.
Missing modes
If the combination of the missing data region and the support is large enough, almost arbitrary functions can be fit inside it. It can then be helpful to determine a set of basis functions spanning the approximate null space on the measured region, i.e. all shapes that can be added to a solution while leaving the diffraction pattern almost unchanged in the parts that are not missing.
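One brute-force way to compute such a basis on a small grid is sketched below: build the linear map from support pixels to measured Fourier pixels explicitly and take the singular vectors with (near-)zero singular values. This is only feasible for tiny examples and is not how one would do it at full detector resolution:

```python
import numpy as np

def missing_modes(support, measured, tol=1e-6):
    """Approximate null space: real-space shapes confined to the support that
    contribute (almost) nothing to the measured part of the Fourier transform.

    support  -- boolean mask of allowed real-space pixels (keep the grid tiny!)
    measured -- boolean mask of Fourier pixels that were actually recorded
    """
    ny, nx = support.shape
    cols = []
    for idx in np.flatnonzero(support):
        e = np.zeros(ny * nx)
        e[idx] = 1.0
        F = np.fft.fft2(e.reshape(ny, nx))
        cols.append(F[measured])                  # response on the measured pixels only

    A = np.array(cols).T                          # (n_measured, n_support_pixels), complex
    A = np.vstack([A.real, A.imag])               # treat real/imaginary parts separately

    _, s, Vh = np.linalg.svd(A, full_matrices=True)
    # Rows of Vh with tiny singular values (or beyond the measured rank) are the
    # missing modes: coefficient vectors over the support pixels that leave the
    # measured data essentially unchanged.
    small = np.concatenate([s < tol * s.max(),
                            np.ones(Vh.shape[0] - s.size, dtype=bool)])
    return Vh[small]
```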
Semi-definite solvers
In general optimization research, it has long been known that linear and convex problems are relatively easy to solve. One well-known subclass of convex problems is least-squares fitting, familiar from linear regression as used in many fields of statistics. A more recent generalization is semi-definite programming.
It turns out that an approximation (relaxation) of the phase retrieval problem can be cast as a semi-definite optimization problem, as in the recently introduced PhaseLift method. We are exploring how this method can be made useful in practice for single particle imaging by adding constraints based on our pre-existing knowledge of the particle and the experiment. Semi-definite problems are also close enough to linear and convex problems that we can draw on an extensive body of existing theoretical and practical results.
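As a rough illustration of the lifting idea only (not the full PhaseLift pipeline and not our production code), the sketch below solves a tiny lifted problem with cvxpy; it assumes generic measurement vectors a_k rather than an actual Fourier geometry, noise-free data, and an SDP-capable solver such as SCS being installed:

```python
import numpy as np
import cvxpy as cp

def phaselift(A, b):
    """Toy PhaseLift-style relaxation.

    Measurements: b_k = |a_k^H x|^2 for known vectors a_k (the rows of A).
    Lifting X = x x^H makes every measurement linear in X; the rank-1 condition
    is relaxed to X being positive semidefinite with small trace.
    """
    n = A.shape[1]
    X = cp.Variable((n, n), hermitian=True)
    constraints = [X >> 0]
    for a_k, b_k in zip(A, b):
        M = np.outer(a_k, a_k.conj())                       # a_k a_k^H
        constraints.append(cp.real(cp.trace(M @ X)) == b_k)

    cp.Problem(cp.Minimize(cp.real(cp.trace(X))), constraints).solve()

    # Recover an estimate of x from the leading eigenvector of the lifted matrix.
    w, V = np.linalg.eigh(X.value)
    return np.sqrt(max(w[-1], 0.0)) * V[:, -1]
```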
3D reconstruction
Phasing as described so far is mainly concerned with 2D images, but biological particles are 3D objects. When particles are injected using our methods, they tumble freely in vacuum, so the orientation of each particle when it is intercepted by the pulse is essentially random.
There are two main ways to merge multiple 2D patterns into a single 3D volume. One is to phase the individual 2D patterns first and then merge the resulting 2D images into a 3D structure. The other is to assemble a coherent 3D volume in Fourier space directly from the individual patterns. Our group is open to both options but mainly focuses on the latter: by merging data in Fourier space, the hardest challenges in phasing are actually reduced, since the missing data region becomes smaller.
We are mainly exploring variations of the expansion-maximization-compression (EMC) algorithm. This iterative algorithm works with a fixed set of possible pattern orientations. A coherent model is achieved by assigning a probability distribution over orientations to each pattern, and an averaged Fourier intensity model is created based on these distributions. Different patterns overlap in different places; based on these overlaps, the probability distributions are updated, a new averaged Fourier model is created, and so on. When the iteration has converged, the final 3D Fourier model can be phased using the iterative methods already discussed.
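To show the shape of one such iteration without the full 3D machinery, the toy sketch below replaces the 3D Fourier model with a 1D intensity array and replaces “orientation” with a cyclic shift; the expand/maximize/compress structure is the same, but everything else is a deliberate simplification:

```python
import numpy as np

def emc_iteration(model, patterns, n_orientations):
    """One EMC-style iteration on a toy problem.

    model    -- current 1D intensity model (stand-in for the 3D Fourier model)
    patterns -- recorded photon counts, one row per diffraction pattern
    """
    shifts = np.arange(n_orientations)

    # Expand: predict what the model would look like in every orientation.
    views = np.array([np.roll(model, s) for s in shifts])            # (n_orient, n_pix)

    # Maximize: Poisson log-likelihood of each pattern under each view, normalized
    # into a probability distribution over orientations for every pattern.
    log_l = patterns @ np.log(views + 1e-12).T - views.sum(axis=1)   # (n_patt, n_orient)
    log_l -= log_l.max(axis=1, keepdims=True)
    prob = np.exp(log_l)
    prob /= prob.sum(axis=1, keepdims=True)

    # Compress: rotate every pattern back into the model frame and average,
    # weighted by how likely each orientation assignment is.
    new_model = np.zeros_like(model, dtype=float)
    for j, s in enumerate(shifts):
        rotated_back = np.array([np.roll(p, -s) for p in patterns])
        new_model += (prob[:, j][:, None] * rotated_back).sum(axis=0)
    return new_model / len(patterns)
```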