This presentation, originally delivered at the 3D-Sig section of the 30th Annual Intelligent Systems for Molecular Biology (ISMB) conference in Madison in July 2022, examines how protein-protein interactions and human genetic variation may contribute to the pathology of COVID-19.
The study centers on missense variants in Angiotensin-Converting Enzyme 2 (ACE2), the SARS-CoV-2 entry receptor, that lie at ACE2’s interface with the viral Spike protein, and on predicting how these variants change binding affinity. It examines the relationship between viral binding, host genetic variants and their contribution to an individual’s risk of developing severe COVID-19.
In addition to detailing the methodology and results, the presentation shows how existing bioinformatics tools and computational modeling techniques can be used to predict the effects of variants on protein-protein binding. The findings have implications for understanding individual genetic risk for COVID-19 and could potentially guide targeted interventions and personalized treatment strategies.
As we continue to grapple with the COVID-19 pandemic, understanding the intricate web of interactions at the molecular level is essential. This presentation contributes to our collective knowledge of this devastating disease.
Here are the links to the associated research papers:
- bit.ly/covid19-ace2-ploscb
- bit.ly/covid19-rbd-ace2-elife
- bit.ly/covid19-ace2-variants
Hi. My name is Stuart MacGowan and I'm a post-doc in Geoff Barton's lab at the University of Dundee in Scotland.
Over the last couple of years we’ve been looking at the effects of missense variants in human Angiotensin Converting Enzyme 2 (ACE2) on its interaction with the SARS-CoV-2 Spike.
ACE2 was one of the first candidate genes thought to play a role in Covid-19 genetic risk because of its role as the virus’s entry receptor. We wanted to characterise the influence of population missense variants in ACE2 on disease susceptibility and severity, and in this talk I’ll present the results of the computational and experimental methods we deployed to do this.
Besides sharing what we found out about the ACE2 variants, I’m hoping to convince you of two things: that existing computational methods can be deployed to address important problems, so long as you interpret the predictions carefully and consider how their uncertainty affects your conclusions; and that these methods are especially useful when you can combine predictions with experiments.
I’ll start with a brief introduction highlighting the all-too-familiar human consequences of the Covid-19 pandemic, and then I’ll explain what we do in the Barton Group that made us ideally suited to tackle this problem.
The rest of the talk is divided into three parts:
In part one, I’ll describe the first predictions we reported in April 2020 and how I validated and interpreted those predictions.
In part two, I’ll show how a few targeted experiments shed new light on the effects of the variants and the accuracy of our first predictions.
Finally, I'll discuss what we're starting to look at now, which includes looking at the interplay between Spike mutants and ACE2 variants.
These are some figures from the WHO and the OECD that illustrate just two aspects of the enormous human cost of the pandemic.
Over ½ billion confirmed cases – that’s up to 7% of the human population.
A total death toll that approaches the toll of cancer over the same period.
And one of the worst economic crises since the Great Depression and the 2008 financial crash.
These are just a couple of reasons why I think it’s important to learn more about Covid.
So, before I get into what we did, I’ll explain some of the expertise in the group that enabled this work.
For decades, Geoff and his lab have made important advances in sequence analysis and structure prediction. Among the group’s best-known work are:
Jalview – a powerful graphical and programming interface for working with multiple sequence alignments
JPred4 – a very successful secondary structure prediction algorithm
And the Dundee Resource for Sequence Analysis and Structure Prediction – a collection of servers for structure and function prediction.
In recent years, we’ve been integrating human variation datasets with protein structures and multiple sequence alignments to see what patterns emerge, and looking to apply what we learn from population variants to problems beyond variant pathogenicity prediction. This is my main focus in the group.
With this expertise and these toolkits, it was natural for us to look for variant effects that could play a role in SARS-CoV-2 infection. If you remember, in 2020, before we had vaccines and variants of concern, one of the pressing questions was why some people without any of the major co-morbidities developed severe Covid. Human genetics was expected to play a role, and one of the first candidate genes was ACE2, the SARS-CoV-2 host entry receptor.
Ok – so the left panel is what you see in Jalview when you load ACE2 from UniProt and overlay UniProt features and gnomAD variants. You can then visualize these features on a crystal structure in a linked Jalview and Chimera session.
The gnomAD variants are pink on the sequence and magenta on the structure. So, seeing the missense variants at the interface, the question for us was:
Could these variants affect binding?
Can we predict how, quantitatively?
How far can we trust the predictions?
So, I looked at some recent methods for predicting the effects of missense variants on binding affinity in protein complexes. A couple stood out in terms of performance, and I settled on mCSM-PPI2. It had strong performance metrics, including in CAPRI, and the webserver was fast and easy to use and could perform saturation mutagenesis, which was useful for another part of the study.
It was easy to get the predictions for the gnomAD variants, but because we are familiar with the caveats around ML methods, we knew that for such an important problem the general performance metrics could only get us so far. Fortunately, I found some data from 2005 looking at the effects of ACE2 variants on SARS-CoV binding, and as far as I could see these weren’t in the mCSM-PPI2 training data.
You can see here that the predictions were pretty specific at identifying inhibitory mutations at a threshold of -1 kcal/mol, and most of the mutations predicted above this threshold (that is, less negative) had little or no effect on binding. Of course, there were some deviations, but this gave us the confidence to interpret some of these values.
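Saturation mutagenesis here means scoring every possible single amino-acid substitution at each position of interest. A minimal Python sketch of enumerating such a list is shown below; the mutation-code format (for example `S19P`) and the helper name `saturation_mutations` are illustrative assumptions and do not describe mCSM-PPI2’s actual input format, and the three positions are simply residues mentioned in this talk.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_mutations(positions):
    """Hypothetical helper: list every single substitution at each position.

    `positions` is a list of (wild-type residue, residue number) pairs.
    Each position yields 19 codes of the form <wt><pos><mut>, e.g. "S19P".
    """
    return [
        f"{wt}{pos}{aa}"
        for wt, pos in positions
        for aa in AMINO_ACIDS
        if aa != wt  # skip the synonymous "substitution"
    ]


# Residues mentioned in the talk, used purely for illustration.
muts = saturation_mutations([("S", 19), ("K", 26), ("D", 355)])
```

Three positions give 3 × 19 = 57 mutations, so a full interface scan grows linearly with the number of interface residues.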
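To make the threshold idea concrete, here is a small Python sketch that labels variants as inhibitory when their predicted delta delta G is at or below -1 kcal/mol and scores those calls against experimental labels. The helper names and toy numbers are hypothetical (they are not the 2005 benchmark data), and it assumes the convention used in the talk, where negative values mean weaker binding.

```python
import numpy as np


def classify_inhibitory(pred_ddg, threshold=-1.0):
    """Flag variants whose predicted ddG (kcal/mol) is at or below threshold.

    Assumes negative ddG means weaker binding, as in the talk.
    """
    return np.asarray(pred_ddg, dtype=float) <= threshold


def sensitivity_specificity(predicted, observed):
    """Return (sensitivity, specificity) for boolean predicted/observed labels."""
    predicted = np.asarray(predicted, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    tp = np.sum(predicted & observed)
    tn = np.sum(~predicted & ~observed)
    fp = np.sum(predicted & ~observed)
    fn = np.sum(~predicted & observed)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy values, NOT the published benchmark data.
pred = np.array([-2.1, -1.4, -0.3, -0.8, 0.2, -1.2, -0.1])
obs_inhibitory = np.array([True, True, False, False, False, False, True])
sens, spec = sensitivity_specificity(classify_inhibitory(pred), obs_inhibitory)
```

With these toy values three variants are called inhibitory, giving a sensitivity of 2/3 and a specificity of 0.75. Specificity is the quantity that matters when you want confidence that a flagged variant really does weaken binding.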
However, there were no affinity enhancing variants in this small benchmark set. So to get round this, we used the same trick that the algorithm’s authors used, which was to consider the predictions for the reverse mutations as well.
Because the binding free energy, delta G, is a thermodynamic state function, the delta delta G of the reverse mutation must equal the negative of the delta delta G of the forward mutation.
As you can see, the method doesn’t do so well on this test, which told us that predictions involving affinity-enhancing variants were less reliable.
Despite this, we figured that a +1 kcal/mol threshold did give an indication that a variant enhanced the affinity. In the preprint we also looked carefully at the structural features that changed in each mutant, both to assess the mechanism of the affinity change and to quality-assess the predictions.
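Written out with ΔG_bind as the binding free energy, and using a convention in which a negative ΔΔG means weaker binding, the antisymmetry follows directly from the definition (it holds equally under the opposite sign convention):

```latex
\Delta\Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}} = \Delta G_{\mathrm{bind}}(\mathrm{wt}) - \Delta G_{\mathrm{bind}}(\mathrm{mut})
\qquad
\Delta\Delta G_{\mathrm{mut}\rightarrow\mathrm{wt}} = \Delta G_{\mathrm{bind}}(\mathrm{mut}) - \Delta G_{\mathrm{bind}}(\mathrm{wt}) = -\Delta\Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}}
```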
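One way to quantify this reverse-mutation test, sketched in Python below under our own assumptions (the helper `reverse_consistency` and the numbers are hypothetical), is to compare the forward predictions with the negated reverse predictions. A thermodynamically consistent predictor would give a correlation of 1 and a mean bias of 0.

```python
import numpy as np


def reverse_consistency(forward_ddg, reverse_ddg):
    """Measure departure from the identity ddG(reverse) = -ddG(forward).

    Returns (pearson_r, mean_bias) between forward predictions and the
    negated reverse predictions. mean_bias equals mean(forward + reverse).
    """
    fwd = np.asarray(forward_ddg, dtype=float)
    neg_rev = -np.asarray(reverse_ddg, dtype=float)
    r = np.corrcoef(fwd, neg_rev)[0, 1]
    bias = np.mean(fwd - neg_rev)
    return r, bias


fwd = np.array([-1.5, -0.4, 0.3, -2.0])

# Perfectly antisymmetric toy predictions.
r1, bias1 = reverse_consistency(fwd, -fwd)

# Toy predictor that scores the reverse mutations as destabilizing too.
r2, bias2 = reverse_consistency(fwd, np.array([-0.2, -0.1, -0.6, -0.3]))
```

In the second toy case the reverse mutations are also scored as destabilizing, so the bias is -1.2 kcal/mol; a predictor with that tendency would systematically under-call affinity-enhancing substitutions.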
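To put the ±1 kcal/mol thresholds in more familiar units, a ΔΔG can be converted into a fold change in the dissociation constant KD. The Python sketch below assumes ΔG_bind = RT ln KD and the talk’s convention that a negative ΔΔG means weaker binding, so ΔΔG = RT ln(KD_wt / KD_mut); the function name `kd_fold_change` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_fold_change(ddg_kcal, temperature_k=298.15):
    """Return KD_mut / KD_wt implied by a ddG value in kcal/mol.

    Assumes ddG = RT ln(KD_wt / KD_mut), i.e. negative ddG = weaker binding,
    so values above 1 mean a larger KD (weaker binding) for the mutant.
    """
    return math.exp(-ddg_kcal / (R_KCAL * temperature_k))
```

At 25 °C this maps -1 kcal/mol to roughly a 5.4-fold increase in KD (weaker binding) and +1 kcal/mol to roughly a 5.4-fold decrease (tighter binding).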
So, our initial results that used only predictions and previously published experimental results were that:
We identified 3 variants in gnomAD that we confidently expected to strongly inhibit binding and thereby provide protection to carriers
We found 1 gnomAD variant that looked like it enhanced the affinity; although it was less clear how this would translate into a biological effect, it could be a possible risk factor. We thought this was a good prediction, but we weren’t as confident
We also got the predictions for all possible ACE2 mutations at the interface and used these data to estimate the order of magnitude of the total burden of ACE2 affinity mutants
We hoped that these data could be useful for genetic association screens, because they could differentiate the effects of variants in burden tests, and we also thought they could be a resource for designing ACE2 decoy receptors.
I did all this in about a month of 18-hour days, locked down in a one-bedroom flat with my wife Daisy and our toddler Evie. I’m enormously grateful for her support, but I have learned from the experience: in hindsight I wouldn’t do that again, and I wouldn’t recommend it to anyone else either.
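The talk does not spell out how the burden estimate was calculated. As one simple, hypothetical way to get an order-of-magnitude figure, the Python sketch below sums the allele frequencies of variants whose predicted |ΔΔG| passes the ±1 kcal/mol thresholds, and also computes the probability that a single chromosome carries at least one of them, assuming the variants occur independently. The helper name and numbers are invented for illustration. It works per chromosome and ignores the fact that ACE2 is X-linked, so carrier rates in males and females would need separate treatment.

```python
import numpy as np


def affinity_burden(allele_freqs, pred_ddg, threshold=1.0):
    """Hypothetical order-of-magnitude burden of affinity-altering alleles.

    Keeps variants with |ddG| >= threshold (kcal/mol) and returns
    (cumulative allele frequency, probability that one chromosome carries
    at least one such allele, assuming independence between variants).
    """
    f = np.asarray(allele_freqs, dtype=float)
    hit = np.abs(np.asarray(pred_ddg, dtype=float)) >= threshold
    cumulative = f[hit].sum()
    at_least_one = 1.0 - np.prod(1.0 - f[hit])
    return cumulative, at_least_one


# Invented frequencies and predictions, for illustration only.
freqs = [1e-4, 5e-5, 2e-3, 1e-5]
ddg = [-2.0, 1.3, -0.2, -1.1]
cum, p_any = affinity_burden(freqs, ddg)
```

For rare variants the two numbers are almost identical, which is why a simple sum of frequencies is usually adequate for an order-of-magnitude estimate.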
So, we had the opportunity to collaborate with Anton van der Merwe’s lab in Oxford who were able to manufacture the ACE2 mutants and test their affinity with Spike in a high-resolution Surface Plasmon Resonance (SPR) assay.
We picked four variants to test based on our predictions: two predicted low affinity variants and two predicted high affinity variants. We got a mixed bag of results. Although both low affinity variants were confirmed (you don’t see D355N here because they couldn’t detect any binding), the two high affinity predictions turned out to be way off the mark.
We selected another 6 for assay based on a mixture of prevalence (and so potential epidemiological relevance), the predictions and structural features. This was a bit of a mixed result too: although only two variants in this new series had the right sign, I noticed that most had the correct rank.
In fact, and bear with me for a few moments, if we discount the grossly erroneous predictions, there was a strong linear correlation between the predictions and experiments. Now, alarm bells may be ringing at what might look like “cherry picking”, but hopefully I can justify this approach.
The first thing is that these two variants look like a category error. It’s not just that they are quantitatively off the mark; the predictor seems convinced that these should clearly increase the affinity. On the other hand, even though only 3/7 of the other variants here were predicted to have the correct sign, this looks like it could be rectified with a small correction.
But that’s not all. If we compare the predictions to the results of a deep mutagenesis binding assay from Erik Procko and co-workers, we see that the predictor seems to over-predict slightly negative delta delta G values.
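The “discount the category errors, then fit” step can be sketched in Python with SciPy’s `linregress` and `spearmanr`. The toy data, the boolean `exclude` mask marking the two category-error variants and the helper name `fit_calibration` are hypothetical; the point is only that the fit and the rank correlation are computed on the retained variants.

```python
import numpy as np
from scipy import stats


def fit_calibration(pred, expt, exclude):
    """Fit expt ~ slope * pred + intercept, skipping excluded variants.

    Returns (slope, intercept, pearson_r, spearman_rho) on retained points.
    """
    pred = np.asarray(pred, dtype=float)
    expt = np.asarray(expt, dtype=float)
    keep = ~np.asarray(exclude, dtype=bool)
    fit = stats.linregress(pred[keep], expt[keep])
    rho, _ = stats.spearmanr(pred[keep], expt[keep])
    return fit.slope, fit.intercept, fit.rvalue, rho


# Hypothetical toy data; the two excluded points mimic category errors
# (predicted to enhance binding, measured to weaken it).
pred = [-0.6, -0.3, -0.1, 0.2, 1.8, 2.2, 0.5]
expt = [0.02, 0.26, 0.42, 0.66, -0.9, -1.2, 0.9]
exclude = [False, False, False, False, True, True, False]
slope, intercept, r, rho = fit_calibration(pred, expt, exclude)
```

Here the retained points lie exactly on experiment = 0.8 × prediction + 0.5, so the slope, intercept, Pearson r and Spearman rho come out as 0.8, 0.5, 1 and 1. The first retained variant has the wrong sign before correction but the right rank.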
But when we apply the correction from the linear fit, the ambiguous corrected predictions are more evenly distributed around zero, which I think better reflects the ambiguity of a small delta delta G.
Of course, this correction wouldn’t change the AUC of a ROC plot, because a linear rescaling with a positive slope preserves the ranking of the predictions, but it does shift where you can set your discrimination threshold. In the corrected data, chi-square statistics show a significant association between the sign of the re-calibrated delta delta G and the DMS screen at a 0 kcal/mol threshold, whereas this threshold isn’t discriminatory in the raw predictions.
The particular linear correction we derived doesn’t change which variants are below -1 kcal/mol, so it doesn’t affect our interpretation of those variants, but the +1 kcal/mol threshold now has improved sensitivity, albeit at a small loss in specificity.
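The two claims here, that a positive-slope linear correction leaves the ROC AUC unchanged but changes what a 0 kcal/mol threshold means, can be checked with a small Python sketch. It computes AUC via the Mann-Whitney formulation and uses SciPy’s `chi2_contingency` (which applies Yates’ continuity correction to 2×2 tables by default). The slope, intercept, toy predictions and labels are hypothetical, not the study’s fitted values or DMS data.

```python
import numpy as np
from scipy import stats


def recalibrate(pred, slope, intercept):
    """Apply a linear correction (slope > 0) to raw predicted ddG values."""
    return slope * np.asarray(pred, dtype=float) + intercept


def roc_auc(scores, labels):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    diff = s[y][:, None] - s[~y][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def sign_test(ddg, enhancing):
    """Chi-square test of (ddG > 0) against a binary 'enhancing' label."""
    pos = np.asarray(ddg, dtype=float) > 0
    lab = np.asarray(enhancing, dtype=bool)
    table = np.array([[np.sum(pos & lab), np.sum(pos & ~lab)],
                      [np.sum(~pos & lab), np.sum(~pos & ~lab)]])
    chi2, p, _, _ = stats.chi2_contingency(table)
    return chi2, p


# Hypothetical toy data: the raw predictions carry a negative offset.
raw = np.array([-1.2, -0.9, -0.8, -0.7, -0.6, -0.5,
                -0.3, -0.2, -0.15, -0.1, 0.05, 0.2])
enhancing = np.array([False] * 6 + [True] * 6)
corrected = recalibrate(raw, slope=1.0, intercept=0.45)

auc_raw, auc_corrected = roc_auc(raw, enhancing), roc_auc(corrected, enhancing)
_, p_raw = sign_test(raw, enhancing)
_, p_corrected = sign_test(corrected, enhancing)
```

On these toy values the AUC is identical before and after correction, while the sign at 0 kcal/mol is only associated with the labels once the offset is removed (p ≈ 0.004 versus p ≈ 0.44). For a positive slope a and intercept b, a threshold t on the corrected scale corresponds to (t - b)/a on the raw scale, which is how the correction moves the ±1 kcal/mol cut-offs.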
I realise this may seem a little nuanced, but I think that the correction is useful because it yields more intuitive thresholds for qualitative interpretations of the predictions.
Of course, the elephant in the room is that the predictor clearly struggles to detect affinity-enhancing variants and produces false positives among the variants it predicts to enhance affinity.
There’s more I could say about this comparison, but for now I’ll have to move on.
So, I’ve spent a lot of time telling you about the effects of ACE2 variants on the affinity, but how does that relate to infection?
Now it’s clear that if there is no binding then there is no infection, at least via the ACE2 pathway. But do lower affinity variants that retain some binding cause a proportional reduction in infection, or is there a minimum affinity below which there is no infection?
When it comes to higher affinity variants, I’ve always been cautious about assuming that these would somehow pose a risk to carriers. There are a couple of mechanisms by which this could work, but to my knowledge this wasn’t really established by anything other than anecdotal correlations in the literature.
Our big breakthrough here came when a reviewer pointed us to an experimental study of the infectivity of ACE2 variants. This is extremely useful in its own right for interpreting the effects of these ACE2 variants, but we used it to look for an association between affinity and infectivity.
Plotting infectivity vs. experimental affinity (and note this time I’ve included a calculated upper bound for D355N based on the sensitivity of the assay) I think we get an answer for each of these hypotheses.
It looks like once you drop below a certain affinity threshold, infectivity falls proportionally, and when the affinity is very low, the infectivity gets very low. However, above this threshold the infectivity doesn’t really change. All this is captured well by a negative exponential model with a lower bound of affinity, below which there would be no infection, and an upper bound of infectivity that it approaches asymptotically, beyond which increasing affinity has no effect. These data fit this model well; the astute eye might notice I have a Spike variant in here too.
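The saturating relationship described here can be sketched as an exponential rise to a plateau, I(A) = I_max(1 - exp(-k(A - A0))) for A above A0 and zero below it. The Python sketch below fits that form with SciPy’s `curve_fit`; the parameterization, the affinity scale (larger means tighter binding, for example -log10 KD) and the noise-free toy data are assumptions for illustration, not the model or data from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit


def saturating_infectivity(affinity, i_max, k, a0):
    """Exponential rise to a plateau: zero at a0, approaching i_max."""
    return i_max * (1.0 - np.exp(-k * (affinity - a0)))


def predict_infectivity(affinity, i_max, k, a0):
    """Clamp at zero below the affinity threshold a0 (no infection)."""
    return np.maximum(0.0, saturating_infectivity(affinity, i_max, k, a0))


# Hypothetical noise-free data on an arbitrary affinity scale.
true_params = (1.0, 1.5, 6.0)  # (i_max, k, a0)
x = np.linspace(6.2, 10.0, 12)
y = saturating_infectivity(x, *true_params)

popt, pcov = curve_fit(saturating_infectivity, x, y, p0=(0.8, 1.0, 5.5))
```

Because the toy data are noise-free, the fit recovers the generating parameters. With real assay data you would also want to inspect the covariance matrix `pcov` to judge how well A0 is constrained by the few low-affinity points.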
Even though high affinity variants have normal infectivity here, this doesn’t rule out a biological effect altogether. We argue that higher affinity could facilitate entry into cells with lower ACE2 surface abundance. This idea is consistent with work in other viruses and the improved fitness of later SARS-CoV-2 strains.
Ok, so to summarise this part we used our initial predictions and benchmarks to guide a series of experimental affinity assays.
The experimental results showed us that the predictor was good at classifying low affinity mutants, but that its high affinity predictions were unreliable. Also, an offset masked the higher affinities of two of the more common missense alleles. This gave us more accurate insights into the possible biological effects of these variants, and it also allowed us to re-calibrate the predictor and improve the predictions for many variants, though not for those that are categorically poorly predicted.
Finally, we’ve been able to draw a direct line between affinity and infectivity, which makes our affinity predictions even more useful.
Please check out the paper in PLOS Computational Biology, and if you’re really interested, look at the 2020 preprint too. The introductory material is similar, but the preprint’s methods, results and discussion include a lot that isn’t in the post-experimental paper.
Ok, so we recently started a new collaboration with Kenneth Baillie’s clinical research group in Edinburgh. They are going to look for an effect of ACE2 affinity variants directly in clinical cohorts, which is what I always hoped would happen with this work.
They’ve asked us to provide strain-specific effects for all the ACE2 variants they observe. First, we’re running a pilot with the experimental affinities our Oxford collaborators provided for Wuhan and Alpha Spike against ACE2 Reference, S19P and K26R. We were hopeful they would have both of those variants in their dataset, and they do. They also have another 8 variants for which we don’t have experimental data, and they need effects for all of these variants against the Delta and Omicron Spikes too. So, this is where the predictions come in again.
Of course, this means I need to benchmark the predictions for different Spike RBDs. The main question is whether the predictor picks up interactions between RBD mutations and ACE2 variants. This was always a possibility, and we saw one example of non-cooperativity between the Spike S477N mutant and ACE2 S19P, so hopefully we can show that the predictor can pick that up.
By the way, S477N was a variant that emerged a couple of times in 2020/2021, but it has re-appeared in Omicron…
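A standard way to quantify this kind of non-additivity is a double-mutant cycle, in which the coupling energy compares the double mutant (Spike mutant binding an ACE2 variant) with the two single mutants. The Python sketch below computes ΔΔG_int = RT ln(KD_double · KD_wt / (KD_spike · KD_ace2)) from four KD values, using ΔG = RT ln KD, so a positive value means the pair binds more weakly than additivity would predict. The function name and KD values are hypothetical; KD units cancel as long as all four values share the same unit.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def coupling_energy(kd_wt, kd_spike_mut, kd_ace2_var, kd_double,
                    temperature_k=298.15):
    """Double-mutant-cycle coupling energy in kcal/mol.

    ddG_int = G(double) - G(spike_mut) - G(ace2_var) + G(wt), with
    G = RT ln KD. Zero means the two substitutions act additively.
    """
    rt = R_KCAL * temperature_k
    return rt * math.log(kd_double * kd_wt / (kd_spike_mut * kd_ace2_var))


# Hypothetical KDs (nM): Spike mutant binds 2-fold tighter, ACE2 variant
# 2-fold weaker; an additive double mutant returns to wild-type affinity.
additive = coupling_energy(20.0, 10.0, 40.0, 20.0)
coupled = coupling_energy(20.0, 10.0, 40.0, 80.0)
```

In the first toy example the two effects cancel exactly, so the coupling energy is zero. If the double mutant instead bound 4-fold more weakly than wild type, the coupling would be RT ln 4, about 0.82 kcal/mol at 25 °C.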
These are the people and resources that contributed to this work. I’ll especially mention Michael Barton, who played the leading role in all of the lab experiments. I also can’t thank enough the developers and maintainers of mCSM-PPI2 and other related tools who made this work possible.
Finally, I’d like to thank you for your attention, and I’d love for you to get in touch about this or some of our other work.
And please pick up our papers and preprints on this topic; here are some links to those.
Thank you.