An interview with Mingjie Xie, CEO of Rapid Novor, conducted by James Ives
Monoclonal antibodies are used throughout life science research. Please give an overview of why sequencing antibody proteins is important.
The primary sequence of an antibody protein is one of the important pieces of information researchers need to know at an early stage of the antibody drug discovery, research and development process.
With the sequence information, one can re-make the exact same antibody recombinantly, or perform additional engineering such as isotype switching, subtype switching, species switching and reformatting.
How big are the differences between different antibody protein species, subtypes and formats? What are the challenges to having a unified solution?
Antibody proteins are beautiful crafts from mother nature. Each and every antibody clone is unique, and therefore they all have their own unique sequences.
The differences between antibody protein sequences from different species, even in the conserved framework regions, can be quite significant. Sequence motifs that are frequently present in one species may not be found in another.
This difference will have a cascade effect on mass spec experiments. For example, an experiment protocol that works well for mouse antibodies may not work as well for hamster antibodies; or a protocol may work for one subtype but not the others, as some enzymes may not work as effectively.
This is one of the main challenges we have to overcome to design a unified solution.
What techniques have been proposed to address the antibody protein sequencing problem?
Over the years, several papers have been published to address the antibody protein sequencing problem, from the manual sequencing and assembly approach published 25 years ago, to the homology-database-assisted sequencing algorithms that can achieve over 90% accuracy, to the self-proclaimed automated full-length sequencing software released in recent years. But none of them has been widely adopted in the real world.
What challenges have these methodologies faced?
There are many challenges facing scientists when sequencing antibody proteins, both expected and unexpected.
One of the expected challenges here is the overfitting problem, given the small and limited training dataset available publicly. All published works in the literature have trained their algorithms with only a few proteins. This overfits the algorithm to those few proteins, so it does not work well on new proteins. This is very likely the main reason why an algorithm works well in the original publication but works terribly in third-party studies.
Other expected challenges include heterogeneity, which increases the complexity of the sample. The experiments may be suboptimal, especially when the protocols are 'borrowed' from general proteomics experiments. Some peptides just don't fly well and therefore do not generate any signal.
In reality, there are also many unexpected challenges we have learned the hard way.
There are "contaminants". BSA is a common stabilizing/blocking agent. When added to the antibody sample, it becomes a major issue for sequencing if not removed. For example, 1% BSA usually means the amount of BSA is 10 times higher than the target antibody protein. The majority of the mass spectrometry time will be spent on the BSA proteins instead of the antibody of interest.
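The "10 times" figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes that "1% BSA" means 1% weight per volume (10 mg/mL) and that the antibody is formulated at 1 mg/mL; the antibody concentration is an illustrative assumption, not a figure from the interview.

```python
def bsa_to_antibody_ratio(bsa_percent_w_v: float, antibody_mg_per_ml: float) -> float:
    """Mass ratio of BSA to antibody in a formulated sample."""
    # 1% (w/v) is 1 g per 100 mL, which equals 10 mg/mL.
    bsa_mg_per_ml = bsa_percent_w_v * 10.0
    return bsa_mg_per_ml / antibody_mg_per_ml


# Assumed formulation: 1% BSA (w/v) and 1 mg/mL antibody (hypothetical value).
ratio = bsa_to_antibody_ratio(bsa_percent_w_v=1.0, antibody_mg_per_ml=1.0)

# Fraction of the total protein mass that is actually the antibody of interest.
antibody_fraction = 1.0 / (1.0 + ratio)

print(f"BSA : antibody mass ratio = {ratio:.0f} : 1")
print(f"Antibody share of total protein = {antibody_fraction:.1%}")
```

Under these assumptions the antibody makes up only about one eleventh of the protein in the tube, which is why most of the instrument time would go to BSA peptides.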
There may be multiple chains. In about 15% of the monoclonal antibodies we sequenced, we observed the presence of additional light chains. Separating the multiple chains is not always possible, which increases the complexity of the sequencing analysis.
The "monoclonal" antibody may also be buried in a background of polyclonal antibodies. One example is when sequencing a monoclonal antibody from ascites fluid. The ascites often contains polyclonal antibodies from the host animal.
Another example is the use of non-serum-free medium during cell culture, especially when cheap fetal calf serum or newborn calf serum is used. The supernatant will contain all the bovine serum proteins including bovine polyclonal antibodies.
Those background polyclonal antibodies, in both examples, will greatly interfere with the signals generated from the target antibody, especially in the CDR regions, and thus make the sequencing work very difficult, if possible at all.
Please give an overview of the concept and workflow of REmAb™ sequencing technology from Rapid Novor.
When creating our REmAb™ sequencing technologies, we were very clear about our goal. We were not trying to demonstrate the ability to sequence one antibody protein, or one specific type of antibody protein. Our goal was to create a robust and routine monoclonal antibody protein sequencing solution. The key here is robust and routine. We wanted to create a technology platform that can sequence any given antibody protein, from any species, of any isotype and in any format.
With this goal in mind, we 'borrowed' the Agile methodology commonly used in software development, where uncertainties and changes are the norm. We recognized the importance of both the experiment and informatics components in the sequencing process.
This is why we are the first team focused on antibody protein sequencing technologies to have built both an in-house mass spectrometry lab and proprietary sequencing software together. In fact, this combined expertise has allowed us to rapidly iterate and improve the technologies and to maintain our position as the world leader in this field.
The general procedure for our REmAb™ antibody protein sequencing consists of four major steps.
- Enzymatically digest the antibody protein into shorter peptides.
- Generate high mass accuracy spectra data with a Thermo Orbitrap Fusion mass spectrometer.
- Sequence each shorter peptide using the Novor de novo peptide sequencing engine.
- Assemble the peptide sequences back to the long protein sequence, and accurately determine Isoleucine and Leucine using our WILD™ method.
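To make the first and last steps more concrete, here is a minimal Python sketch. It is not the REmAb™, Novor or WILD™ implementation, which are proprietary; it only illustrates an in-silico tryptic digestion, why isoleucine and leucine cannot be told apart by peptide mass alone (their residue masses are identical), and a naive overlap-based merge of two peptides. The toy sequence, the second overlapping peptide (imagined as coming from a different enzyme), and all function names are hypothetical.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def merge_by_overlap(left: str, right: str, min_overlap: int = 3):
    """Join two peptides if a suffix of `left` equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None  # no sufficiently long overlap found


toy_protein = "EVQLKPGASLRISCK"  # hypothetical toy sequence
print(tryptic_digest(toy_protein))  # ['EVQLKPGASLR', 'ISCK']

# I and L have the same residue mass, so these two peptides are isobaric.
print(peptide_mass("ISCK") == peptide_mass("LSCK"))  # True

# A peptide from a second (hypothetical) enzyme bridges the tryptic cut site.
print(merge_by_overlap("EVQLKPGASLR", "ASLRISCK"))  # EVQLKPGASLRISCK
```

Real de novo reads contain errors and gaps, so a production assembler has to be far more tolerant than this exact-match merge. The sketch only shows why peptides from more than one digestion are useful for bridging cleavage sites and why I/L assignment needs information beyond the peptide mass.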
How does the REmAb sequencing technology overcome the challenges?
The short answer is through sophisticated machine learning.
One of the key factors in the successful use of machine learning is the quality and the quantity of data.
A common concept in computer science and mathematics is "garbage in, garbage out" (GIGO), which means that the quality of the output is determined by the quality of the input. With our in-house mass spectrometry lab, we were able to quickly iterate and improve experiments and generate high quality data for all situations.
Over the past years, we have successfully sequenced over 400 antibody proteins, covering all common species and formats and samples of varying quality. This is not only a great achievement for our team, but has also provided great assets to advance the technologies. We have effectively built the world's largest, and still growing, mass spectrometry dataset for sequencing antibody proteins. With this large dataset, we avoided the overfitting problem.
Does the REmAb sequencing technology have issues with overfitting to trial data?
No. Mass spectrometry data analysis and machine learning are among our core competencies. We invested heavily at a very early stage in building the large dataset to avoid overfitting. We also built an internal system to periodically retrain the algorithms with new data, thus improving our sequencing platform over time and adapting it to new challenges.
What advantages does Rapid Novor have over other sequencing service providers?
There are three key advantages of our REmAb™ antibody protein sequencing services:
- high accuracy,
- high throughput,
- our ability to deal with difficult samples.
On the high accuracy aspect, with our WILD™ method, the first commercial service to accurately distinguish isoleucine and leucine using mass spectrometry, we can now be certain of each and every amino acid in the antibody protein. We never settle for anything less than 100%. Even 99% accuracy is NOT good enough. It means a 1% error rate, and thus on average two amino acids will be wrong in the VH and VL regions.
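The "two wrong amino acids" figure follows from simple arithmetic. The sketch below assumes variable regions of roughly 120 residues (VH) and 110 residues (VL), and treats per-residue errors as independent; both are illustrative assumptions rather than figures from the interview.

```python
VH_LENGTH = 120  # assumed typical heavy-chain variable region length
VL_LENGTH = 110  # assumed typical light-chain variable region length


def expected_errors(accuracy: float, n_residues: int) -> float:
    """Expected number of wrong residues at a given per-residue accuracy."""
    return (1.0 - accuracy) * n_residues


def prob_all_correct(accuracy: float, n_residues: int) -> float:
    """Chance that every residue is right, assuming independent errors."""
    return accuracy ** n_residues


n = VH_LENGTH + VL_LENGTH
print(f"Expected wrong residues at 99%: {expected_errors(0.99, n):.1f}")
print(f"Chance of a fully correct VH+VL at 99%: {prob_all_correct(0.99, n):.1%}")
```

With these assumed lengths, 99% per-residue accuracy gives a little over two expected errors across the variable regions, and only around a one in ten chance that both are entirely correct.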
We have the highest throughput in antibody protein sequencing in the world. All major steps in the sequencing workflow have been automated. We have built internal software systems so that our scientists can review and curate sequencing results in an accurate and timely fashion. Even for a large batch of samples, e.g. 30 mAb proteins, we are able to deliver all the sequencing reports in weeks instead of months. This is the kind of throughput we promise to our customers.
We have the ability to deal with difficult-to-sequence samples, particularly samples with multiple chains or a polyclonal antibody background. So long as the target antibody protein is dominant in the sample (>80% of the total protein amount), we are able to derive the full sequences accurately. We recently lowered the amount of sample required for the sequencing work from 200 µg to 100 µg, and our record is deriving the full heavy and light chain sequences from only 12 µg of mAb protein sample.
Given the issues with other sequencing solutions, how do you know that the REmAb sequencing results are correct?
Well, there is the ultimate blind test performed by our customers in independent labs.
We derive the sequences from the original antibody protein sample. Once the customers receive the sequences, they perform the downstream confirmations or validations that fit their research purposes. Very often, in one of the steps, the customers make recombinant antibodies from the sequences we provided and test the binding. The recombinant antibody binds exactly as the original antibody does.
What does the future hold for monoclonal antibody protein sequencing and the REmAb technique?
Our REmAb antibody protein sequencing technologies have been advancing rapidly in the past few years, both in terms of the sequencing throughput and the complexity of the samples we can handle. At the same time, the cost has dropped considerably. This trend will continue in the future. What this means for researchers is that the technology will become more and more accessible.
As we continuously improve our ability to sequence monoclonal antibody proteins, we have seen wide application of, and demand for, sequencing polyclonal antibodies directly from blood. That's exactly what we have been developing internally over the past year in order to make this a reality.
Where can readers find more information?
Our website, https://www.rapidnovor.com/antibody/, is a great resource for information related to antibody protein sequencing.
About Mingjie Xie
Mr. Mingjie Xie, MSc, MBA, is the co-founder and CEO of Rapid Novor Inc. A computer scientist by training, he received his MSc degree from Western University in the field of bioinformatics. He received his MBA degree from the Richard Ivey School of Business to pursue his interests in business. Prior to co-founding Rapid Novor Inc, Mingjie was the COO of a bioinformatics software company.