"Researchers are typically interested in results, not programming. MATLAB enables us to think at a higher level of abstraction and spend less time developing, debugging, testing, and creating graphs. As a result, we get research results much faster."
Dr. Gil Alterovitz, Massachusetts Institute of Technology and Harvard University
Diagnosing cancer in its earliest stages can greatly improve a patient's chances of survival. Ovarian cancer, for example, is often identified only after it has progressed to stage three or four. For patients who are diagnosed in stage one or two, the five-year survival rate rises from less than 50% to about 95%.
Researchers and students at the Massachusetts Institute of Technology (MIT) are exploring methods to diagnose cancer in earlier stages by examining blood proteins. Using MathWorks tools, these researchers are identifying concentrations of proteins and protein interactions present only in cancer patients to enable early cancer detection. Students use MathWorks tools to learn from and contribute to the research group's efforts, while gaining the knowledge and experience to drive future biomedical advances.
"In bioinformatics, research conducted two years ago is considered old. With MathWorks tools, we can engage students in leading-edge research that our group is doing today," says Dr. Gil Alterovitz, an NIH Biomedical Informatics Fellow in the MIT/Harvard Division of Health Sciences and Technology. "MathWorks tools enable the research group and the students—including biology majors and engineers—to focus on research and spend less time programming."
To better identify proteins that may signal the presence of cancer, researchers at MIT and Harvard Medical School, including Alterovitz, Marco F. Ramoni, and Isaac S. Kohane, sought to combine mass spectrometry (MS) results with knowledge of how proteins interact. MS data includes characteristic peaks and valleys that can be analyzed to distinguish molecular compounds in a sample. The researchers needed tools to process this data and to build a sophisticated model to represent protein interactions.
"We had to analyze mass spectrometry data that included millions of data points," explains Alterovitz. "We also needed to model a network of interacting biological molecules, perform statistical calculations, as well as other analysis on the properties of this network, and combine these with the mass spectrometry results."
In parallel with this research, Alterovitz also initiated and directed a new course called Bioinformatics and Proteomics: an Engineering Problem-Solving Based Approach. Upper-level undergraduate students as well as first- and second-year graduate students attended the class. Alterovitz wanted to standardize the course on a set of tools that would let the students benefit from ongoing research yet be easy to learn.
"Since we had schedule constraints, we did not want to waste time teaching the students a new language," Alterovitz explains. "We needed a tool that the majority of students were already familiar with, and one that could be learned easily by both biologists and engineers."
Researchers at MIT are using MathWorks tools to advance bioinformatics and proteomics. MIT students are using the same tools to gain hands-on experience in these fields.
Alterovitz and his research group used MATLAB® to develop algorithms for analyzing the MS data and to model the protein interactivity network, which consisted of more than 20,000 nodes and 100,000 edges. Each network node represented a mass associated with a protein, and each edge represented an interaction between nodes.
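As an illustration of the network structure described above, here is a minimal Python sketch (not the group's MATLAB code). It stores an undirected graph as an adjacency map, with each node keyed by a protein mass. The class name, method names, and mass values are hypothetical and chosen only for the example.

```python
from collections import defaultdict


class InteractionNetwork:
    """Undirected graph: nodes are protein masses, edges are interactions."""

    def __init__(self):
        # Maps each node to the set of nodes it interacts with.
        self.neighbors = defaultdict(set)

    def add_interaction(self, mass_a, mass_b):
        # Store the edge in both directions because interactions are symmetric.
        self.neighbors[mass_a].add(mass_b)
        self.neighbors[mass_b].add(mass_a)

    def node_count(self):
        return len(self.neighbors)

    def edge_count(self):
        # Each edge appears twice in the map (assumes no self-interactions).
        return sum(len(linked) for linked in self.neighbors.values()) // 2


# Hypothetical masses in daltons, for illustration only.
network = InteractionNetwork()
for a, b in [(11350.2, 66430.0), (11350.2, 14300.5), (14300.5, 66430.0)]:
    network.add_interaction(a, b)
```

An adjacency map keeps memory proportional to the number of edges, which matters for a sparse network with tens of thousands of nodes.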
The researchers also used MATLAB to visualize data, plot results, and access databases shared with other biomedical researchers.
Because MS data resembles the series of peaks and valleys in sound or voice data, signal processing techniques apply to it directly. MIT researchers used the Signal Processing Toolbox to filter out noise and irrelevant data, leaving a more manageable data set to analyze.
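The general idea of filtering a spectrum and then locating its peaks can be sketched in a few lines of Python. This is an assumption-laden illustration, not the filters the researchers used: it applies a simple centered moving average and a threshold-based local-maximum test, and the intensity values, window size, and threshold are made up.

```python
def moving_average(signal, window=3):
    """Smooth intensities with a centered moving average (window should be odd)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        # Near the ends, average over the samples that exist.
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def find_peaks(signal, threshold):
    """Return indices of interior local maxima whose value exceeds threshold."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    ]


# Hypothetical intensities: two real peaks plus low-level noise.
intensities = [1, 2, 1, 3, 9, 14, 9, 3, 1, 2, 1, 4, 10, 4, 1]
smoothed = moving_average(intensities, window=3)
peaks = find_peaks(smoothed, threshold=5.0)  # indices of the two peaks: [5, 12]
```

Smoothing first suppresses the small bumps, so only the two larger features pass the threshold.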
The Bioinformatics Toolbox enabled the team to quickly obtain information about proteins from a variety of Internet resources. The team used the toolbox to calculate molecular weights, obtain amino acid sequences and other properties of specific proteins, and download and parse information into data structures accessible from MATLAB.
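To show what a molecular weight calculation involves, the following Python sketch sums average residue masses for a one-letter amino acid sequence and adds one water molecule for the free termini. It is a stand-in for illustration, not the toolbox function; the mass table holds approximate, commonly tabulated average values, and the function name is hypothetical.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the N- and C-terminal H and OH


def molecular_weight(sequence):
    """Return the approximate average molecular weight of a peptide in daltons."""
    total = WATER_MASS
    for residue in sequence.upper():
        if residue not in RESIDUE_MASS:
            raise ValueError(f"unknown amino acid code: {residue!r}")
        total += RESIDUE_MASS[residue]
    return total


glycyl_alanine = molecular_weight("GA")  # about 146.15 Da
```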
MIT researchers used the Statistics Toolbox to calculate network properties, including connectivity and power law distributions. To model the number of proteins in a sample, they also used the toolbox to simplify curve fitting and to generate negative binomial, gamma, and exponential distributions.
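One way to picture the connectivity and power law calculations is the Python sketch below. It counts node degrees from an edge list, builds the degree distribution, and estimates a power law exponent with the simple continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(k / k_min)). The choice of estimator, the function names, and the toy edge list are assumptions; the source does not say how the group fitted its distributions.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Map each node to its degree in an undirected graph."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def degree_distribution(degrees):
    """Return {k: fraction of nodes that have degree k}."""
    histogram = Counter(degrees.values())
    total = len(degrees)
    return {k: count / total for k, count in sorted(histogram.items())}


def power_law_exponent(degree_values, k_min=1):
    """Continuous maximum-likelihood estimate of the power law exponent."""
    tail = [k for k in degree_values if k >= k_min]
    log_sum = sum(math.log(k / k_min) for k in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree above k_min")
    return 1 + len(tail) / log_sum


# Toy network: node A is a hub, D and E are leaves.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
degrees = degree_counts(edges)
distribution = degree_distribution(degrees)
alpha = power_law_exponent(degrees.values(), k_min=1)
```

A five-node example is far too small for a meaningful fit; it only shows the mechanics that would be applied to a network with many thousands of nodes.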
The group’s research involved millions of MS data points from hundreds of patients. However, because each patient’s data was independent, the task of processing the information was ideal for parallelization. Using the Parallel Computing Toolbox™ and the MATLAB Distributed Computing Server™, the group executed their MATLAB algorithms concurrently on a large cluster of computers.
The group analyzed each patient’s MS data independently on a different processor. Alterovitz explains, "In addition to significantly reducing computation time, the Parallel Computing Toolbox enabled us to program this approach quickly. Instead of learning distributed programming, we used our existing MATLAB code, and made it parallel using the Parallel Computing Toolbox."
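The pattern of analyzing each patient independently can be sketched in Python as a map over patients handed to a worker pool. This is an illustration of the idea only, not the group's MATLAB code: the per-patient analysis is a placeholder, and the function names, patient identifiers, and spectra are hypothetical. A thread pool is used here so the example runs anywhere; for CPU-bound work in Python a process pool would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_patient(spectrum):
    """Placeholder analysis: return the index and height of the tallest peak."""
    best = max(range(len(spectrum)), key=spectrum.__getitem__)
    return best, spectrum[best]


def analyze_cohort(spectra_by_patient, max_workers=4):
    """Analyze every patient concurrently; no task depends on another."""
    patient_ids = list(spectra_by_patient)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with patient_ids.
        results = pool.map(
            analyze_patient, (spectra_by_patient[p] for p in patient_ids)
        )
        return dict(zip(patient_ids, results))


# Hypothetical spectra for two patients.
cohort = {"patient_01": [1, 5, 2], "patient_02": [7, 3, 4, 9]}
summary = analyze_cohort(cohort)
```

Because the per-patient function is unchanged from its serial form, the only new code is the pool and the map call, which mirrors the point made in the quote above about reusing existing code.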
The team also used a distributed approach to speed the calculation of network properties and statistics by dividing the network into chunks and running the tasks in parallel.
For the bioinformatics and proteomics course, Alterovitz and his fellow course instructors chose MATLAB for its ease of use, interoperability with other tools, and ability to present concepts at increasing levels of abstraction.
"About 90 percent of the class had already used MATLAB," says Alterovitz. "Everyone began using MATLAB immediately—even those with no prior experience—because you do not need to know how to program in order to use it."
In addition, MATLAB provided the students with an easy way to access and learn from leading research conducted at MIT and Harvard.
The course’s teaching approach was based on elaboration theory. It involved using a limited set of concepts and examples, and gradually adding complexity. Alterovitz explains, "MATLAB intrinsically supports different levels of complexity, through various levels of abstraction. In the beginning, students run the code and visualize results. Later, they can explore, update, and even integrate the code with other programming languages to add more detail."
The coursework also mirrored this approach across biological levels. The students first used MathWorks tools to analyze fundamental DNA sequence information. They then progressed to more complex expression data, proteins, and eventually interactions between proteins and other molecules using a network model.
Challenge: Improve diagnostic techniques for cancer by identifying proteins and analyzing their interactions
Solution: Use MathWorks tools to enable students and researchers to analyze mass spectrometry data, model complex protein interactions, and visualize results