Bioinformatics Toolbox

Calling Bioperl Functions from MATLAB®

This example shows the interoperability between MATLAB® and Bioperl - passing arguments from MATLAB to Perl scripts and pulling BLAST search data back to MATLAB.

NOTE: Perl and the Bioperl modules must be installed to run the Perl scripts in this example. Since version 1.4, Bioperl modules have a warnings.pm dependency requiring at least version 5.6 of Perl. If you have difficulty running the Perl scripts, make sure your PERL5LIB environment variable includes the path to your Bioperl installation or try running from the Bioperl installation directory. See the links at http://www.perl.com and http://bioperl.org/ for current release files and complete installation instructions.

Introduction

Gleevec™ (STI571 or imatinib mesylate) was the first approved drug to specifically turn off the signal of a known cancer-causing protein. Initially approved to treat chronic myelogenous leukemia (CML), it is also effective for treatment of gastrointestinal stromal tumors (GIST).

If you have access to the Internet, run this command to learn more: web('http://www.cancer.gov/clinicaltrials/digestpage/gleevec')

Research has identified several gene targets for Gleevec including: Proto-oncogene tyrosine-protein kinase ABL1 (NP_009297), Proto-oncogene tyrosine-protein kinase Kit (NP_000213), and Platelet-derived growth factor receptor alpha precursor (NP_006197).

target_ABL1 = 'NP_009297';
target_Kit = 'NP_000213';
target_PDGFRA = 'NP_006197';

Accessing Sequence Information

You can load the sequence information for these proteins from local GenPept text files using genpeptread.

ABL1_seq = getfield(genpeptread('ABL1_gp.txt'), 'Sequence');
Kit_seq = getfield(genpeptread('Kit_gp.txt'), 'Sequence');
PDGFRA_seq = getfield(genpeptread('PDGFRA_gp.txt'), 'Sequence');

Alternatively, you can obtain protein information directly from the online GenPept database maintained by the National Center for Biotechnology Information (NCBI).

Run these commands to download data from NCBI:

% ABL1_seq = getgenpept(target_ABL1, 'SequenceOnly', true);
% Kit_seq = getgenpept(target_Kit, 'SequenceOnly', true);
% PDGFRA_seq = getgenpept(target_PDGFRA, 'SequenceOnly', true);

The MATLAB whos command gives information about the size of these sequences.

whos ABL1_seq
whos Kit_seq
whos PDGFRA_seq
  Name          Size              Bytes  Class    Attributes

  ABL1_seq      1x1149             2298  char               

  Name         Size             Bytes  Class    Attributes

  Kit_seq      1x976             1952  char               

  Name            Size              Bytes  Class    Attributes

  PDGFRA_seq      1x1089             2178  char               

Calling Perl Programs from MATLAB

From MATLAB, you can harness existing Bioperl modules to run a BLAST search on these sequences. MW_BLAST.pl is a Perl program based on the RemoteBlast Bioperl module. It reads sequences from FASTA files, so start by creating a FASTA file for each sequence.

fastawrite('ABL1.fa', 'ABL1 Proto-oncogene tyrosine-protein kinase (NP_009297)', ABL1_seq);
fastawrite('Kit.fa', 'Kit Proto-oncogene tyrosine-protein kinase (NP_000213)', Kit_seq);
fastawrite('PDGFRA.fa', 'PDGFRA alpha precursor (NP_006197)', PDGFRA_seq);
Warning: ABL1.fa already exists. The data will be appended to the file. 
Warning: Kit.fa already exists. The data will be appended to the file. 
Warning: PDGFRA.fa already exists. The data will be appended to the file. 

BLAST searches can take a long time to return results, and the Perl program MW_BLAST includes a repeating sleep state to await the report. Sample results have been included with this example, but if you have an Internet connection and want to try running the BLAST search with the three sequences, uncomment the following commands. MW_BLAST.pl will save the BLAST results in three files on your disk, ABL1.out, Kit.out and PDGFRA.out. The process can take 15 minutes or more.

% try
%     perl('MW_BLAST.pl','blastp','pdb','1e-10','ABL1.fa','Kit.fa','PDGFRA.fa');
% catch
%     error(message('bioinfo:bioperldemo:PerlError'))
% end

Here is the Perl code for MW_BLAST:

type MW_BLAST.pl
#!/usr/bin/perl -w
use Bio::Tools::Run::RemoteBlast;
use strict;
use 5.006;

# A sample Blast program  based on the RemoteBlast.pm Bioperl module. Takes
# parameters for the BLAST search program, the database, and the expectation
# or E-value (defaults: blastp, pdb, 1e-10), followed by a list of FASTA files
# containing sequences to search.

# Copyright 2003-2012 The MathWorks, Inc.


# Retrieve arguments and set parameters
my $prog = shift @ARGV;
my $db   = shift @ARGV;
my $e_val= shift @ARGV;

my @params = ('-prog' => $prog,
	      '-data' => $db,
	      '-expect' => $e_val,
	      '-readmethod' => 'SearchIO' );

# Create a remote BLAST factory	          
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

# Change a parameter in RemoteBlast
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

# Remove a parameter from RemoteBlast
delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};

# Submit each file
while ( defined($ARGV[0])) {
    my $fa_file = shift @ARGV;
    my $str = Bio::SeqIO->new(-file=>$fa_file, '-format' => 'fasta' );    
    my $r = $factory->submit_blast($fa_file);

    # Wait for the reply and save the output file
    while ( my @rids = $factory->each_rid ) {
	foreach my $rid ( @rids ) {
	    my $rc = $factory->retrieve_blast($rid);
	    if( !ref($rc) ) {
    		if( $rc < 0 ) {
        	    $factory->remove_rid($rid);
            }
            sleep 5;
	    } else {
            my $result = $rc->next_result();
            my $filename = $result->query_name()."\.out";
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            }
        }
    }
}

The next step is to parse the output reports and find scores >= 100. You can then identify hits found by more than one protein for further research, possibly identifying new targets for drug therapy.

try
    protein_list = perl('MW_parse.pl', 'ABL1.out', 'Kit.out', 'PDGFRA.out')
catch
    error(message('bioinfo:bioperldemo:PerlError'))
end
protein_list =


--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1H01|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1OIR|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1GII|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1CSY|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1F3M|C)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1A81|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1H1W|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1B6C|B)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1IG1|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1JKK|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1JOW|B)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1BI8|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1O6K|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1GZK|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1GZN|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1O6L|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1BHF|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1LCJ|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1PME|)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1CM8|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1A1A|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|3HCK|)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1AOT|F)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1PMQ|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1LKK|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1JNK|)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1SHD|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1LKL|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1BM2|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1BMB|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1CWD|L)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1BHH|B)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1IA8|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1FBZ|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No HSPs for this minimal Hit (pdb|1IJR|A)
If using NCBI BLAST, check bits() instead
---------------------------------------------------

ABL1.out
1OPL, 2584, 0.0, Chain A, Structural Basis For The Auto-Inhibition Of C-Abl...
1FMK, 923, 1e-100, Crystal Structure Of Human Tyrosine-Protein Kinase C-Src p...
1QCF, 919, 1e-100, Chain A, Crystal Structure Of Hck In Complex With A Src Fa...
1KSW, 916, 1e-100, Chain A, Structure Of Human C-Src Tyrosine Kinase (Thr338g...
1AD5, 883, 6e-96, Chain A, Src Family Kinase Hck-Amp-Pnp Complex pdb|1AD5|B ...
2ABL, 866, 5e-94, Sh3-Sh2 Domain Fragment Of Human Bcr-Abl Tyrosine Kinase
3LCK, 666, 9e-71, The Kinase Domain Of Human Lymphocyte Kinase (Lck), Activa...
1QPE, 666, 9e-71, Chain A, Structural Analysis Of The Lymphocyte-Specific Ki...
1QPD, 656, 1e-69, Chain A, Structural Analysis Of The Lymphocyte-Specific Ki...
1K2P, 620, 2e-65, Chain A, Crystal Structure Of Bruton's Tyrosine Kinase Dom...
1BYG, 592, 3e-62, Chain A, Kinase Domain Of Human C-Terminal Src Kinase (Csk...
1M7N, 561, 1e-58, Chain A, Crystal Structure Of Unactivated Apo Insulin-Like...
1JQH, 560, 2e-58, Chain A, Igf-1 Receptor Kinase Domain pdb|1JQH|B Chain B, ...
1P4O, 560, 2e-58, Chain A, Structure Of Apo Unactivated Igf-1r Kinase Domain...
1K3A, 553, 1e-57, Chain A, Structure Of The Insulin-Like Growth Factor 1 Rec...
1GJO, 550, 2e-57, Chain A, The Fgfr2 Tyrosine Kinase Domain
1FVR, 540, 3e-56, Chain A, Tie2 Kinase Domain pdb|1FVR|B Chain B, Tie2 Kinas...
1AB2, 528, 9e-55, Proto-Oncogene Tyrosine Kinase (E.C.2.7.1.112) (Src Homolo...
1IRK, 525, 2e-54, Insulin Receptor (Tyrosine Kinase Domain) Mutant With Cys ...
1I44, 523, 3e-54, Chain A, Crystallographic Studies Of An Activation Loop Mu...
1IR3, 522, 4e-54, Chain A, Phosphorylated Insulin Receptor Tyrosine Kinase I...
1FGK, 522, 4e-54, Chain A, Crystal Structure Of The Tyrosine Kinase Domain O...
1P14, 521, 6e-54, Chain A, Crystal Structure Of A Catalytic-Loop Mutant Of T...
1M14, 496, 4e-51, Chain A, Tyrosine Kinase Domain From Epidermal Growth Fact...
1PKG, 496, 4e-51, Chain A, Structure Of A C-Kit Kinase Product Complex pdb|1...
1VR2, 463, 3e-47, Chain A, Human Vascular Endothelial Growth Factor Receptor...
1JU5, 330, 8e-32, Chain C, Ternary Complex Of An Crk Sh2 Domain, Crk-Derived...
1BBZ, 317, 3e-30, Chain A, Crystal Structure Of The Abl-Sh3 Domain Complexed...
1AWO, 303, 1e-28, The Solution Nmr Structure Of Abl Sh3 And Its Relationship...
1BBZ, 303, 1e-28, Chain E, Crystal Structure Of The Abl-Sh3 Domain Complexed...
1G83, 287, 8e-27, Chain A, Crystal Structure Of Fyn Sh3-Sh2 pdb|1G83|B Chain...
1LCK, 270, 7e-25, Chain A, Sh3-Sh2 Domain Fragment Of Human P56-Lck Tyrosine...
1MUO, 233, 1e-20, Chain A, Crystal Structure Of Aurora-2, An Oncogenic Serin...
1GRI, 232, 2e-20, Chain A, Grb2 pdb|1GRI|B Chain B, Grb2
1A9U, 220, 4e-19, The Complex Structure Of The Map Kinase P38SB203580 pdb|1B...
1BMK, 213, 3e-18, Chain A, The Complex Structure Of The Map Kinase P38SB2186...
1IAN, 209, 8e-18, Human P38 Map Kinase Inhibitor Complex
1GZ8, 208, 1e-17, Chain A, Human Cyclin Dependent Kinase 2 Complexed With Th...
1OVE, 208, 1e-17, Chain A, The Structure Of P38 Alpha In Complex With A Dihy...
1OIT, 207, 1e-17, Chain A, Imidazopyridines: A Potent And Selective Class Of...
1B38, 206, 2e-17, Chain A, Human Cyclin-Dependent Kinase 2 pdb|1B39|A Chain ...
1OGU, 206, 2e-17, Chain A, Structure Of Human Thr160-Phospho Cdk2CYCLIN A CO...
1E9H, 206, 2e-17, Chain A, Thr 160 Phosphorylated Cdk2 - Human Cyclin A3 Com...
1JST, 206, 2e-17, Chain A, Phosphorylated Cyclin-Dependent Kinase-2 Bound To...
1WFC, 206, 2e-17, Structure Of Apo, Unphosphorylated, P38 Mitogen Activated ...
1QMZ, 206, 2e-17, Chain A, Phosphorylated Cdk2-Cyclyin A-Substrate Peptide C...
1DI8, 206, 2e-17, Chain A, The Structure Of Cyclin-Dependent Kinase 2 (Cdk2)...
1H1P, 206, 2e-17, Chain A, Structure Of Human Thr160-Phospho Cdk2CYCLIN A CO...
1DI9, 205, 2e-17, Chain A, The Structure Of P38 Mitogen-Activated Protein Ki...
1H4L, 202, 5e-17, Chain A, Structure And Regulation Of The Cdk5-P25(Nck5a) C...

Kit.out
1PKG, 974, 1e-106, Chain A, Structure Of A C-Kit Kinase Product Complex pdb|1...
1VR2, 805, 6e-87, Chain A, Human Vascular Endothelial Growth Factor Receptor...
1GJO, 730, 3e-78, Chain A, The Fgfr2 Tyrosine Kinase Domain
1FGK, 700, 8e-75, Chain A, Crystal Structure Of The Tyrosine Kinase Domain O...
1OPL, 410, 4e-41, Chain A, Structural Basis For The Auto-Inhibition Of C-Abl...
1FVR, 405, 1e-40, Chain A, Tie2 Kinase Domain pdb|1FVR|B Chain B, Tie2 Kinas...
1M7N, 383, 5e-38, Chain A, Crystal Structure Of Unactivated Apo Insulin-Like...
1P4O, 383, 5e-38, Chain A, Structure Of Apo Unactivated Igf-1r Kinase Domain...
1JQH, 381, 8e-38, Chain A, Igf-1 Receptor Kinase Domain pdb|1JQH|B Chain B, ...
1QCF, 377, 2e-37, Chain A, Crystal Structure Of Hck In Complex With A Src Fa...
1K3A, 371, 1e-36, Chain A, Structure Of The Insulin-Like Growth Factor 1 Rec...
1I44, 368, 3e-36, Chain A, Crystallographic Studies Of An Activation Loop Mu...
1IRK, 367, 3e-36, Insulin Receptor (Tyrosine Kinase Domain) Mutant With Cys ...
1P14, 361, 2e-35, Chain A, Crystal Structure Of A Catalytic-Loop Mutant Of T...
1IR3, 361, 2e-35, Chain A, Phosphorylated Insulin Receptor Tyrosine Kinase I...
3LCK, 354, 1e-34, The Kinase Domain Of Human Lymphocyte Kinase (Lck), Activa...
1QPE, 354, 1e-34, Chain A, Structural Analysis Of The Lymphocyte-Specific Ki...
1QPD, 354, 1e-34, Chain A, Structural Analysis Of The Lymphocyte-Specific Ki...
1AD5, 348, 6e-34, Chain A, Src Family Kinase Hck-Amp-Pnp Complex pdb|1AD5|B ...
1KSW, 344, 2e-33, Chain A, Structure Of Human C-Src Tyrosine Kinase (Thr338g...
1FMK, 344, 2e-33, Crystal Structure Of Human Tyrosine-Protein Kinase C-Src p...
1BYG, 342, 3e-33, Chain A, Kinase Domain Of Human C-Terminal Src Kinase (Csk...
1M14, 335, 2e-32, Chain A, Tyrosine Kinase Domain From Epidermal Growth Fact...
1K2P, 294, 1e-27, Chain A, Crystal Structure Of Bruton's Tyrosine Kinase Dom...
1H4L, 167, 5e-13, Chain A, Structure And Regulation Of The Cdk5-P25(Nck5a) C...
1PME, 158, 6e-12, Structure Of Penta Mutant Human Erk2 Map Kinase Complexed ...
1F3M, 156, 1e-11, Chain C, Crystal Structure Of Human SerineTHREONINE KINASE...

PDGFRA.out
1PKG, 625, 5e-66, Chain A, Structure Of A C-Kit Kinase Product Complex pdb|1...
1VR2, 550, 2e-57, Chain A, Human Vascular Endothelial Growth Factor Receptor...
1FGI, 500, 1e-51, Chain A, Crystal Structure Of The Tyrosine Kinase Domain O...
1GJO, 492, 1e-50, Chain A, The Fgfr2 Tyrosine Kinase Domain
1FVR, 419, 4e-42, Chain A, Tie2 Kinase Domain pdb|1FVR|B Chain B, Tie2 Kinas...
1QCF, 380, 1e-37, Chain A, Crystal Structure Of Hck In Complex With A Src Fa...
1QPE, 364, 9e-36, Chain A, Structural Analysis Of The Lymphocyte-Specific Ki...
1QPD, 364, 9e-36, Chain A, Structural Analysis Of The Lymphocyte-Specific Ki...
3LCK, 360, 2e-35, The Kinase Domain Of Human Lymphocyte Kinase (Lck), Activa...
1OPL, 358, 4e-35, Chain A, Structural Basis For The Auto-Inhibition Of C-Abl...
1FMK, 354, 1e-34, Crystal Structure Of Human Tyrosine-Protein Kinase C-Src p...
1KSW, 353, 2e-34, Chain A, Structure Of Human C-Src Tyrosine Kinase (Thr338g...
1AD5, 353, 2e-34, Chain A, Src Family Kinase Hck-Amp-Pnp Complex pdb|1AD5|B ...
1BYG, 352, 2e-34, Chain A, Kinase Domain Of Human C-Terminal Src Kinase (Csk...
1I44, 351, 3e-34, Chain A, Crystallographic Studies Of An Activation Loop Mu...
1IRK, 350, 4e-34, Insulin Receptor (Tyrosine Kinase Domain) Mutant With Cys ...
1M7N, 349, 5e-34, Chain A, Crystal Structure Of Unactivated Apo Insulin-Like...
1JQH, 349, 5e-34, Chain A, Igf-1 Receptor Kinase Domain pdb|1JQH|B Chain B, ...
1P4O, 349, 5e-34, Chain A, Structure Of Apo Unactivated Igf-1r Kinase Domain...
1P14, 344, 2e-33, Chain A, Crystal Structure Of A Catalytic-Loop Mutant Of T...
1IR3, 343, 2e-33, Chain A, Phosphorylated Insulin Receptor Tyrosine Kinase I...
1K3A, 338, 9e-33, Chain A, Structure Of The Insulin-Like Growth Factor 1 Rec...
1M14, 332, 4e-32, Chain A, Tyrosine Kinase Domain From Epidermal Growth Fact...
1K2P, 315, 4e-30, Chain A, Crystal Structure Of Bruton's Tyrosine Kinase Dom...
1PME, 167, 6e-13, Structure Of Penta Mutant Human Erk2 Map Kinase Complexed ...
1JOW, 155, 1e-11, Chain B, Crystal Structure Of A Complex Of Human Cdk6 And ...
1BI8, 155, 1e-11, Chain A, Mechanism Of G1 Cyclin Dependent Kinase Inhibitio...
1F3M, 150, 6e-11, Chain C, Crystal Structure Of Human SerineTHREONINE KINASE...


This is the code for MW_parse:

type MW_parse.pl
#!/usr/bin/perl
use Bio::SearchIO;
use strict;
use 5.006;

# A sample BLAST parsing program based on the SearchIO.pm Bioperl module. Takes
# a list of BLAST report files and prints a list of the top hits from each
# report based on an arbitrary minimum score.

# Copyright 2003-2012 The MathWorks, Inc.


# Set a cutoff value for the raw score.
my $min_score = 100;

# Take each report name and print information about the top hits.
my $seq_count = 0;
while ( defined($ARGV[0])) {
    my $breport = shift @ARGV;
    print "\n$breport\n";
    my $in = new Bio::SearchIO(-format => 'blast', 
                               -file   => $breport);
    my $num_hit = 0;
    my $short_desc;
    while ( my $result = $in->next_result) {
	while ( my $curr_hit = $result->next_hit ) {
	    if ( $curr_hit->raw_score >= $min_score ) {
		if (length($curr_hit->description) >= 60) {
		    $short_desc = substr($curr_hit->description, 0, 58)."...";
		} else {
		    $short_desc = $curr_hit->description;
		}
		print $curr_hit->accession, ", ",
		      $curr_hit->raw_score, ", ",
		      $curr_hit->significance, ", ",
		      $short_desc, "\n";
	    }
	    $num_hit++;
	}
    }
    $seq_count++;
}

Calling MATLAB Functions within Perl Programs

If you are running on Windows®, it is also possible to call MATLAB functions from Perl. You can launch MATLAB in an Automation Server mode by using the /Automation switch in the MATLAB startup command (e.g. D:\applications\matlab7x\bin\matlab.exe /Automation).

Here's a script to illustrate the process of launching an automation server, calling MATLAB functions and passing variables between Perl and MATLAB.

type MATLAB_from_Perl.pl
#!/usr/bin/perl -w
use Win32::OLE;
use Win32::OLE::Variant;

# Simple perl script to execute commands in Matlab.
# Note the name Win32::OLE is misleading and this actually uses COM!
#


# Use existing instance if Matlab is already running.
eval {$matlabApp = Win32::OLE->GetActiveObject('Matlab.Application')};
die "Matlab not installed" if $@;
unless (defined $matlabApp) {
   $matlabApp = Win32::OLE->new('Matlab.Application')
      or die "Oops, cannot start Matlab";
}

# Examples of executing MATLAB commands - these functions execute in
# MATLAB and return the status.

@exe_commands = ("IRK = pdbread('pdb1irk.ent');",
             "LCK = pdbread('pdb3lck.ent');",
             "seqdisp(IRK)",
             "seqdisp(LCK)",
             "[Score, Alignment] = swalign(IRK, LCK,'showscore',1);");

# send the commands to Matlab
foreach $exe_command (@exe_commands)
{  $status = &send_to_matlab('Execute', $exe_command);
   print "Matlab status = ", $status, "\n";
}

sub send_to_matlab
{  my ($call, @command) = @_;
   my $status = 0;
   print "\n>> $call( @command )\n";
   $result = $matlabApp->Invoke($call, @command);
   if (defined($result))
   {   unless ($result =~ s/^.\?{3}/Error:/)
       {  print "$result\n" unless ($result eq "");
       }
       else
       {  print "$result\n";
          $status = -1;
       }
   }
   return $status;
}

# Examples of passing variables between MATLAB and Perl.
#
# MATLAB supoprts passing character arrays directly with the following syntax:
#
# PutCharArray([in] BSTR name, [in] BSTR workspace, [in] BSTR string);
# GetCharArray([in] BSTR name, [in] BSTR workspace, [out] BSTR string);

&send_to_matlab('PutCharArray', 'centralDogma', 'base', 'DNA->RNA->Protein.');
&send_to_matlab('GetCharArray', 'centralDogma', 'base');

# Numeric arrays can be passed by reference in a SAFEARRAY using the
# PutFullMatrix and GetFullMatrix functions.
#
# PutFullMatrix([in] BSTR name, [in] BSTR workspace, [in] BSTR data);
# GetFullMatrix([in] BSTR varname, [in] BSTR workspace, [out] BSTR retdata);

$mReal = Variant(VT_ARRAY|VT_R8, 4, 4);
$mImag = Variant(VT_ARRAY|VT_R8, 4, 4);

$mReal->Put([[0,0,0,0], [0,0,0,0], [0,0,0,0], [0,0,0,0]]);
print "\n>> PutFullMatrix( 'magicArray', 'base', ",'$mReal, $mImag'," )\n";
$matlabApp->PutFullMatrix('magicArray', 'base', $mReal, $mImag);
$matlabApp->Execute('magicArray = magic(4)');

$m2Real = Variant(VT_ARRAY|VT_R8|VT_BYREF,4,4);
$m2Imag = Variant(VT_ARRAY|VT_R8|VT_BYREF,4,4);
print "\n>> GetFullMatrix( 'magicArray', 'base', ",'$m2Real, $m2Imag'," )\n";
$matlabApp->GetFullMatrix('magicArray', 'base', $m2Real, $m2Imag);

for ($i = 0; $i < 4; $i++) {
	printf "%3d %3d %3d %3d\n", $m2Real->Get($i,0), $m2Real->Get($i,1),
                                $m2Real->Get($i,2), $m2Real->Get($i,3);
}

# Additionally, you can use Variants to send scalar variables by reference
# to MATLAB for all data types except sparse arrays and function handles through
# PutWorkspaceData:
# PutWorkspaceData([in] BSTR name, [in] BSTR workspace, [in] BSTR data);
#
# Results are passed back to Perl directly with GetVariable:
# HRESULT = GetVariable([in] BSTR Name, [in] BSTR Workspace); 
   
# Create and initialize a date Variant.
$dnaDate = Variant->new(VT_DATE|VT_BYREF, 'Feb 28, 1953');

&send_to_matlab('PutWorkspaceData', 'dnaDate', 'base', $dnaDate);
&send_to_matlab('Execute', 'dnaDate');

# Create and initialize a new string Variant.
$aminoString = Variant->new(VT_BSTR|VT_BYREF, 'matlap');
&send_to_matlab('PutWorkspaceData', 'aminoAcids', 'base', $aminoString);

# Change the value in MATLAB
&send_to_matlab('Execute', "aminoAcids = 'ARNDCQEGHILKMFPSTWYV';");

# Bring the new value back
$aa = $matlabApp->GetVariable('aminoAcids', 'base');
printf "Amino acid codes: %s\n", $aa;

undef $matlabApp; # close Matlab if we opened it

Protein Analysis Tools in the Bioinformatics Toolbox™

MATLAB offers additional tools for protein analysis and further research with these proteins. For example, to access the sequences and run a full Smith-Waterman alignment on the tyrosine kinase domain of the human insulin receptor (pdb 1IRK) and the kinase domain of the human lymphocyte kinase (pdb 3LCK), load the sequence data:

IRK = pdbread('pdb1irk.ent');
LCK = pdbread('pdb3lck.ent');

% Run these commands to bring the data from the Internet:
% IRK = getpdb('1IRK');
% LCK = getpdb('3LCK');

Now perform a local alignment with the Smith-Waterman algorithm. MATLAB uses BLOSUM 50 as the default scoring matrix for AA strings with a gap penalty of 8. Of course, you can change any of these parameters.

[Score, Alignment] = swalign(IRK, LCK, 'showscore', true);

showalignment(Alignment);

MATLAB and the Bioinformatics Toolbox™ offer additional tools for investigating nucleotide and amino acid sequences. For example, pdbdistplot displays the distances between atoms and amino acids in a PDB structure, while ramachandran generates a plot of the torsion angle PHI and the torsion angle PSI of the protein sequence. The toolbox function proteinplot provides a graphical user interface (GUI) to easily import sequences and plot various properties such as hydrophobicity.

Suggest an enhancement for this example.