
blastread

Read data from NCBI BLAST report file

Description


blastdata = blastread(blastreport) reads the NCBI BLAST report data from an XML-formatted file, blastreport, and returns blastdata, a structure containing the corresponding BLAST data.
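You work with the returned structure like any other MATLAB structure. The following minimal sketch defines a hypothetical helper, summarizeBlast, that builds a one-line summary from the top-level fields. It is not a toolbox function. So that the sketch runs without a network connection or report file, it uses a hand-built stand-in structure. The stand-in has a subset of the field names shown in the example output on this page, and its values are placeholders, not real search results.

```matlab
% Hypothetical helper (not a toolbox function): one-line summary of a
% structure returned by blastread.
summarizeBlast = @(d) sprintf('%s vs %s, query %s: %d hits', ...
    d.Algorithm, d.Database, d.QueryID, numel(d.Hits));

% Placeholder stand-in for blastread output (not real results).
% With a real report, use: blastdata = blastread('1CIV_report.xml');
blastdata = struct('RID', '', ...
    'Algorithm', 'BLASTP 2.6.1+', ...
    'Database', 'nr', ...
    'QueryID', 'Query_224139', ...
    'QueryDefinition', 'unnamed protein product', ...
    'Hits', struct('ID', {'hitA', 'hitB', 'hitC'}));   % 1-by-3 struct array

disp(summarizeBlast(blastdata))
```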

Examples


Perform a BLAST search on a protein sequence and save the results to an XML file.

Get a sequence from the Protein Data Bank and create a MATLAB structure.

S = getpdb('1CIV');

Use the structure as input for the BLAST search, with an expectation value (E-value) threshold of 1e-10. The first output is the request ID, and the second output is the estimated time, in minutes, until the search completes.

[RID1,ROTE] = blastncbi(S,'blastp','expect',1e-10);

Retrieve the report with getblast, using ROTE as the wait time. You can also save the XML-formatted report to a file for offline access.

report1 = getblast(RID1,'WaitTime',ROTE,'ToFile','1CIV_report.xml')
Blast results are not available yet. Please wait ...

report1 = 

  struct with fields:

                RID: 'R49TJMCF014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Use blastread to read BLAST data from the XML-formatted BLAST report file.

blastdata = blastread('1CIV_report.xml')
blastdata = 

  struct with fields:

                RID: ''
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Alternatively, run the BLAST search with an NCBI accession number.

RID2 = blastncbi('AAA59174','blastp','expect',1e-10)
RID2 =

    'R49WAPMH014'

Get the search results from the report.

report2 = getblast(RID2)
Blast results are not available yet. Please wait ...

report2 = 

  struct with fields:

                RID: 'R49WAPMH014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'AAA59174.1'
    QueryDefinition: 'insulin receptor precursor [Homo sapiens]'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Input Arguments


blastreport

Name of an XML-formatted BLAST report file, specified as a character vector or string.

Example: 'blastreport.xml'

Output Arguments


blastdata

BLAST report data, returned as a structure with the following fields:

Field            Description
RID              Request ID for retrieving results from a specific NCBI BLAST search
Algorithm        NCBI algorithm used to perform the BLAST search
Database         All databases searched
QueryID          Identifier of the query sequence
QueryDefinition  Definition of the query sequence
Hits             Structure containing information on the hit sequences, such as IDs, accession numbers, lengths, and HSPs (high-scoring segment pairs)
Parameters       Structure containing information on the input parameters used to perform the search
Statistics       Summary of statistical details about the performed search, such as lambda, kappa, and entropy values
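In the example above, the structure read from the file has an empty RID, while the structure returned by getblast contains the request ID. The following minimal sketch lists each top-level field with the class of its value, then checks whether a request ID is present. The stand-in structure is hypothetical and uses placeholder values. Hits, Parameters, and Statistics are left as empty structures because this page does not list the contents of Parameters and Statistics.

```matlab
% Placeholder stand-in for blastread output (values are not real results).
% With a real report, use: blastdata = blastread('1CIV_report.xml');
blastdata = struct('RID', '', ...
    'Algorithm', 'BLASTP 2.6.1+', ...
    'Database', 'nr', ...
    'QueryID', 'Query_224139', ...
    'QueryDefinition', 'unnamed protein product', ...
    'Hits', struct([]), ...
    'Parameters', struct(), ...
    'Statistics', struct());

% Print each top-level field name with the class of its value.
names = fieldnames(blastdata);
for k = 1:numel(names)
    fprintf('%-16s %s\n', names{k}, class(blastdata.(names{k})));
end

% RID can be empty (as in the file-based example above), so check it
% before passing it to a function such as getblast.
hasRID = ~isempty(blastdata.RID);
if ~hasRID
    disp('This report does not include a request ID.');
end
```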

More About


Hits

This table lists each field of blastdata.Hits.

Field       Description
ID          ID of the subject sequence that matched the query sequence
Definition  Description of the subject sequence
Accession   Accession number of the subject sequence
Length      Length of the subject sequence
Hsps        Structure containing information on the high-scoring segment pairs (HSPs)
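Hits is a structure array, so standard structure-array indexing applies to it. The following sketch prints one line per hit and collects the accession numbers, the HSP count for each hit, and the smallest Expect value for each hit. The stand-in hits, accession strings, and numbers are hypothetical placeholders. The sketch also assumes that Length and Expect are numeric scalars.

```matlab
% Placeholder hits with made-up values. With a real report, use:
%   blastdata = blastread('1CIV_report.xml');
%   hits = blastdata.Hits;
hsp = struct('Expect', {1e-30, 1e-12});            % hypothetical HSPs
hits = struct('ID', {'hitA', 'hitB'}, ...
    'Definition', {'placeholder hit A', 'placeholder hit B'}, ...
    'Accession', {'ACC0001', 'ACC0002'}, ...
    'Length', {120, 95}, ...
    'Hsps', {hsp, hsp(2)});                        % 2 HSPs, then 1 HSP

% One line per hit: accession, subject length, and number of HSPs.
for k = 1:numel(hits)
    fprintf('%-8s length %4d, %d HSP(s)\n', hits(k).Accession, ...
        hits(k).Length, numel(hits(k).Hsps));
end

% Collect values across all hits for further processing.
accessions = {hits.Accession};                          % cell array of char
nHsps = arrayfun(@(h) numel(h.Hsps), hits);             % HSPs per hit
bestE = arrayfun(@(h) min([h.Hsps.Expect]), hits);      % smallest E-value per hit
```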

Hits.Hsps

This table summarizes the fields of Hits.Hsps.

Field            Description
Score            Pairwise alignment score for a high-scoring segment pair between the query sequence and a subject sequence.
BitScore         Bit score for a high-scoring segment pair.
Expect           Expectation value (E-value) for a high-scoring segment pair.
Identities       Number of identical residues for a high-scoring segment pair between the query sequence and a subject sequence.
Positives        Number of identical or similar residues for a high-scoring segment pair between the query sequence and a subject amino acid sequence. This field applies only to translated nucleotide or amino acid query sequences and databases.
Gaps             Number of gaps in the alignment for a high-scoring segment pair.
AlignmentLength  Length of the alignment for a high-scoring segment pair.
QueryIndices     Indices of the query sequence residue positions for a high-scoring segment pair.
SubjectIndices   Indices of the subject sequence residue positions for a high-scoring segment pair.
Frame            Reading frame of the translated nucleotide sequence for a high-scoring segment pair.
Alignment        3-by-N character array showing the alignment for a high-scoring segment pair between the query sequence and a subject sequence. The first row is the query sequence, the second row is the alignment, and the third row is the subject sequence.
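You can combine the HSP fields to rank or filter alignments. The following minimal sketch computes percent identity as 100*Identities/AlignmentLength. It keeps only the HSPs whose Expect value is below a cutoff, then prints the 3-by-N Alignment array for each HSP it keeps. The sketch assumes that Identities, AlignmentLength, and Expect are numeric scalars and that Alignment is a character array as described in the table. The stand-in HSPs, their sequences, and the cutoff of 1e-10 are hypothetical placeholders.

```matlab
% Placeholder HSPs with made-up values. With a real report, use:
%   blastdata = blastread('1CIV_report.xml');
%   hsps = blastdata.Hits(1).Hsps;
hsps = struct('Expect', {1e-12, 0.5}, ...
    'Identities', {5, 2}, ...
    'AlignmentLength', {7, 4}, ...
    'Alignment', {['MKTAYIA'; 'MK+AY+A'; 'MKSAYLA'], ...
                  ['GAVL'; 'G  L'; 'GCDL']});

% Percent identity for each HSP (assumes numeric scalar fields).
pctIdentity = 100 * [hsps.Identities] ./ [hsps.AlignmentLength];

% Keep only HSPs more significant than a hypothetical E-value cutoff.
cutoff = 1e-10;
strong = hsps([hsps.Expect] < cutoff);

% Print each retained alignment: query row, match row, subject row.
for k = 1:numel(strong)
    fprintf('E = %g, identity = %.1f%%\n', strong(k).Expect, ...
        100 * strong(k).Identities / strong(k).AlignmentLength);
    disp(strong(k).Alignment)
end
```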

Version History

Introduced before R2006a