Parker Method for Prediction of Surface Sites

- February 08, 2013

Parker et al. derived a hydrophilicity scale from high-performance liquid chromatography (HPLC) peptide retention data. The scale values are listed in the %protein table of the program below.

Surface Profile: The surface profile for a protein was determined by summing the parameters for each residue of a seven-residue segment and assigning this sum to the fourth residue. This procedure was repeated, shifting the segment by one residue at a time from the N-terminus to the C-terminus. A plot of these values against the residue number, using either the HPLC, accessibility, bulk hydrophobic character, hydrophobicity, global, pi, or hydropathy parameters (only HPLC is used here), provided a surface profile.
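As a small worked example of the windowing step described above, here is a minimal sketch applied to the first ten residues of the sequence used in this post. The helper name surface_profile is hypothetical, the sketch numbers residues from 1 (the program further down uses 0-based indices), and it assumes the input contains only the 20 standard one-letter codes.

```perl
use strict;
use warnings;

# Parker HPLC hydrophilicity values, copied from the %protein table in Parker.pl.
my %hplc = (
    A => 2.1,  R => 4.2,  N => 7.0,   D => 10.0, C => 1.4,
    E => 7.8,  Q => 6.0,  G => 5.7,   H => 2.1,  I => -8.0,
    L => -9.2, K => 5.7,  M => -4.2,  F => -9.2, P => 2.1,
    S => 6.5,  T => 5.2,  W => -10.0, Y => -1.9, V => -3.7,
);

# Hypothetical helper: returns a list of [residue_number, residue, window_sum],
# with residue numbers counted from 1 (an assumption of this sketch).
sub surface_profile {
    my ($seq, $window) = @_;
    my @res  = split //, $seq;
    my $half = int($window / 2);    # 3 for a seven-residue window
    my @profile;
    for my $start (0 .. @res - $window) {
        my $sum = 0;
        $sum += $hplc{ $res[$_] } for $start .. $start + $window - 1;
        my $centre = $start + $half;    # the fourth residue of the window
        push @profile, [ $centre + 1, $res[$centre], $sum ];
    }
    return @profile;
}

my @profile = surface_profile("MAFSAEDVLK", 7);
printf "%d\t%s\t%.1f\n", @$_ for @profile;
```

The first window, MAFSAED, sums to -4.2 + 2.1 - 9.2 + 6.5 + 2.1 + 7.8 + 10.0 = 15.1, and that value is assigned to its fourth residue, S, at residue number 4. A ten-residue input with a window of seven yields four windows, centred on residues 4 to 7.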

To interpret these profiles objectively, Parker et al. used the following arbitrary set of rules:

1) The average surface hydrophilicity is defined as the mean of the profile values for a protein using a particular parameter set.
2) Any residue with a profile value greater than 25% above the average surface hydrophilicity value is defined as a surface site.
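The program in this post stops at the profile and does not apply these two rules. A minimal sketch of how they might be applied is given here. The helper name surface_sites is hypothetical, and the reading of "25% above the average" as mean + 0.25 * |mean| is an assumption of this sketch, chosen so that the cutoff stays above the mean even when the mean is negative; the paper should be checked for the exact definition.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: takes a reference to a hash of position => profile value
# and returns the mean, the cutoff, and the sorted positions above the cutoff.
sub surface_sites {
    my ($profile) = @_;
    my @values = values %$profile;
    return (undef, undef, []) unless @values;

    # Rule 1: the average surface hydrophilicity is the mean of the profile values.
    my $mean = sum(@values) / @values;

    # Rule 2 (assumed reading): cutoff = mean + 25% of the mean's magnitude.
    my $cutoff = $mean + 0.25 * abs($mean);

    my @sites = sort { $a <=> $b }
                grep { $profile->{$_} > $cutoff } keys %$profile;
    return ($mean, $cutoff, \@sites);
}

# Toy profile values, made up purely to illustrate the arithmetic.
my %profile = (3 => 10, 4 => 20, 5 => 30, 6 => 40);
my ($mean, $cutoff, $sites) = surface_sites(\%profile);
print "mean=$mean cutoff=$cutoff sites=@$sites\n";
```

With the toy values the mean is 25 and the cutoff is 31.25, so only position 6 is reported. To use this with the hash returned by plotgraph, the "assignedvalue" entries would first have to be copied into a flat position => value hash.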

Program:

Parker.pl

use strict;
use warnings;
use FindBin qw($Bin);
use lib "$Bin";
use ParkerModule;

my %protein = ("A"=>{"Code"=>"Ala","Params"=>2.1},
               "R"=>{"Code"=>"Arg","Params"=>4.2},
               "N"=>{"Code"=>"Asn","Params"=>7.0},
               "D"=>{"Code"=>"Asp","Params"=>10.0},
               "C"=>{"Code"=>"Cys","Params"=>1.4},
               "E"=>{"Code"=>"Glu","Params"=>7.8},
               "Q"=>{"Code"=>"Gln","Params"=>6.0},
               "G"=>{"Code"=>"Gly","Params"=>5.7},
               "H"=>{"Code"=>"His","Params"=>2.1},
               "I"=>{"Code"=>"Ile","Params"=>-8.0},
               "L"=>{"Code"=>"Leu","Params"=>-9.2},
               "K"=>{"Code"=>"Lys","Params"=>5.7},
               "M"=>{"Code"=>"Met","Params"=>-4.2},
               "F"=>{"Code"=>"Phe","Params"=>-9.2},
               "P"=>{"Code"=>"Pro","Params"=>2.1},
               "S"=>{"Code"=>"Ser","Params"=>6.5},
               "T"=>{"Code"=>"Thr","Params"=>5.2},
               "W"=>{"Code"=>"Trp","Params"=>-10.0},
               "Y"=>{"Code"=>"Tyr","Params"=>-1.9},
               "V"=>{"Code"=>"Val","Params"=>-3.7},
              );
my $sequence = "MAFSAEDVLK EYDRRRRMEA LLLSLYYPND RKLLDYKEWS PPRVQVECPK
APVEWNNPPS KGLIVGHFS GIKYKGEKAQ ASEVDVNKMC CWVSKFKDAM
RRYQGIQTCK IPGKVLSDLD AKIKAYNLTV EGVEGFVRYS RVTKQHVAAF
LKELRHSKQY ENVNLIHYIL TDKRVDIQHL EKDLVKDFKA LVESAHRMRQ 
GHMINVKYIL YQLLKKHGHG PDGPDILTVK TGSKGVLYDD SFRKIYTDLG
WKFTPL";
sub clean_input_sequence {
    my $seq = shift;
    # Strip line breaks, spaces, and any other whitespace from the pasted sequence.
    $seq =~ s/\s+//g;
    return $seq;
}

my $seq = clean_input_sequence($sequence);
my $windowlength = 7;

my $parker = ParkerModule->new(\%protein, $seq, $windowlength);
$parker->display_object();
my %seqplot = $parker->plotgraph;
# Print one line per window: 0-based index of the centre residue, residue, window sum.
foreach my $key (sort { $a <=> $b } keys %seqplot) {
    print "$key\t$seqplot{$key}{code}\t$seqplot{$key}{assignedvalue}\n";
}

ParkerModule.pm

package ParkerModule;
use strict;
use warnings;
use Math::BigFloat;

sub new {
 my $class = shift;
 my ($parameters,$sequence,$windowlength) =@_;
 my $ref={
  "Parameters"=>$parameters,
  "Sequence"=>$sequence,
  "Windowlength"=>$windowlength,
 };
 bless($ref,$class);
 return $ref;
}

sub display_object {
 my $self = shift;
 #print "Parameters= ".$$self{"Parameters"};
 my $param = $$self{"Parameters"};
 #print $$param{"A"}{"Params"},"\n";
 #print "Sequence= ".$$self{"Sequence"};
 #print "Windowlength= ".$$self{"Windowlength"};
  
}

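sub plotgraph {
    my $self = shift;
    my @seq = split("", $$self{"Sequence"});
    my $param = $$self{"Parameters"};
    my $windowlength = $$self{"Windowlength"};
    my %seqplot;

    # Offset of the window centre (3 for a seven-residue window, i.e. the fourth residue).
    my $meanposition = int($windowlength / 2);

    # Slide the window one residue at a time from the N- to the C-terminus.
    for (my $i = 0; $i <= @seq - $windowlength; $i++) {
        # Math::BigFloat does the additions in decimal, avoiding binary rounding noise.
        my $s = Math::BigFloat->new(0);
        for (my $j = $i; $j < $i + $windowlength; $j++) {
            my $value = $$param{$seq[$j]}{"Params"};
            die "Unknown residue '$seq[$j]' at index $j\n" unless defined $value;
            $s = $s + $value;
        }

        # Assign the window sum to the centre residue; keys are 0-based indices.
        my $k = $i + $meanposition;
        $seqplot{$k}{"assignedvalue"} = $s;
        $seqplot{$k}{"code"} = $seq[$k];
    }

    return %seqplot;
}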

1;

References:
J. M. R. Parker, D. Guo, and R. S. Hodges. New Hydrophilicity Scale Derived from High-Performance Liquid Chromatography Peptide Retention Data: Correlation of Predicted Surface Residues with Antigenicity and X-ray-Derived Accessible Sites.

Please go through the program and the reference, and suggest any errors or improvements.
