%This is the n SIMULATED ANNEALING VERSION OF THE SIGNATURE MINING ALGORITHM AS PRESENTED IN THE PAPER: %Mining pathway signatures from microarray data and relevant biological %knowledge'' by Eleftherios Panteris , Stephen Swift, Annette Payne,Xiaohui Liu. % What follows is the script that uses the workspace file provided to get the Pathway signatures for the 103 pathways as mentioned in the paper. %please refer to paper for detailed explanation of the algorithm. % The workspace has all the files needed to run the algorithm with different iterations or temperatures . the files used were adapted from the monthly updated KEGG metabolism lists for E.coli K-12. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % 0. Creation of various files. % 0.0. FILTER. DELETES the empty cells and reduce the length of each pathway empty cells to remove the genes not in the 1430 choosen %need to do this in the sig when they still have names SIG is the original file for the 1430 ans should not be SIG=CAN; % we take the original size and reduce it to 1430. for k= 1:100 for i=1: length(SIG) for j=1: length(SIG{i}) if SIG{i}{j}=='0' %if isempty(SIG{i}{j}) SIG{i}(j,:)=[]; break end end end end tempSIG=SIG; % 0.1 SP1. Produces a list of subscripts SP1 for eucldist51 ( with empty % matrices)... to be used in DPA ... % for the 1430 genes that were selected by the threshold 1.3 %%%%%%%%%%%%%5 SP1=SIG; for a=1: length(SIG) for b=1: length(SIG{a}) SP1{a}{b,1}=strmatch(SIG{a}{b},uniq1); end end clear a b SP1p=SP1; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % 0.2 Binary list. BINLIST for random selection % CAUTION it has to be already reduced to the 1430 genes to work correctly % so for example the 1st path is 32 not 45 genes ( as in KEGG)!!!!!!!! BINLIST=SIG; for i=1:length(SIG) for j=1:length(SIG{i}) if isempty(SIG{i}{j}) BINLIST{i}{j}=0; else BINLIST{i}{j}=1; end end end BINLISTtemp=BINLIST; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%5 % 0.3 Makes a cell array of cell arrays of matrices DPA DPA=tempSIG; % this is the changeable copy of SIG that has the random insertions or deletions for i=1:length(BINLIST); for j= 1: length(BINLIST{i}); DPA{i}=zeros(length(BINLIST{i}),length(BINLIST{i})); end end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % 0.4 Gives the distances for each pathway. DPA . SP1 working loop design for distance in all pathways for unique comparisons. a=0; for k= 1:length(SP1p) temp=zeros(length(SP1p{k}),length(SP1p{k})); for i=1:length(SP1p{k}) for j=i+1:length(SP1p{k}) A=SP1p{k}{i}(1); B=SP1p{k}{j}(1); if ( A==0 | B==0) temp(i,j)=0; else a=eucldist51(SP1p{k}{i}(1),SP1p{k}{j}(1)); if a==0 temp(i,j)=[ eucldist51(SP1p{k}{j}(1),SP1p{k}{i}(1))]; else % this is the reverse to get unique results( in the eucld matrix x,y is same with y,x) temp(i,j)=[ eucldist51(SP1p{k}{i}(1),SP1p{k}{j}(1))]; end end end end DPA{k}=temp; end tic %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % ALGORITHM. % 1. ITERATIONS T=8467; % temperature See relevant equations and discussion in paper e=0.001; ITER=1000% set Number of repeats w=0; while w<= ITER w= w+1; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % 2. Randomisation of the BINLIST rand('state',sum(100*clock)) % changes the seed each iteration!!! for i=1 % iterations this should be the same as the major iterations of the algorithm m = unidrnd(length(BINLIST)); n=unidrnd(length(BINLIST{m})); if isnan(n) break end %%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% if BINLIST{m}{n}==0 BINLIST{m}{n}=1; SP1p{m}{n}=SP1{m}{n}; else BINLIST{m}{n}=0; SP1p{m}{n}= 0; end end %%%%%%%%%%%%%%%%%%%%%%% %3. updates the distance matrix DPA with the randomly chosen and changed pathway. a=0; for k= m temp=zeros(length(SP1p{k}),length(SP1p{k})); for i=1:length(SP1p{k}) for j=i+1:length(SP1p{k}) A=SP1p{k}{i}(1); B=SP1p{k}{j}(1); if ( A==0 | B==0) temp(i,j)=0; else a=eucldist51(SP1p{k}{i}(1),SP1p{k}{j}(1)); if a==0 temp(i,j)=[ eucldist51(SP1p{k}{j}(1),SP1p{k}{i}(1))]; else % this is the reverse to get unique results( in the eucld matrix x,y is same with y,x) temp(i,j)=[ eucldist51(SP1p{k}{i}(1),SP1p{k}{j}(1))]; end end end end DPA{k}=temp; end % %%%%%%%%%%%%%%%%%%% % MM=max(MAX); DD=min(MID); AAA=0; Feval(w)=0; Fe=zeros(size(DPA)); for i= 1: length(BINLIST) AA=sum(DPA{i}(:)); BB= ( MM+ DD) / 2; if isempty(BB) BB=0; end AAA=AAA + AA; Fe(i)=BB-AAA; end Feval(w)=Feval(w)+Fe(i); % can be ploted to see the convergence of the algorithm.%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %4. Evaluation. ta(w)=0; fe(w)=0; for T=T*((e/1000)^(1/ITER)); % this is the 'decay' of temperature to provide a geometric rather thatn a arithmetic slope see paper for reference ta(w)=ta(w)+ T; R=unifrnd(0.0,1); Q=exp((-(Fevalorig-Feval(w)))/T); if Feval(w) < Fevalorig fe(w)=Feval(w); if R Fevalorig fe(w)=0; BINLISTtemp=BINLIST; Fevalorig=Feval(w); else BINLIST=BINLISTtemp; Feval(w)=Fevalorig; end end end % 5.1 to compile the tempSIG list of gene names for i= 1:length(BINLIST) for j= 1:length(BINLIST{i}) if BINLIST{i}{j}==0 tempSIG{i}{j} = '0'; end if BINLIST{i}{j}==1 tempSIG{i}{j}=SIG{i}{j} ; end end end F2tempSIG=zeros(103,1); for i= 1:length(F2tempSIG) F2tempSIG(i)=length(BINLIST{i}); end %%%%%%%%%%%%%%%%%%% for i= 1: length(BINLIST) for j= 1: length(BINLIST{i}) if BINLIST{i}{j}==0 % changed F2tempSIG(i)=F2tempSIG(i)-1; else F2tempSIG(i)=F2tempSIG(i); end end end % 6. compiles the SP1p with the update list of gene names for i= 1:length(BINLIST) for j= 1:length(BINLIST{i}) if BINLIST{i}{j}==0 SP1p{i}{j} = 0; end if BINLIST{i}{j}==1 SP1p{i}{j}=SP1{i}{j} ; end end end clear i j x y toc %THE FILE SIG CONTAINS THE MINNED GENES FOR EACH PATHWAY WE EXPALIN HOW WE %GET THE PATHWAY SIGNATURES FROM THEM INTHE PAPER. %SINCE ITS A SEARCH ALGORITHM THE LONGER IT RUNS THE BETTER THE RESULTS. %IF USED WITH OWN DATA PLEASE BEAR IN MIND THE QUALITY OF THE MICROARRAY %DATA WILL SERIOUSLY AFFECT THE END RESULT.