-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

                    W I N A    C++ Version 0.34

      A Window Analysis Program for the Number of Synonymous
            and Nonsynonymous Nucleotide Substitutions

             Toshinori Endo* and Takashi Gojobori

               Center for Information Biology
               National Institute of Genetics

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
*Current affiliation: Hokkaido University

(This document was first written on October 7, 1996.)
(Last modified on August 24, 2004.)


** IMPORTANT NOTICE FROM THE AUTHORS

Please cite the following paper when you publish work that uses WINA:

  Endo,T., Ikeo,K. and Gojobori,T. (1996) Large-Scale Search for Genes
  on Which Positive Selection May Operate. Mol. Biol. Evol. 13:685-690.


** SYSTEM REQUIREMENTS

  o Linux OS
  o GCC (GNU Compiler Collection) 3.3.1
  o GNU make
  o GNU tar


** INTRODUCTION

Window analysis is a method for locating the regions of a sequence in
which the trace of some evolutionary effect (for example, positive
selection; see Endo et al. 1996) can be seen. This package is designed
to visualize regional differences in the accumulation of synonymous and
nonsynonymous nucleotide substitutions. The numbers of synonymous (ds)
and nonsynonymous (dn) substitutions per site are estimated by the
method of Nei and Gojobori (1986).

The main program `wina' in this package is designed for UNIX systems.
The data converter `wp' for PostScript output is written in Perl. See
Endo et al. (1996) for details of the window analysis.

* QUICK START

The simplest way to use this package is as follows:

  1. Unpack the package (see UNPACKING under INSTALL).
  2. Compile the program (see BUILD under INSTALL).
  3. To obtain the window analysis of ds and dn, type:

       % wina INPUT.aln > OUTPUT

     INPUT.aln is an alignment file formatted like the sample file
     included in this package. See the INPUT FORMAT section for details.
If you wish to get PostScript output, type:

       % wina INPUT.aln | wp > OUTPUT.ps
  or
       % wina INPUT.aln | wp | lp

The former writes the PostScript output to a file, while the latter
prints it on the default PostScript printer.

* THE FILES IN THIS PACKAGE

Please confirm that the following files are included:

  CodonTab.h      - C++ header file for the codon table
  SynDiffClass.h  - C++ header file for the nucleotide difference table class
  common.h        - C++ header file for common functions
  Makefile        - build information
  synsite.h       - C++ header file for the synonymous site table
  test.aln        - sample alignment
  wina.cc         - main program source (C++) of the window analysis
  wina.doc        - this document
  wp              - data converter for PostScript output (Perl)
  totables        - converts the output of wina into table form, which
                    makes it easier to average the data with spreadsheet
                    software such as Excel

If anything is missing, please contact the author ;-)


** INSTALL

* UNPACKING

Change the current directory to an appropriate location and type

  % gunzip -c wina-0.33.tar.gz | tar xvf -

With GNU tar, you can type the following instead:

  % tar xzvf wina-0.33.tar.gz

Either way, a new directory `wina-0.33' is created, which contains all
of the files in this package.

* BUILD

Just type

  % make

If you use a system other than Linux, you may need to specify the
compiler as follows:

  % make CXX=g++


** USAGE

wina accepts both standard input and text files. For example,

  % wina file1 file2 ..
  % wina < file

wp takes the output of wina and converts it into a PostScript file.
For example,

  % wp wina_output ..
  % wp < wina_output

To obtain graphic output on a printer, type

  % wina file | wp | lp

Be careful when printing the window analysis of an alignment that
contains more than a few sequences: the current version of wina outputs
ds and dn for every possible pair of sequences in the alignment, so an
alignment of n sequences produces n(n-1)/2 pairwise results.

* INPUT FORMAT

The input file is a sequence alignment.
The recognized nucleotide characters are A, C, G, T and U. Other
letters and the gap characters * (asterisk), - (hyphen) and . (period)
are also accepted in a sequence, but any codon that contains such a
character is ignored in all estimation steps.

Example: see the file `test.aln' included in this package.

* OUTPUT FORMAT

The output of wina has the following format:

  site : ds dn mark s#

where mark is one of

  ** : dn > 2ds, dn <= 1.0
  ++ : dn > 2ds, dn >  1.0
  *  : dn > ds,  dn <= 1.0
  +  : dn > ds,  dn >  1.0

and s# is the number of sites compared within the window.

Example:

  >AGMGIBSC1-1 x HUMTGFB2A-1
     1 : -0.0000 -0.0000   33
     4 : -0.0000 -0.0000   36
     7 : -0.0000 -0.0000   39
    10 : -0.0000 -0.0000   42
    13 : -0.0000 -0.0000   42
    16 : -0.0000 -0.0000   42
    19 : -0.0000 -0.0000   42
    22 : -0.0000 -0.0000   42
    25 : -0.0000 -0.0000   42
       :
  //


** BUGS

You may find 'Inf' or 'NaN' as values of ds and dn. They stand for:

  Inf : infinity
  NaN : not a number

They are caused by too large a proportion of differences (the
correction -3/4 ln(1 - 4p/3) used in Nei and Gojobori's method diverges
when the proportion of differences p reaches 0.75) or by division by
zero. Because of this problem, some compilers, such as Digital C++, give
error messages (thanks to Yossi Glass for the information).


** QUESTIONS AND BUG REPORTS

I hope there are no bugs in this program, but there may be some. Please
contact tendo@lab.nig.ac.jp (Toshinori Endo) if you find any problems
or have trouble using this software package. Thank you for your
cooperation.


** HISTORY

  2004.8.24  wina 0.34  Update for the new C++ specification, with some
                        fixes.
  1999.12.1  wina 0.32  Update for the new C++ specification, with a fix
                        for zero division.
  1997.5.7   wina 0.31  Bug fixed in sequence initialization in the
                        constructor Sequence::Sequence(). Thanks to
                        Yossi Glass for the bug report.
  1996.10.7             wina document written.
  1996.8.23  wina 0.3   First release version of wina.
  1995.3.11  wina 0.1   C++ version of the window analysis program.
  1993.7.20  wana       Previous version of the window analysis program,
                        written in C and Perl.


** REFERENCES

  Nei,M. and Gojobori,T. (1986) Simple Methods for Estimating the Numbers
  of Synonymous and Nonsynonymous Nucleotide Substitutions. Mol. Biol.
  Evol. 3:418-426.

  Endo,T., Ikeo,K. and Gojobori,T. (1996) Large-Scale Search for Genes
  on Which Positive Selection May Operate. Mol. Biol. Evol. 13:685-690.