Options

Positive and negative sample
Two Sample Logo calculates the statistical significance of differences in relative position-specific symbol frequencies between two sets of aligned sequences. For example, sequences that are known to share a sequence motif may be locally aligned, including positions upstream or downstream from the motif. All aligned sequences in both samples must be of the same length, so dash characters ("-") should be used to pad positions where some sequences are shorter. Sequences that contain a motif and also have a certain functional property (for example, protein modification sites or transcription factor binding regions) constitute the positive sample. Sequences that contain the motif but do not have the functional property constitute the negative sample. The distinction between the samples does not have to be based on the presence or absence of a functional property: as long as there is a clear way of interpreting the data, any pair of sets of aligned sequences can be used as positive and negative. Sequences can be entered as flat files, or in FASTA or ClustalW formats.

Sequence type
Either amino acid or nucleotide. If the amino acid option is selected, all symbols other than the 20 standard amino acid single-letter codes will be replaced with dashes and excluded from the statistics. Likewise, if the nucleotide option is selected, all symbols other than A, C, G, T, and U will be replaced with dashes.

Statistical tests
Two Sample Logo supports two types of statistical tests:
t-test
A frequently used statistical procedure that tests whether two samples were generated by the same Gaussian distribution. The t-test assumes that all observations are independent and that the standard deviations of both samples are identical; under these assumptions it tests the equality of means (Hogg and Craig, 1994).

Binomial test
Consider two 0-1 samples S_{1} and S_{2} of sizes n_{1} and n_{2} respectively, in which symbol 1 occurred k_{1} times in S_{1} and k_{2} times in S_{2}. Let us also assume that the test statistic is the absolute difference of the symbol's relative frequencies, i.e. θ = |k_{1}/n_{1} − k_{2}/n_{2}|. The binomial test calculates the probability that a difference ≥ θ could occur by chance alone for two samples of sizes n_{1} and n_{2} randomly drawn from the underlying null distribution. Since, according to the null model, both samples are independent and identically distributed, an unbiased estimate of the probability of success p of the underlying binomial distribution is calculated as the relative frequency of occurrence of the symbol when S_{1} and S_{2} are concatenated, i.e. p = (k_{1} + k_{2})/(n_{1} + n_{2}). The achieved significance level P of the null hypothesis is then the probability that a difference ≥ θ will be observed between the estimated success probabilities in two samples of sizes n_{1} and n_{2} randomly drawn from the underlying distribution. It is calculated as:

P = Σ_{i=0}^{n_{1}} Σ_{j=0}^{n_{2}} I(|i/n_{1} − j/n_{2}| ≥ θ) · C(n_{1}, i) p^{i} (1 − p)^{n_{1}−i} · C(n_{2}, j) p^{j} (1 − p)^{n_{2}−j}

where C(n, i) is the binomial coefficient and I(·) equals 1 when its condition holds and 0 otherwise.

P-value
The p-value is defined as the lowest significance level at which the null hypothesis can be rejected. In the case of two sample logos, the null hypothesis assumes that each symbol at each position in both samples is generated according to the same probability distribution. Under the null hypothesis, the p-value is calculated as the probability that a test statistic as extreme as, or more extreme than, the one observed in the original samples can occur by chance alone. Here, the test statistic is the absolute value of the difference in relative frequencies between the positive and negative samples.
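The binomial test described above can be sketched by direct enumeration. The Python below is a minimal illustration under stated assumptions, not the code Two Sample Logo itself runs: the function names `binomial_pmf` and `two_sample_binomial_p` and the floating-point tolerance `eps` are hypothetical, and the double loop over all outcome pairs is practical only for small samples.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [P(X = i) for i in 0..n] where X ~ Binomial(n, p)."""
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]


def two_sample_binomial_p(k1, n1, k2, n2, eps=1e-12):
    """Probability of a frequency difference >= theta under the pooled null.

    k1, k2: counts of the symbol in samples of sizes n1, n2.
    eps: tolerance for comparing floating-point differences (an assumption
         of this sketch, not part of the definition in the text).
    """
    theta = abs(k1 / n1 - k2 / n2)      # observed test statistic
    p = (k1 + k2) / (n1 + n2)           # pooled success probability
    pmf1 = binomial_pmf(n1, p)
    pmf2 = binomial_pmf(n2, p)
    total = 0.0
    # Sum the joint probability of every outcome pair (i, j) whose
    # absolute frequency difference is at least the observed one.
    for i, prob_i in enumerate(pmf1):
        for j, prob_j in enumerate(pmf2):
            if abs(i / n1 - j / n2) >= theta - eps:
                total += prob_i * prob_j
    return min(total, 1.0)              # clamp rounding overshoot
```

For instance, `two_sample_binomial_p(1, 1, 0, 1)` returns 0.5: with p = 0.5 and θ = 1, only the outcome pairs (0, 1) and (1, 0) qualify, each with probability 0.25. The cost grows as n_{1} · n_{2}, so exhaustive enumeration of this kind suits short illustrations rather than large samples.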
Since in most cases this probability cannot be calculated exactly, the p-value is only approximated.

Show conserved residues
Because conserved residues are neither enriched nor depleted in the positive sample in comparison to the negative sample (the difference of their relative frequencies is zero), by default they are not displayed in the logo. Checking this option forces the software to show conserved residues.

Fixed height symbols
When this option is checked, all enriched and depleted symbols have the same height. When it is not checked, the height of each symbol is proportional to the difference of relative frequencies of the corresponding residue at a given position in the positive and negative samples.

Bonferroni correction
A correction of the p-value applied when multiple dependent or independent hypotheses are tested. See (Weisstein) for details.

Advanced options

Title
Sets the title for the two sample logo.

Logo range
Limits the analysis to the specified columns in the samples of aligned sequences.

First position index
Index assigned to the first symbol in the logo. For example, if the sample is a 25-residue-long window centered around an active site, the first position index should be -12: the active site will then have index 0, and the last symbol will be indexed as +12. The default value is 1.

Show X-axis indexes
Shows residue indexes on the X-axis.

Show Y-axis labels
Shows the labels "enriched" and "depleted" next to the Y-axis.

Output options

Output format
Two Sample Logo supports Encapsulated PostScript (EPS), Portable Document Format (PDF), Graphics Interchange Format (GIF), and Portable Network Graphics (PNG).

Output size
Height and width of the output image, in pixels, centimeters, or inches.

Resolution
Sets the image resolution. Applicable to bitmaps only (GIF and PNG).

Antialiasing
Turns antialiasing on or off.

Boxed image
If this option is checked, letters in the output will be inscribed in bounding boxes.
Outlined symbols
If this option is checked, letters in the output will only be outlined (and not filled).

Color schemes

Black and white
All symbols are written in black type against a white background.

WebLogo default colors
Shapley color table for amino acids
In the original Shapley scheme, G and V were color-coded as white. Since this would render them invisible against a white background, their color has been changed to light grey.

Shapley color table for nucleotides
Amino Colors
Charge
Positively charged residues (K, R, H) are colored blue, and negatively charged residues (D, E) are colored red; all neutral residues are colored black.

Hydrophobicity
Hydrophobic residues (A, F, G, I, L, P, V, W, Y) are colored cyan, while the remaining hydrophilic residues are colored black. This classification is based on (Eisenberg, 1984).

Surface exposure
Surface-exposed residues (D, E, H, K, N, P, Q, R, S, T, Y) are colored orange, and buried residues (A, C, F, G, I, L, M, V, W) are colored black. This classification is based on (Janin, 1979).

Flexibility
High-flexibility residues (D, E, K, N, P, Q, R, S) are colored red, whereas low-flexibility residues (A, C, F, G, H, I, L, M, T, V, W, Y) are colored green. This classification is based on (Vihinen et al., 1994).

Disorder
Disorder-promoting residues (A, R, S, Q, E, G, K, P) are colored red, order-promoting residues (N, C, I, L, F, W, Y, V) are colored blue, and disorder-order neutral residues (D, H, M, T) are colored black. This classification is based on (Dunker et al., 2001).

User defined color scheme
This option allows you to specify a new color mapping using the set of standard predefined colors listed in the following table:
Any symbol not explicitly assigned to a color will default to black.

References
