Bioinformatics - Lab 6
Phylogenetic Trees
David Gilbert

The aim of this lab is to give you practical experience with the concepts from the lecture on phylogenetic trees.

Software

Polytree:

Phylip:

Blast:


Exercises

(A) Identify the phylogeny of globin sequences (beta chain).

Using ClustalW and PHYLIP to compute distance matrices and trees: step by step.

  1. Download the globin sequences (amino acids, in FASTA format) from the course website.
  2. Align the sequences with clustalw, using either the command-line version or the web-based version (see Lab 4), and save the alignment in PHYLIP format (e.g. globins_all.phy).
    E.g. (web-based):
  3. Go to the directory where you have saved the input file.
  4. Calculate the distance matrix for the input file. In your terminal, invoke the protdist program by typing the command
    protdist
  5. When prompted, enter the name of the input file (your alignment from clustalw), e.g. 'globins_all.phy', then type Y to accept the default settings.
  6. protdist writes the distance matrix to a file called outfile.
  7. Rename this outfile (e.g. mv outfile outfile.matrix) so that it is not replaced when the next PHYLIP program writes its own outfile.
  8. Now we will produce neighbour-joining trees for the input sequences using neighbor. Invoke the neighbor program by typing the command
    neighbor
  9. When prompted, enter the name of the matrix file (e.g. outfile.matrix).
  10. From the menu, you can choose to produce either a neighbour-joining tree or a UPGMA tree.
  11. neighbor writes two files: outfile (a text drawing of the tree and a table of branch lengths) and treefile (the tree in Newick format, with branch lengths).
  12. You can plot an unrooted tree (for neighbour-joining) using
    drawtree
    or a rooted tree (for UPGMA) using
    drawgram
    Both programs take your treefile as input.
    Note that these programs will also ask you for a 'fontfile'; the available font files are
    /users/students4/software/public/Bio4/bin/phylip3.65/font1
    /users/students4/software/public/Bio4/bin/phylip3.65/font2
    and so on, up to font6.
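  13. You can also try generating trees with the fitch or kitsch programs from the PHYLIP package: fitch (the Fitch-Margoliash method) does not assume a molecular clock and gives unrooted trees, while kitsch applies the same method under a molecular clock and gives rooted trees. You can plot these using either drawtree or drawgram.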
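
To get a feel for what protdist does in steps 4 to 6, here is a minimal Python sketch of a much simpler protein distance. It is not protdist's method: it assumes the Poisson correction d = -ln(1 - p), where p is the proportion of aligned positions (ignoring gap columns) at which two sequences differ, whereas protdist offers more detailed amino-acid substitution models. The function names and toy sequences are made up for illustration.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping any column where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gap columns
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return diffs / compared


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p): an estimate of
    substitutions per site that allows for multiple hits."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # every compared site differs: cannot correct
    return -math.log(1.0 - p)


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Symmetric matrix of Poisson distances for a dict of aligned sequences."""
    names = list(seqs)
    n = len(names)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = poisson_distance(seqs[names[i]], seqs[names[j]])
    return names, d
```

For example, distance_matrix({"s1": "ACDE", "s2": "ACDF"}) compares four positions with one difference, so p = 0.25 and the corrected distance is -ln(0.75), about 0.288.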
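
If you want to load the renamed matrix file (outfile.matrix) into your own scripts, a minimal Python sketch follows. It assumes a square matrix layout: the number of taxa first, then one row per taxon consisting of a name followed by its distances, possibly wrapped onto continuation lines. It also assumes the names contain no spaces. read_phylip_matrix is a hypothetical helper, not part of PHYLIP.

```python
def read_phylip_matrix(text: str) -> tuple[list[str], list[list[float]]]:
    """Parse a square PHYLIP-style distance matrix.

    Assumes: the first token is the number of taxa n; each row is a name
    (without spaces) followed by n distances, which may wrap onto
    continuation lines.
    """
    tokens = text.split()
    n = int(tokens[0])
    names: list[str] = []
    rows: list[list[float]] = []
    pos = 1
    for _ in range(n):
        if pos >= len(tokens):
            raise ValueError("matrix has fewer rows than declared")
        names.append(tokens[pos])
        row = [float(t) for t in tokens[pos + 1 : pos + 1 + n]]
        if len(row) != n:
            raise ValueError(f"row for {names[-1]} is incomplete")
        rows.append(row)
        pos += 1 + n
    return names, rows


# Usage (assumes outfile.matrix is in the current directory):
# with open("outfile.matrix") as fh:
#     names, d = read_phylip_matrix(fh.read())
```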
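
The neighbour-joining option of neighbor (steps 8 to 11) can be sketched in Python. The function below is a minimal version of the standard neighbour-joining algorithm (Saitou and Nei), written for illustration rather than taken from neighbor: with n clusters left, it joins the pair i, j that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of row i of the distance matrix, and it returns an unrooted tree in Newick notation, the format neighbor writes to treefile. Ties are broken by input order, so the layout may differ from neighbor's, and branch lengths can come out negative for data that do not fit a tree exactly. The function name and the five-taxon matrix are made up.

```python
def neighbor_joining(names: list[str], d: list[list[float]]) -> str:
    """Build an unrooted neighbour-joining tree and return it as Newick."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    labels = list(names)              # labels[k] = Newick text for node k
    dist = [list(row) for row in d]   # copy; grows by one row/column per join
    active = list(range(len(names)))  # nodes not yet joined
    while len(active) > 3:
        n = len(active)
        r = {i: sum(dist[i][k] for k in active) for i in active}
        best = None
        for a in range(n):
            for b in range(a + 1, n):
                i, j = active[a], active[b]
                q = (n - 2) * dist[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # branch lengths from i and j to the new internal node u
        li = dist[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        u = len(labels)
        labels.append(f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})")
        for row in dist:
            row.append(0.0)
        dist.append([0.0] * (u + 1))
        for k in active:
            if k not in (i, j):
                dist[u][k] = dist[k][u] = (dist[i][k] + dist[j][k] - dist[i][j]) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    # join the last three nodes at a single central node
    a, b, c = active
    la = (dist[a][b] + dist[a][c] - dist[b][c]) / 2
    lb = dist[a][b] - la
    lc = dist[a][c] - la
    return f"({labels[a]}:{la:.4f},{labels[b]}:{lb:.4f},{labels[c]}:{lc:.4f});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    m = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, m))
    # prints (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

This toy matrix is additive (it fits a tree exactly), so neighbour-joining recovers that tree's branch lengths.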
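
For the UPGMA option in step 10, a comparable Python sketch is below (again illustrative, not neighbor's code; the function name and matrix are made up). UPGMA repeatedly merges the two closest clusters, places the new node at half their distance, which amounts to assuming a molecular clock, and sets each remaining distance to the size-weighted average of the two merged clusters' distances. The result is a rooted tree in Newick notation, the kind of tree you would plot with drawgram.

```python
def upgma(names: list[str], d: list[list[float]]) -> str:
    """Build a rooted UPGMA tree and return it as Newick."""
    if not names:
        raise ValueError("need at least one taxon")
    # cluster id -> (Newick text, number of taxa, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # distances between cluster ids, keyed by (smaller id, larger id)
    dist = {(i, j): float(d[i][j])
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])
        ni, si, hi = clusters.pop(i)
        nj, sj, hj = clusters.pop(j)
        h = dij / 2  # new node halfway between: the molecular-clock assumption
        newick = f"({ni}:{h - hi:.4f},{nj}:{h - hj:.4f})"
        # size-weighted average distance from the merged cluster to the rest
        merged = {}
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            merged[k] = (si * dik + sj * djk) / (si + sj)
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        for k, v in merged.items():
            dist[(k, next_id)] = v  # existing ids are always smaller than next_id
        clusters[next_id] = (newick, si + sj, h)
        next_id += 1
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    m = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
    print(upgma(taxa, m))
    # prints (D:3.0000,(C:2.0000,(A:1.0000,B:1.0000):1.0000):1.0000);
```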

(B) Origin and evolution of HIV

Data: use either of the following:

  • Some HIV nucleotide sequences that you can investigate are here, in fasta format. These sequences are from HIV subtype C viruses from South Africa (C.ZA sequences) and from India (C.IN sequences). (from http://www.sanbi.ac.za/mrc/tdr2003/material/phylo_tut.2.html) OR

  • FASTA files containing the amino acid sequences for the env, gag and pol proteins from the isolates in the list below. (from http://artedi.ebc.uu.se/course/UGSBR/hiv/)

    No   Isolate     Accession no   Subtype/animal
    1    HIV-1ELI    K03454         D
    2    HIV-1LAI    K02013         B
    3    HIV-1NDK    M27323         D
    4    HIV-2D205   X61240         B
    5    HIV-2ROD    M15390         A
    6    HIV-2ST     M31113         A
    7    HIV-2UCI    L07625         B
    8    SIVmac      M19499         macaque
    9    SIVcpz      X52154         chimpanzee
    10   SIVagm      M58410         African green monkey
    11   SIVman      X14307         mangabey

  1. Using the same procedure as in (A), perform a multiple alignment using clustalw, with output in PHYLIP format.
  2. Using the appropriate programs from the PHYLIP package:
    1. construct the distance matrix, this time using dnadist (why?) if you are working with the nucleotide sequences; if you chose the amino-acid sequences, use protdist as in (A);
    2. generate the tree; and
    3. visualise the tree.
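
Nucleotide distances need a nucleotide substitution model. As an illustration, here is a minimal Python sketch of the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), which is one of the models dnadist offers (it is not dnadist's default, and this is not dnadist's code). Here p is the proportion of compared positions that differ; skipping columns with a gap or an ambiguous base is an assumption of this sketch. The function name and toy sequences are made up.

```python
import math


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor (JC69) distance between two aligned DNA sequences.

    Columns where either sequence has a gap or a non-ACGT symbol
    (e.g. N) are skipped: an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    bases = set("ACGT")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in bases and y in bases:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = diffs / compared
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # made-up toy sequences: 1 difference in 10 sites, so p = 0.1
    print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"), 4))
    # prints 0.1073
```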

(C) Origin and evolution of H5N1

Try to construct phylogenetic trees from some H5N1 (avian influenza A) virus sequences.