Description

In this project, you will apply phylogenetic analysis to data from the case of the Florida dentist who
infected his patients with HIV. You are provided with the following materials:

1) The scientific paper on which the analysis is based. You should read this to familiarize yourself
with the case and with the data. I also recommend that you search online for news reports about
the case.

2) A data set FloridaDentistHIV_Data2.txt. This contains information about the DNA sequences
used in the study, including the patient that each sequence came from. “dentist” refers to the
dentist himself, “pA” to “pH” refer to HIV-infected patients of the dentist, and values beginning
with “LC” refer to local controls.

3) A data set hiv-db_gap_squeeze_120.FASTA that contains the sequence information in FASTA
format. The sequences appear in the same patient order as in FloridaDentistHIV_Data2.txt.
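
To make the pairing of the two files concrete, here is a minimal sketch in Python (you may use any
software you like) of reading FASTA text and attaching patient labels by position. The toy sequences,
the headers seq1 to seq3, the label LC01, and the function name parse_fasta are all made up for
illustration; check the actual layout of both files before relying on positional matching.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical toy input, NOT taken from the real data files.
example_fasta = """>seq1
ACGT-ACGT
ACGT
>seq2
ACGTTACGT
ACGA
>seq3
ACGTTACCT
ACGA
"""
# Hypothetical labels, standing in for the patient column of the metadata file.
example_labels = ["dentist", "pA", "LC01"]

records = list(parse_fasta(example_fasta))
if len(records) != len(example_labels):
    raise ValueError("number of sequences and number of labels differ")

# Pair each sequence with its label purely by position in the two files.
labelled = [(label, seq) for label, (_header, seq) in zip(example_labels, records)]
for label, seq in labelled:
    print(label, len(seq))
```

The length check is there because positional matching silently mislabels every sequence if the two
files ever get out of step.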

Your goal is to fit a phylogenetic tree to these data and verify the conclusion that five of the eight
HIV-positive patients were infected by the dentist. Write a short (about 5 pages) report that includes
the following:

a) An introduction to the case and its background.

b) An explanation of the data.

c) An explanation of your approach to analyzing the data.

d) Figures showing the results.

e) An explanation of how the results support the conclusion that the dentist was the source of the
infection for five patients.

f) A discussion of the assumptions of your analysis and how these impact the validity of the
results.



Notes:

1) The initial tree that you get from the data will be a jumble that is impossible to read. It is
required that you produce a nice looking figure that shows the relationship between the
different patients in an easily interpretable fashion.

2) Producing the interpretable figure will be one of the hardest parts of the assignment. Strategies
that you can take include (not necessarily an exhaustive list) changing the tip labels on the tree,
shrinking the size of the tip labels, color coding the tip labels, or strategically reducing the
number of sequences in the tree. Making a good figure to represent data in just the right way is
incredibly important and often takes a lot of effort. You should start now in developing the
habit of putting lots of effort into your figures. You must make nice looking and readable figures.

3) You should include bootstrapping in your results, along with a discussion of whether this affects
your confidence in the conclusions.

4) Your report should have separate sections of introduction, methods, results, discussion (or
equivalent). A significant portion of the points will depend on writing a clear and thorough report.