A Cluster with Clues Pointing to Replication and/or Fatty Acid Synthesis

The Connection to Replication

To begin, let us focus for a bit on a gene called PA2961 in Pseudomonas aeruginosa PAO1. The function of this gene is DNA polymerase III, delta' subunit (EC 2.7.7.7). If we look at the "functional coupling scores" (which loosely measure the number of co-occurrences of two genes in phylogenetically diverse genomes), we see that it appears to be most closely related to genes annotated as Thymidylate kinase (EC 2.7.4.9) and COG1559 protein yceG like. The function COG1559 protein yceG like really means just
"this gene looks like something most people call COG1559, but we doubt that it has the same function as most of the other genes called COG1559. In any event, we do not know what the function is, just as we do not know the function of the other genes called COG1559."

We now have within our collection of complete genomes 40-50 instances in which orthologs of PA2961 appear to cluster on the chromosome with genes annotated as having those two functions. I would encourage you to pursue the functional coupling links to verify the degree to which these three sets of orthologs co-occur. If you just ask for the "pins" corresponding to PA2961, you will see this diagram showing the co-occurrences of the corresponding genes. Note that the genes corresponding

The level of co-occurrence of these four genes makes it extremely likely that they are functionally related.
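
To make the idea of chromosomal co-occurrence concrete, here is a minimal sketch in Python. It is not the actual functional coupling computation: it simply counts the genomes in which two ortholog families lie within a few genes of each other, and it ignores the phylogenetic-diversity weighting mentioned above. The genome contents, the gene labels used as family names (tmk for thymidylate kinase, holB for the delta' subunit), and the max_gap parameter are all illustrative assumptions.

```python
from itertools import combinations

def clustered(genome, fam_a, fam_b, max_gap=5):
    """True if some copy of fam_a lies within max_gap positions of fam_b."""
    pos_a = [i for i, fam in enumerate(genome) if fam == fam_a]
    pos_b = [i for i, fam in enumerate(genome) if fam == fam_b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

def coupling_count(genomes, fam_a, fam_b, max_gap=5):
    """Number of genomes in which the two families appear clustered."""
    return sum(clustered(g, fam_a, fam_b, max_gap) for g in genomes.values())

# Hypothetical toy genomes: each is a list of ortholog-family labels
# in chromosomal order ("x1", "x2", ... stand for unrelated genes).
genomes = {
    "genome_1": ["fabF", "pabC", "yceG", "tmk", "holB", "ycfH"],
    "genome_2": ["x1", "yceG", "tmk", "holB", "x2"],
    "genome_3": ["holB", "x1", "x2", "x3", "x4", "x5", "x6", "tmk"],
    "genome_4": ["tmk", "x1", "yceG"],
}

for fam_a, fam_b in combinations(["holB", "tmk", "yceG"], 2):
    print(fam_a, fam_b, coupling_count(genomes, fam_a, fam_b))
```

A raw count like this over-rewards closely related genomes, which is why a real score would have to account for how phylogenetically diverse the supporting genomes are.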
The DNA polymerase III, delta' subunit is part of the clamp loading and unloading machinery (see the 2006 paper by Neuwald for a summary).
The Thymidylate kinase (EC 2.7.4.9) is used to phosphorylate dTMP to dTDP or dUMP to dUDP. I do not see the connection to the DNA polymerase. However, I do think that it might be worth noting that the genome of the Acidianus virus contains 57 genes. In a recent paper by Peng, Basta, Häring, Garrett, and Prangishvili the authors note:

"Of the 57 predicted ORFs, only three produced significant matches in public sequence databases with genes encoding a glycosyltransferase, a thymidylate kinase and a protein-primed DNA polymerase."

I would guess that, if we understood the relationship between the thymidylate kinase and the DNA polymerase in this virus, it might shed light on the clusters I am discussing.

The Case of PA2963

Now let us shift focus to PA2963 (yceG in Escherichia coli). The gene we have annotated as COG1559 protein yceG like clusters (often) with both the real aminodeoxychorismate lyase and the thymidylate kinase.
The synthesis of p-aminobenzoate from chorismate normally involves three proteins: PabA and PabB (the two components of aminodeoxychorismate synthase) and PabC (aminodeoxychorismate lyase). PabB occurs in the clusters we are discussing.

YceG and Fatty Acid Synthesis

I have mentioned the tie of YceG (or YceG-like) to the DNA Pol III Delta' subunit and the thymidylate kinase. In addition, I have tried to tie it to aminodeoxychorismate lyase (although I doubt that it is a subunit of that enzyme). However, there is another major component of the story. To see this, consider this image. What we see is that co-occurrences with these genes clearly couple YceG (PA2963) with fatty acid synthesis. Well, that is the puzzle of YceG (but do not forget YcfH!).
Here is the text of what Andrei Osterman observed when I discussed it with him:

Ross,
I read thru your write-up about clustering of PA2961 and around.
A few quick comments:
I. On connection with DNA replication:

1. In my opinion, the connection of DNA polymerase III and Thymidylate
kinase (EC 2.7.4.9) is totally straightforward. The latter is involved
in the supply (via synthesis and recycling) of building blocks.  If you
look even at:
http://www.genome.ad.jp/dbget-bin/show_pathway?rn00240+R02094 you will
see that this reaction (ATP + dTMP <=> ADP + dTDP) ultimately
leads to formation of dTTP that goes straight to DNA synthesis
(hence your DNA polymerase).  The next step being:

nucleoside-diphosphate kinase (EC 2.7.4.6)
ATP + dTDP = ADP + dTTP

And the previous step:
thymidine kinase (EC 2.7.1.21)
ATP + thymidine = ADP + dTMP

2. The connection with viruses and phages is very insightful and
not coincidental. At some point (~ 1 year ago) with Rob we did
quick stats of which bacterial genes/subsystems often occur in
phage. It was driven by the NAD story, but nonetheless, the DNA replication SS
was a clear champion in phages (just by “counts” of phage genome
connections to any given SS):
   
 Subsystem                               Counts
 DNA-replication                         171
 Ribonucleotide_reduction                62
 DNA_Repair_Base_Excision                52
 Prophage_lysogenic_conversion_modules   41
 ...
 Folate_Biosynthesis                     32
 pyrimidine_conversions                  24

As you can see from these old data, not only replication, but
pyrimidine conversions (home of 2.7.4.9) is VERY HIGH on the list.
Moreover, not only various DNA polymerases and such, but also these
two enzymes:

- thymidine kinase (EC 2.7.1.21)
and
- thymidylate synthase (EC 2.1.1.45)
5,10-methylenetetrahydrofolate + dUMP = dihydrofolate + dTMP

were among the most popular bacterial enzymes in phages (~25 instances
of each). As you see, both enzymes produce dTMP (!!!), an immediate
precursor for 2.7.4.9 on the way to dTTP and DNA synthesis.
Unfortunately, in Rob’s data I could not find phage stats for 2.7.4.9,
but in some sense it doesn’t matter. A phage doesn’t have to carry ALL
components of the replication machinery, only those that appear
limiting. Therefore, if the 2.7.4.9 activity of the host is high enough, it
may not be on the extant phages as a positively selected feature. See
what I mean?  That in my view takes care of the DNA metabolism
connections.

II. On possible connections with aminochorismate and PabA-PabC: Roy is
the best person to assess whether the “predicted aminodeoxychorismate
lyase” is real. I strongly doubt it, but he would know for sure.  If it was
real, or, putting it differently, if any liaison with PabA could be
inferred, the only connection I could foresee would be via folate. As
you see from the previous section, the Folate biosynthesis SS is high in
phages, and I am sure that the connection is via folate-dependent
thymidylate synthase (EC 2.1.1.45). We may try to elaborate more on
that as long as we can really see a connection on the biochemical
level. To push the PABA connection further, note that
aminodeoxychorismate lyase (PA2964), one intruder
between the FAS cluster (see below) and the DNA metabolism cluster (with
2.7.4.9), is replaced by another intruder from the same system in
Xanthomonas and Xylella (gene 22 in your last image, EC 6.3.5.8). That
should mean something!

III. On possible connections with Fatty Acid Synthesis (FAS): Well. It
is FAS, all of these genes: PA2965-2969. However, I can’t make much of
it.  We know a lot about FAS in Pseudomonas, and there are no major
gaps that we could use YceG to fill in.

IV. YcfH being a putative nuclease fits very nicely in the whole DNA
metabolism related clustering. We may try to think of a particular
role, but that would take somebody like Dushko. Why don’t we ask him?
It looks like the whole story is gram-negative (while he is more a
gram-positive guy), but he may have enough general knowledge.

SUMMARY on YceG: What is the most likely functional link? In my
opinion the most likely link is with DNA replication machinery in a
broader sense (including supply of building blocks).  The exact
function – no clue.  The second option, and not unrelated, is a
connection with PABA synthesis (as it goes to folate, and relates to DNA
via thymidylate among other things).  FAS seems the least likely
connection.


Finally, I enclose an old paper that has a discussion of this cluster
in Xanthomonas (abstract below). Check it out. Does it help at all?
AO.



FEMS Microbiol Lett. 2000 Dec 1;193(1):129-36.

Characterization of the acyl carrier protein gene and the fab gene locus in
Xanthomonas albilineans.

Huang G, Zhang L, Birch RG.

Department of Botany, The University of Queensland, Brisbane, Qld., Australia.

A genomic region containing the fatty acid biosynthetic (fab) genes was isolated 
from the sugarcane leaf-scald pathogen Xanthomonas albilineans. The order and
predicted products of fabG (beta-ketoacyl reductase), acpP (acyl carrier
protein), fabF (ketoacyl synthase II) and downstream genes in X. albilineans are 
very similar to those in Escherichia coli, with one exception. Sequence analysis,
confirmed by insertional knockout and specific substrate feeding experiments,
shows that the position occupied by pabC (encoding aminodeoxychorismate lyase) in
other bacteria is occupied instead by pabB (encoding aminodeoxychorismate
synthase component I) in X. albilineans. Downstream of pabB, X. albilineans
resumes the arrangement common to characterized Gram-negative bacteria, with
three transcriptionally coupled genes, encoding an ORF340 protein of undefined
function, thymidylate kinase and delta' subunit of DNA polymerase III holoenzyme 
(HolB). Different species may obtain a common advantage from coordinated
regulation of the same biosynthetic pathways using different genes in this
region.

PMID: 11094291
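
As a recap of the nucleotide steps quoted in the comments above, the route from thymidine or dUMP to dTTP can be traced with a small Python sketch. The enzyme names, EC numbers, and reactions are the ones given in the text; the ATP/ADP and folate cofactors are left out, every reaction is treated as one-directional, and the REACTIONS table and route function are illustrative names, not part of any existing tool.

```python
from collections import deque

# (enzyme, substrate, product); cofactors omitted, direction assumed
# to run toward dTTP as described in the comments above.
REACTIONS = [
    ("thymidine kinase (EC 2.7.1.21)", "thymidine", "dTMP"),
    ("thymidylate synthase (EC 2.1.1.45)", "dUMP", "dTMP"),
    ("thymidylate kinase (EC 2.7.4.9)", "dTMP", "dTDP"),
    ("nucleoside-diphosphate kinase (EC 2.7.4.6)", "dTDP", "dTTP"),
]

def route(start, goal, reactions=REACTIONS):
    """Breadth-first search: enzymes leading from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, path = queue.popleft()
        if compound == goal:
            return path
        for enzyme, substrate, product in reactions:
            if substrate == compound and product not in seen:
                seen.add(product)
                queue.append((product, path + [enzyme]))
    return None

for source in ("thymidine", "dUMP"):
    print(source, "->", " -> ".join(route(source, "dTTP")))
```

Both routes pass through thymidylate kinase (EC 2.7.4.9), which is the point made in the comments: the enzymes that produce dTMP feed the step that clusters with the delta' subunit.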