Sunday, January 27, 2008

Mitochondrial DNA M1 haplogroup: A Response To Ana M. Gonzalez et al. 2007

Ana M. Gonzalez et al. published a paper on M1 expansions on 9 July 2007, and a few things about it immediately jumped out at the present author. These are laid out following the abstract below, which is included to put readers of this page on "the same page", so to speak, as far as the synopsis of the paper is concerned:

Abstract:

Mitochondrial lineage M1 traces an early human backflow to Africa

Ana M Gonzalez , Jose M Larruga , Khaled K Abu-Amero , Yufei Shi , Jose Pestano and Vicente M Cabrera

BMC Genomics 2007, 8:223 doi:10.1186/1471-2164-8-223

Published 9 July 2007

Abstract (provisional)

Background
The out of Africa hypothesis has gained generalized consensus. However, many specific questions remain unsettled. To know whether the two M and N macrohaplogroups that colonized Eurasia were already present in Africa before the exit is puzzling. It has been proposed that the east African clade M1 supports a single origin of haplogroup M in Africa. To test the validity of that hypothesis, the phylogeographic analysis of 13 complete mitochondrial DNA (mtDNA) sequences and 261 partial sequences belonging to haplogroup M1 was carried out.

Results
The coalescence age of the African haplogroup M1 is younger than those for other M Asiatic clades. In contradiction to the hypothesis of an eastern Africa origin for modern human expansions out of Africa, the most ancestral M1 lineages have been found in Northwest Africa and in the Near East, instead of in East Africa. The M1 geographic distribution and the relative ages of its different subclades clearly correlate with those of haplogroup U6, for which an Eurasian ancestor has been demonstrated.

Conclusions
This study provides evidence that M1, or its ancestor, had an Asiatic origin. The earliest M1 expansion into Africa occurred in northwestern instead of eastern areas; this early spread reached the Iberian Peninsula even affecting the Basques. The majority of the M1a lineages found outside and inside Africa had a more recent eastern Africa origin. Both western and eastern M1 lineages participated in the Neolithic colonization of the Sahara. The striking parallelism between subclade ages and geographic distribution of M1 and its North African U6 counterpart strongly reinforces this scenario. Finally, a relevant fraction of M1a lineages present today in the European Continent and nearby islands possibly had a Jewish instead of the commonly proposed Arab/Berber maternal ascendance.

-Abstract ends-

Present Author's Response To Ana M. Gonzalez et al.

*First, a quick synopsis of the sampling, with regard to where the n=261 M1-bearing samples come from, aside from the 588 participants mentioned in one of the tables [Table 2] in the study:

By the present author's reading of the table, the figure breaks down as follows:

A total of 50 Europeans detected for M1.
A total of 154 Africans.
A total of 28 Asians, barring 8 unknown Arabian haplotypes.
And a total of 29 Jews, who were lumped together from the various continents.
The above totals sum to 261 "known" M1 lineages.
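
For readers who want to double-check the tally, it can be reproduced in a few lines of Python. This is only a sketch: the group labels are names chosen here for illustration, and the counts are the ones the present author read off Table 2, as listed above.

```python
# Counts of M1-bearing samples as read from Table 2 (see the list above).
# The group labels are illustrative names, not taken from the paper.
m1_counts = {
    "Europeans": 50,
    "Africans": 154,
    "Asians": 28,   # excludes the 8 unknown Arabian haplotypes
    "Jews": 29,     # pooled across continents
}

total = sum(m1_counts.values())
print(total)
```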

*With regard to the authors' claim that M1, or its ancestor, “had an Asiatic origin”, the following comes to mind:

The authors of the study at hand themselves admit that they haven't come across an M1 ancestor in either south Asia or southwest Asia. They also take note of its highest diversity in Ethiopia and east Africa. Yet, on the shaky premise of their M1c expansion time frame estimates, they build a conclusion by tying M1c to a dispersal(s) "parallel" to that of U6, another African marker whose immediate recent common ancestor, namely proto-U6, has been elusive thus far.

Well, they wouldn’t be the only ones who have failed to come across any proto-M1 ancestor in southwest and south Asia [Indian Subcontinent mainly]:

Based on the high frequency and diversity of haplogroup M in India and elsewhere in Asia, some authors have suggested (versus [3]) that M may have arisen in Southwest Asia [16,17,31]. Finding M1 or a lineage ancestral to M1 in India, could help to explain the presence of M1 in Africa as a result of a back migration from India. Yet, to date this has not been achieved ([15], this study). Therefore, one cannot rule out the still most parsimonious scenario that haplogroup M arose in East Africa [3]. Furthermore, the lack of L3 lineages other than M and N (indeed, L3M and L3N) in India is more consistent with the African launch of haplogroup M. On the other hand, one also observes that: i) M1 is the only variant of haplogroup M found in Africa; ii) M1 has a fairly restricted phylogeography in Africa, barely penetrating into sub-Saharan populations, being found predominantly in association with the Afro-Asiatic linguistic phylum – a finding that appears to be inconsistent with the distribution of sub-clades of haplogroups L3 and L2 that have similar time depths. - Mait Metspalu et al.

So, while they acknowledge the highest "frequencies and diversities" of M1 in Ethiopia in particular, and in East Africa in general [see below for reference], the authors base their claims about 'origins' on their expansion estimates for M1c derivatives, presumably predominant in northwest Africa rather than east Africa, and on its relatively sporadic distribution in 'Europe' and 'Southwest' Asia. They attempt to buttress this by invoking an initial parallel expansion of M1 and U6 "ancestor" lineages into north Africa via the Nile Valley [from "southwest Asia"]. This is followed by an expansion of U6 and M1 derivatives, this time from northwest Africa, northward into Europe and eastward into "southwest" Asia via the Nile Valley corridor and the Sinai peninsula. A few derivatives presumably made their way into sub-Saharan east Africa, where they underwent some expansion and gave rise to yet another, later dispersal from there into "southwest Asia", which would account for the 'majority' of M1 lineages in "southwest Asia" being east African derivatives rather than the north African [M1c] counterparts.

In Africa, haplogroup M1 has supra-equatorial distribution (see additional files 1 and 2). As previously reported its highest frequencies and diversities (Table 2) are found in Ethiopia in particular and in East Africa in general. Two appreciable gradients exist. Frequencies significantly diminished from East to West and also going South to sub-Saharan areas. M1 is not uncommon in the Mediterranean basin showing a peak in the Iberian Peninsula. However, it is rare in continental Europe. Although in low frequencies, its presence in the Middle East has been well established from the South of the Arabian Peninsula to Anatolia and from the Levant to Iran. - Gonzalez et al. 2007

*Furthermore,

The authors gather that their observations correlate with those of other researchers, namely Olivieri et al. (2006). To this extent, they put forth that Olivieri et al.’s M1b corresponds to their M1c, the former’s M1a2 to their M1b, and the former’s M1a1 to their M1a. They go on to add that the coalescence ages arrived at by the two research groups also correlate. Gonzalez et al. note that their coalescence time for M1c (25.7 +/- 6.6 ky) overlaps with Olivieri et al.’s coalescence time for M1b (23.4 +/- 5.6 ky). Similarly, they note that their coalescence age for M1a (22.6 +/- 8.1 ky) is in line with Olivieri et al.’s age for M1a1 (20.6 +/- 3.4 ky). However, this leaves a great discrepancy between the two groups: Gonzalez et al.’s coalescence age for M1b, at 13.7 +/- 4.8 ky, falls quite short of Olivieri et al.’s age for M1a2, at 24 +/- 5.7 ky. Not only are the subgroup nomenclatures distinct, but this latter discrepancy makes an unsubtle difference: viewed from Olivieri et al.'s standpoint, M1c (Olivieri et al.'s M1b) is no longer older than M1b (Olivieri et al.'s M1a2), but rather a bit younger than or on par with it, which is not what the present study calls for. Yet, by their own admission, Gonzalez et al. favor Olivieri et al.’s estimation over their own:


As our calculations are based only on three lineages and that of Olivieri et al on six, we think that their coalescence time estimation should be more accurate than ours. In fact, when time estimation is based on the eight different lineages (AFR-K143 is common to both sets) a coalescence age of 20.6 +/- ky is obtained.
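
To make the juxtaposition above easier to follow, the quoted ages can be put side by side in a short Python sketch. The dictionary layout and the `gap` helper are illustrative choices made here, the subclade pairing follows the correspondence described above, and only the central estimates are compared (the +/- margins are carried along but not used).

```python
# Coalescence ages in ky as (estimate, +/- margin), as quoted in the text above.
# Keys use Gonzalez et al.'s subclade names; comments give Olivieri et al.'s names.
ages = {
    "M1c": {"gonzalez": (25.7, 6.6), "olivieri": (23.4, 5.6)},  # Olivieri's M1b
    "M1a": {"gonzalez": (22.6, 8.1), "olivieri": (20.6, 3.4)},  # Olivieri's M1a1
    "M1b": {"gonzalez": (13.7, 4.8), "olivieri": (24.0, 5.7)},  # Olivieri's M1a2
}

def gap(clade):
    """Absolute difference between the two central estimates, in ky."""
    g = ages[clade]["gonzalez"][0]
    o = ages[clade]["olivieri"][0]
    return round(abs(g - o), 1)

gaps = {clade: gap(clade) for clade in ages}
for clade, value in gaps.items():
    print(f"{clade}: central estimates differ by {value} ky")

# Does M1c come out older than M1b under each team's central estimates?
m1c_older = {
    team: ages["M1c"][team][0] > ages["M1b"][team][0]
    for team in ("gonzalez", "olivieri")
}
print(m1c_older)
```

On the central estimates alone, the first two pairs sit about 2 ky apart while the third differs by roughly 10 ky, and the M1c-older-than-M1b ordering holds only under Gonzalez et al.'s figures.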

*But if there is any indication of the tenuous nature of the above thesis, without going into other known details about M1, it would be this alternative viewpoint they came up with:

The alternative idea entertained by the authors is one where M1 could actually be an autochthonous northwest African lineage, which spread northward into Europe and eastward to "Southwest Asia" and east Africa. This, again, is to be followed by a yet later dispersal from east Africa, likely sub-Saharan east Africa, particularly from the Ethiopian populations.

*We've already seen the subjective nature of the present authors' age estimations, naturally attributable in some degree to biases underlying sampling procedures, as demonstrated above by juxtaposing their findings with those of Olivieri et al. (2006). Furthermore, erratic mutation rates would undoubtedly have affected the age estimation regime applied by the authors, however much they may have downplayed the fact, as demonstrated by their observations surrounding the M1a2 subgroup, which led them to omit said subgroup from their lineage coalescence analysis. What makes this interesting is that both groups of authors sought to build their argument around parallel demic diffusion scenarios for U6 and M1, which have little in the way of supporting material to stand on, notwithstanding the passionate efforts to push the argument forward; for instance, in Olivieri et al.'s case, they say:

The hypothesis of a back-migration from Asia to Africa is also strongly supported by the current phylogeography of the Y chromosome variation, because haplogroup K2 and paragroup R1b*, both belonging to the otherwise Asiatic macrohaplogroup K, have been observed at high frequencies only in Africa (15, 16). However, because of the relatively low molecular resolution of the Y chromosome phylogeny as compared to that of the mtDNA, it was impossible to come to a firm conclusion about the precise timing of this dispersal (15, 16). - Olivieri et al. (2006)

One can almost sense Olivieri et al. venting their frustration at not getting the "desired" results on the supposed "relatively low molecular resolution" of Y-DNA, but indeed, as noted here earlier:

Previous genetic research work made very enthusiastic attempts to correlate the likes of U6 and possible "Eurasian"-tagged mtDNA with R1*-M173, supposedly as an attempt to buttress a possible back-migration into Africa; all but failed, with results showing considerable African mtDNA gene pool instead, for populations bearing these chromosomes.

Gonzalez et al. (2008) also fall into that trap. Guess where they look in order to connect an M1 dispersal [supposedly parallel to a U6 one] with the "Middle Eastern" origin part of their argument? Interestingly, it happens to be the same Dead Sea sample which was implicated in a clear genetic link with sub-Saharan and Eastern African groups. This is the same Dead Sea sample set that shared R1*-M173 with a northern Cameroonian sample set and other African groups bearing this marker. This is also the same Dead Sea sample set with African G6PD-A alleles that were rare to absent in neighboring groups. And Gonzalez et al. (2008) tell us that this is also the same sample set which is again distinguished from those of neighboring groups by its higher frequency of "south of the Sahara" mtDNA markers:

Statistical analysis revealed that, whereas the sample from Amman did not significantly differ from their Levantine neighbours, the Dead Sea sample clearly behaved as a genetic outlier in the region. Its outstanding Eurasian haplogroup U3 frequency (39%) and its south-Saharan Africa lineages (19%) are the highest in the Middle East. On the contrary, the lack ((preHV)1) or comparatively low frequency (J and T) of Neolithic lineages is also striking. Although strong drift by geographic isolation could explain the anomalous mtDNA pool of the Dead Sea sample, the fact that its mtDNA lineage composition mirrors, in geographic origin and haplogroup frequencies, its Y-chromosome pool, points to founder effect as the main cause. - Gonzalez et al. (2008)

They acknowledge above that the "anomalous" mtDNA pool of the Dead Sea sample "mirrors" its Y-DNA pool, which too is replete with markers mostly found in Africa, including the aforementioned rare paraphyletic R1*-M173. However, something interesting happens with regard to this Dead Sea sample of an "isolated" group:

Ancestral M1 lineages detected in Jordan that have affinities with those recently found in Northwest but not East Africa question the African origin of the M1 haplogroup.

Interesting, because these are the same M1c chromosomes referred to here, whose specifics have already been dealt with. Despite the apparent post-OOA emigration ties between African groups and the Dead Sea community, reflected not only in both Y-DNA and mtDNA but also in X chromosome markers, Gonzalez et al. (2008) still come to the odd conclusion that the presence of these lineages in the Dead Sea sample set somehow challenges the African origin of M1. The notion becomes quite comical when one considers that they mention, in the same breath, the presence of these same M1 clusters in Northwestern Africa, which happened to be their alternative hypothetical point of origin [Gonzalez et al. (2007)], as already noted. As they themselves acknowledge, northwestern Africa is where said M1 clusters are widely distributed, while they are rather rare in the so-called "Middle East", save for this genetically "anomalous" [the authors' own word] and relatively isolated Dead Sea community, notable for its clear "past ties to sub-Saharan and eastern Africa", to put it in the words of Flores et al. (2005), a team that Gonzalez herself was a part of. "Anomalous" because, by the authors' own account, the considerably high post-OOA African ties of the Dead Sea sample set it apart from many other "Middle Eastern" groups, including its neighbours. So how M1c clusters (which Gonzalez et al. (2007) dub "ancestral" based on their subjective age estimations), which are very rare even in the "Middle East" (save for the "anomalous" Dead Sea dataset) in contrast to their wide distribution in northwestern Africa, suddenly put a question mark on the African origin of M1 is beyond comprehension.

Not only is there a lack of apparent parallelism between the distribution of the R1* paragroup and that of the aforementioned U6 and "Eurasian"-tagged mtDNA markers in Africa itself, as authors like Olivieri et al. and Gonzalez et al. seem to be so desperately yearning for, but the paragroup is also essentially absent in all Afrasan-speaking groups except those in the Northeast African corner. The marker is even rarer in so-called Southwest Asia than it is in Africa. This naturally contradicts Olivieri et al.'s acknowledgement in the following...

Indeed, M1 and U6 in Africa are mostly restricted to Afro-Asiatic–speaking areas.

Where did this "Afro-Asiatic" phylum originate? Well, look no further than Gonzalez et al., who, as we've seen, are energetic about this idea of M1 and U6 "parallelism", not unlike Olivieri et al. (2006); they too, clearly in a way that simultaneously soothes or seeks to explain away a bit of disappointment in the course of the study, say:

The anomalous evolution of M1a2 lineages left the coalescence ages of the eastern Africa M1a expansion uncertain, but as suggested for the sister U6a1 radiation; these movements could be correlated in time with an African origin and expansion of Afroasiatic languages.


There you have it, folks: the answer to that simple question. And as if to defy the two aforementioned research teams with regard to the proposed M1 and U6 "parallelism" in a demic expansion scenario, a newer study that came along in December 2008 points this out [a finding that appears to have been reproduced in several other studies]:

Our results highlighted a clear genetic differentiation between Berbers from the Maghreb and Egyptian Berbers. The first seems to be more related to European populations as shown by haplogroup H1 and V frequencies, whereas the latter share more affinities with East African and Nile Valley populations as indicated by the high frequency of M1 and the presence of L0a1, L3i, L4∗, and L4b2 lineages. Moreover, haplogroup U6 was not observed in Siwa. Probably, such a maternal diversity between North African Berbers would have been the result of a conjunction of several geographical, prehistoric, and historic factors which guided contacts (and thus exchanges) between local populations and migrating groups. First, in addition to the geographical distance, which certainly increases the genetic distance, the geographical location of Berber populations is very peculiar: the Berbers from the Maghreb are at the end of a long migration route, whereas Berbers from Siwa are rather in a crossroads between the Middle East, East Africa, sub-Saharan areas and the North African corridor. Therefore, meetings and exchanges between local and migrating populations were not identical in North West and North East Africa. - C. Coudray et al., The Complex and Diversified Mitochondrial Gene Pool of Berber Populations

We are told above that M1 is substantial in the Siwa group, but no U6 was observed! Furthermore, it would make sense for the Siwa group to be a pristine representative of the aforementioned U6/M1 "parallelism" scenario, given that they are even closer to the so-called "Near East" than the northwestern African "Berbers", would it not? Perhaps it wouldn't be as funny if Anna Olivieri herself were not a participant in this Coudray et al. study! Speaking of the so-called "Near East", the following claim is interesting when one takes into account that this area was singled out as one of Gonzalez et al.'s proposed areas of M1 origin; looking at it from the alternative proposed origin, presumably the northwestern African one, we are told in passing...

That M1 is an autochthonous North African clade that had its earliest spread in northwestern areas marginally reaching the Near East and beyond. This would explain the shortage of basic M1 lineages in the Near East but would leave the Asiatic origin of the M1 ancestor undetermined.

...interesting.

*Another thing that hasn't been relayed by the present study is the detail that follows:

Coding region transitions are likely to change more slowly than those of the hypervariable segments, and hence are likely to remain intact within a clade. To determine which clade a monophyletic unit belongs in, key coding region transitions have to be identified. In the case of M1, we were told:

We found 489C (Table 3) in all Indian and eastern-African haplogroup M mtDNAs analysed, but not in the non-M haplogroup controls, including 20 Africans representing all African main lineages (6 L1, 4 L2, 10 L3) and 11 Asians.


These findings, and the lack of positive evidence (given the RFLP status) that the 10400 C->T transition defining M has happened more than once, suggest that it has a single common origin, but do not resolve its geographic origin. Analysis of position 10873 (the MnlI RFLP) revealed that all the M molecules (eastern African, Asian and those sporadically found in our population surveys) were 10873C (Table 3). As for the non-M mtDNAs, the ancient L1 and the L2 African-specific lineages [5], as well as most L3 African mtDNAs, also carry 10873C.

Conversely, all non-M mtDNAs of non-African origin analysed so far carry 10873T. These data indicate that the **transition 10400 C-->T, which defines haplogroup M**, arose on an African background characterized by the ancestral state 10873C, which is also present in four primate (common and pygmy chimps, gorilla and orangutan) mtDNA sequences. - Semino et al.

...which is significant, as other M lineages are devoid of M1 coding region motifs, not to mention the M1 HVS-I package. The above demonstrates how M lineages likely arose on an African 'background' by single-event substitutions in the designated African ancestral counterparts. The ancestral state 10873C is replaced by 10873T in non-African non-M haplogroups, while the ancestral state 10400C was replaced in M lineages by 10400T; the ancestral state 10873C remains in place across the M macrohaplogroup, unlike in the so-called non-African non-M haplogroup counterparts.
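
The character states just described can be laid out as a small lookup table. The Python sketch below is only an illustration: the three group labels and the `derived_sites` helper are made up here, only positions 10400 and 10873 are included, and the states are those given in the quoted passages (the 10400C state for non-M lineages follows from 10400T being the transition that defines M).

```python
# Character states at two coding-region positions, per the passages quoted above.
# Group labels are illustrative, not a formal nomenclature.
ANCESTRAL = {10400: "C", 10873: "C"}  # states present in L1 and the ape sequences

groups = {
    "African L (L1, L2, most L3)": {10400: "C", 10873: "C"},
    "M (African and Asian)":       {10400: "T", 10873: "C"},
    "non-African non-M":           {10400: "C", 10873: "T"},
}

def derived_sites(states):
    """Return the positions where a group departs from the ancestral state."""
    return sorted(pos for pos, base in states.items() if base != ANCESTRAL[pos])

changes = {name: derived_sites(states) for name, states in groups.items()}
for name, sites in changes.items():
    print(name, "->", sites or "no change")
```

Read this way, M and the non-African non-M lineages each sit a single substitution away from the African ancestral background, at different positions.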

Furthermore,...

The 489C transition, as noted above and as can be seen from the diagram, is peculiar to the M macrohaplogroup, again suggestive of unique-event mutations characterizing the family:

The phylogenetic location of the mutations at nt 489 and 10,873 (arrow) was predicted by our analysis. The seemingly shared mutation at nt 16,129 (by G, Z and M1) is very likely an accidental parallelism. The ancestral states 10400C, 10810C and 10873C are fixed in L1 (as analysed so far) and are present in the ape sequences.


The sharing of 16129 by G, Z and M1 seems to be one of those instances of random parallel mutation, recalling Chang Sun et al.'s observations of random parallel mutations of certain transitions across the M macrohaplogroup.

We also know that "southwest Asian" and "European" M1 lineages are derivatives of African counterparts, and the same is true for southwest Asian non-M1 affiliated M lineages from south Asia:

Compared to India, haplogroup M frequency in Iran is marginally low (5.3%) and there are no distinguished Iranian-specific sub-clades of haplogroup M. All Iranian haplogroup M lineages can be seen as derived from other regional variants of the haplogroup: eleven show affiliation to haplogroup M lineages found in India, twelve in East and Central Asia (D, G, and M8 ) and one in northeast Africa (M1)…

Indian-specific (R5 and Indian-specific M and U2 variants) and East Asian-specific (A, B and East Asian-specific M subgroups) mtDNAs, both, make up less than 4% of the Iranian mtDNA pool. We used Turkey (88.8 ± 4.0%) as the third parental population for evaluating the relative proportions of admixture from India (2.2 ± 1.7%) and China (9.1 ± 4.1%) into Iran. Therefore we can conclude that historic gene flow from India to Iran has been very limited.
Mait Metspalu et al.

With that said, Semino et al.'s older study still remains strong, the way I see it:

haplogroup M originated in eastern Africa approximately 60,000 years ago and was carried toward Asia. This agrees with the proposed date of an out-of-Africa expansion approximately 65,000 years ago [10]. After its arrival in Asia, the haplogroup M founder group went through a demographic and geographic expansion. The remaining M haplogroup in eastern Africa did not spread, but remained localized up to approximately 10,000-20,000 years ago, after which it started to expand. - Semino et al.

Elsewhere, I've also talked about some 'basal' M-like lineages in Africa; for instance, at least one such lineage was identified in a Senegalese sample.

Am. J. Hum. Genet., 66:1362-1383, 2000

mtDNA Variation in the South African Kung and Khwe and Their Genetic Relationships to Other African Populations


"The Asian mtDNA phylogeny is subdivided into two macrohaplogroups, one of which is M. M is delineated by a DdeI site at np 10394 and an AluI site of np 10397. The only African mtDNA found to have both of these sites is the Senegalese haplotype AF24. This haplotype branches off African subhaplogroup L3a (figs. 2 and 3), suggesting that haplogroup M mtDNAs might have been derived from this African mtDNA lineage..."

The relevant representation in this recap diagram:
[Image: recap diagram of the L3-M linkage, not reproduced here]

In the image above, the 10397 transition is shown in the L3-M linkage, while 10394, which should show up as positive (as exemplified in the above extract) in the M macrohaplogroup, shows up negative in the linkage between L3 and non-M affiliated lineages.

What does all this talk of specific transitions or nucleotide sequences tell us?

Well, to put the above compilation into perspective, and keep it simple, the point is this:

Semino et al.'s demonstration that certain characteristic basic coding transitions of the M super-haplogroup [not including the key coding region motifs unique to the M1 family] spring directly from African ancestral motifs means that M1 does not require a "non-African" proto-M1. All the necessary basic nucleotide sequences have been identified in the autochthonous African gene pool [findings which have been buttressed by later studies of basic motifs in rare "M-characteristic" basic (African L3) clades], enough to explain the proposed African origin for the M lineage in general, including M1. An Asian origin of M1, by contrast, would necessitate an Asian "proto-M1" lineage that would explain the relatively young expansion ages of M1 and its lack of descent from pre-existing Asian M lineages. This hasn't been achieved either by the present study or by ones prior to it.

Getting to the gist:

Basal M mtDNA ~ between c. 60 - 80 ky ago

And then, M1 ~ between c. 10 - 30 ky ago

The studies the present author has cited suggest that the basal motifs characteristic of the M macrohaplogroup arose in Africa anywhere between 60 - 80 ky ago [since they would likely have been in the continent by the time of the OOA migrations of 60 ky ago or so]. Sometime between 60 ky and 50 ky ago [some sources place it between 75 - 60 ky ago], these L3 offshoots were carried outside of Africa amongst early successful anatomically modern human (a.m.h.) migrations, which resulted in the populations now living in the Indian subcontinent, Melanesia and Australia who have these lineages. Not all the basal African L3M lineages left the continent, as Semino et al. convincingly put it. This is indicated by the basal L3a-M motif detected in Senegal, by M1 diversity in Africa, particularly East Africa, possibly by the detection of M1 and other M lineages in tandem within a Tanzanian sample (Gonder et al. 2006), and by the apparent lack of descent of M1 from the older-coalescing Asian macrohaplogroup. Rather, it appears that the basal L3M lineages which remained in Africa underwent only a relatively limited intra-African demographic expansion until relatively recently, i.e. between 10 - 30 ky ago, compared to the Asian L3M derivatives, which underwent major expansions, naturally within the quantitatively smaller founder immigrant groups, i.e. the founder effect.

M1 is likely the culmination of relatively more recent demographic expansions of basal L3M lineages in the African continent, with the M1 derivative being the successful candidate in a process that could possibly have involved other derivatives, which might not have expanded to the same level intra-continentally and, subsequently, extra-continentally.

M1 has been strongly correlated with the Upper Paleolithic expansion of proto-Afrasan groups across the Sahara to coastal north Africa, and further eastward via the Sinai peninsula.

15 comments:

Anonymous said...

The authors of the study at hand, themselves admit that they haven't come across M1 ancestor in either south Asia or southwest Asia. They also take note of its highest diversity in Ethiopia and east Africa. Yet through the shaky premise of their M1c expansion time frame estimations, they build a conclusion around it, by tying it to a dispersal(s) "parallel" to that of U6 - another African marker whose immediate common recent ancestor, namely proto-U6, appears to be elusive thus far.


Thanks for exposing the rickety structures on which the claims of Gonzalez et al. are built. They find the greatest diversity and frequencies in East Africa, yet argue for a Northwest African or 'Asiatic' 'origin' based on the slimmest of threads. Your alternative scenario, backed with Semino et al.'s data, is much more solid, and consistent across the board. In a field rife with 'spin' it is refreshing to see someone bringing hard-nosed clarity to the field.

Mystery Solver said...

For further simplification for the laymen, who still struggle following the main post:

Anyone with average intelligence would notice that the M1 piece discusses, in detail, inconsistencies and problems with Gonzalez et al.'s coalescence age estimations; their accuracy is hampered not only by insufficient intra-M1 phylogenetic representation and by sampling methods [including mutation rate estimation choices], but also by irregularities in the molecular clock pattern of their intra-M1 sample. As noted, the authors' forced omission of their M1a2 markers from their estimations, for instance, is bound to throw off their age estimations. Factors such as when a lineage expanded relative to the emergence of its TMRCA can affect the pattern of substitution rates, as rendered by, say, the extent to which purifying selection has acted; a lineage could therefore remain localized in a population with a small effective population size and then, for some reason, undergo major demographic expansion, which could well give the impression that the lineage had been around for a shorter time than its actual TMRCA would indicate. Again, we see this potential exemplified in Gonzalez et al.'s M1a2 collection.

The "Near Eastern" section of the Great Rift Valley 1) has no upstream markers of either M1, L3, or even that of the Asian M macrohaplogroup; rather, this region has only downstream derivatives of these markers. 2) The ancestor of M1 has not been located in either south Asia [which is essentially home to the Asian M family] or the "Near East". 3) However, the lineages necessary to give rise to M1 are all present in Africa, as exemplified by the 10873C [emphasis added] marker [Semino et al.] at the RFLP position 10873, which transcends the M macrohaplogroup, having been identified amongst L1, L2 and L3 clades; on the other hand, all non-African non-M clades, which are all ultimately derivatives of the African L3 superclade, have 10873T [with emphasis]. This suggests that African M1 and the Asian M macro-haplogroup derive from an earlier bifurcation event that took place in Africa, which transcended L3. Furthermore, the "middle-of-the-road" evolutionary clade L3a, a descendant of L3 but older than other M-designated L3 lineages and found in a Senegalese sample, adds to this theme of Africa having all the necessary markers to give rise to M1. That the authors propose two origin theories is further testament to the fragile nature of their theory.

Hope this helps,

Mystery Solver.

Maju said...

The direct ancestor of M1 is M (again see PhyloTree) and not L3 (which is the ancestor of M).

M has something like 40 basal sublineages (and growing), being the largest star-like structure in all the human mtDNA tree by far (only H is close), surely a signature of the earliest expansion of our species in Asia.

All but M1 and a new tiny haplogroup restricted to Madagascar are Asian or Oceanian. More than 50% of these basal sublineages are from South Asia, with the next most highly diverse area being East Asia.

There is indeed a gap between Asian M and its descendant M1 (if the African coalescence of this haplogroup can be confirmed, what I do not have too clear at this point) but that is not that rare (there are other haplogroups, often with rather long stems, as is the case of M1, 4 coding region mutations and 5 control region ones downstream of the M node). In these cases the logical interpretation is that the proto-M1 lineage traveled in small numbers from the upstream node place of origin (South Asia with all likelihood for M) to the place of expansion (as M1), be it Africa or West Asia.

The inverse is also true for the two main Eurasian clades, M and N, which must have somehow migrated in small numbers from their upstream node's original area (L3, East Africa) to their respective expansion places (South Asia for M and, IMO, SE Asia for N). Their stems are not as long as that of M1 anyhow but they are not single mutation stems either.

Mystery Solver said...

Maju writes:

M has something like 40 basal sublineages (and growing), being the largest star-like structure in all the human mtDNA tree by far (only H is close), surely a signature of the earliest expansion of our species in Asia.

Of course, M has basal motifs; it is noted in the body of the blog post.



All but M1 and a new tiny haplogroup restricted to Madagascar are Asian or Oceanian. More than 50% of these basal sublineages are from South Asia, with the next most highly diverse area being East Asia.


It is fairly well established that hg M has reached the climax of its diversity in southern Asia or the vicinity thereof; that is a non-issue. This, however, says nothing of the origin of the clade. Asia lacks the ancestral hg L3 clades necessary to have given rise to M, nor do the Asian regions that have the most diverse gene pool of hg M have the M1 clade. This too is noted in the blog post.



There is indeed a gap between Asian M and its descendant M1 (if the African coalescence of this haplogroup can be confirmed, what I do not have too clear at this point) but that is not that rare (there are other haplogroups, often with rather long stems, as is the case of M1, 4 coding region mutations and 5 control region ones downstream of the M node).


Rare or not, the "gap" underscores the distinct evolutionary histories of the African hg M1 and the Asian sub-clades of hg M. The distinct histories only make sense if Africa is the homeland for hg M's ancestral clade.


In these cases the logical interpretation is that the proto-M1 lineage traveled in small numbers from the upstream node place of origin (South Asia with all likelihood for M) to the place of expansion (as M1), be it Africa or West Asia.


The interpretation cannot be logical, as there is no such thing as proto-M1 or even hg M1 in southern Asia. Again, pointed out in the blog post. There is not even any such thing as proto-M in southern Asia.


The inverse is also true for the two main Eurasian clades, M and N, which must have somehow migrated in small numbers from their upstream node's original area (L3, East Africa) to their respective expansion places (South Asia for M and, IMO, SE Asia for N). Their stems are not as long as that of M1 anyhow but they are not single mutation stems either.


I fail to see how this is relevant. Nobody is speaking of single mutations, but rather key mutations.

Maju said...

The place of greatest basal diversity is where a clade most likely coalesced. Otherwise, we'd see highly derived clades, even if diverse. What we see in South Asia is highly diverse basal clades, often just one mutation removed from the M node.

Besides I forgot to mention that M1 has a sister, M51, which is only found in Asia as far as I can tell. M1 is not anymore even a basal M subclade, M1'51 is.

"if Africa is the homeland for hg M's ancestral clade".

Africa is of course the homeland for M's ancestor: L3.

But that's not too relevant for the particular history of M1:

1. L3 in Africa, probably at the Ethiopia-Sudan border area
2. M in Asia (surely South Asia)
3. M1'51 in Asia without doubt as M51 is not found in Africa AFAIK, while M1 is found in Asia
4. M1 in either Asia or Africa (as I said, I have yet to see anything that persuades me that M1 coalesced in Africa instead of West Asia)

"there is no such thing as proto-M1"

Not too important because one has to look to M as whole, first of all, for which M1 has a weight of 1/40 roughly, as M has so many basal subclades. The same that one has to look to, say, L3, as a whole and not just to one single arbitrary subclade such as M (in the case of L3, M and N and the other basal subclades weight 1/7 each).

Anyhow now we know of a proto-M1: M1'51.

Mystery Solver said...

Maju writes:

The place of greatest basal diversity is where a clade most likely coalesced. Otherwise, we'd see highly derived clades, even if diverse. What we see in South Asia is highly diverse basal clades, often just one mutation removed from the M node.

Asia does not have any standalone clade that is ancestral to all the hg M sub-clades. For this to be the case, it would have to be like Africa, where such a clade has been identified, as an L3 derivative but an hg M ancestor.

Besides I forgot to mention that M1 has a sister, M51, which is only found in Asia as far as I can tell. M1 is not anymore even a basal M subclade, M1'51 is.

My questions to you above on this point stand. It will be interesting to see how you rise to this daunting challenge.

Africa is of course the homeland for M's ancestor: L3.

But that's not too relevant for the particular history of M1:


It's mighty careless of you to say that the African L3 ancestor of M is not "too relevant". I am not sure you understand that I am not generically referring to the "group" L3, but rather to a specific downstream L3 clade that has the most basic M characteristic motifs, but lacks downstream M motifs. To dismiss this, you have to not be mindful of the significance of the fact that hg M is distinct from hg N, though both share L3 ancestry. Do you now understand what I'm saying here?

"there is no such thing as proto-M1"

Not too important


How can you dismiss the fact that proto-M1 does not exist in Asia as not "too important", yet try to force an Asian origin on M1? That makes no sense.

because one has to look to M as whole

You cannot do that, since M1 does not derive from any Asian-specific M clade or vice versa. It is a standalone M clade.


, first of all, for which M1 has a weight of 1/40 roughly, as M has so many basal subclades.


Yet, Asia does not have any standalone M clade that is ancestral to all M sub-clades. Such a clade has, however, already been identified in Africa. Do you still not get the significance of this?

Furthermore, as I have detailed in the body of this blog post, hg M's basal motifs are such that Africa requires no other clade but L3, which is noticeably absent in much of Asia, short of relatively recent gene flow from nearby African territories. Despite this, an L3 downstream clade, but which is upstream to hg M, has been identified. It doesn't get any better than that.


Anyhow now we know of a proto-M1: M1'51.


I take it that by "proto-M1", you are referring to an ancestral clade that has the basic motifs of M1, but LACKS downstream motifs of M1 sub-clades, right? And that this ancestral M1 clade shares basic motifs with other non-M1 clades, but NOT the characteristic M1 motifs, correct? If so, I'd like to see the nucleotide specifics of how you came around to that. In any event, this clade would still have to be the downstream clade of the African L3 prototype clade of M1, which is not any ordinary L3 clade, as you might assume. My advice is that you carefully go over what I say in the blog post, because it will definitely alleviate much of the confusion you seem to be having on this subject.

Maju said...

"If so, I'd like to see the nucleotide specifics of how you came around to that".

Look at PhyloTree, please.

"In any event, this clade would still have to be the downstream clade of the African L3 prototype clade of M1"...

It is a downstream clade of Asian M, which is a downstream clade of African L3. You cannot deny the existence of M and then claim L3 as real, there's no L3 underived either. Not anymore: it evolved, as everything does.

Mystery Solver said...

Maju writes:

Look at PhyloTree, please.

Trying to sneak out through the easy way out. Not on my watch. You claimed that there is an "Asian" proto-M1. You should be able to spell out the specific nucleotide peculiarities of this standalone ancestral clade that only has a direct link with the M1 clade but not the other clades within the M haplogroup, and AT THE SAME TIME, lacks downstream mutations of all known M1 sub-clades. The fact of the matter is, you don't have such information, and you know it.

You cannot deny the existence of M and then claim L3 as real

My denial of *haplogroup* M is another figment of yours, like the long string of imaginings you make about me. I said you have no evidence of "proto-M1" or the ancestral clade of "M1", as a standalone existing clade on its own. It is not my fault you can't tell the difference between a would-be proto-M1 or ancestral M1 clade and a "haplogroup", which in this case, is haplogroup M.

"there's no L3 underived either."

Salas et al. for example, beg to differ, when they note:


"Both L3f (Figure 8a) and L3g (Figure 8b) are rare and also appear to have an East African origin. L3f* and L3g are virtually restricted to East Africa (with some dispersal into Central Africa, southeastern Africa, and the Near East)." - Salas et al.

The L3f* marker(s) in question here, are actual physical chromosomes that lack the established downstream mutations of all the established L3f sub-clades. They are not theoretical ancestral chromosomes, extrapolated from common basic mutations of the entire haplo*group* of L3f. Such a clade is referred to in genetic terms, as "underived"...yes, even though that said clade in turn derives from another ancestral clade, but of a different lineage.

Man, you need to take up the basics before you ever attempt to take me on.

Maju said...

Salas is not saying that L3* is underived, what he means is that the abundance of L3* (L3-other) probably hides a wealth of L3 subclades, suggestive of greatest basal diversity and likely origin.

In fact some of that diversity has already been unraveled, AFAIK, as the haplogroup is now known in much greater detail.

"The L3f* marker(s) in question here, are actual physical chromosomes"...

They are not chromosomes. We are talking mitochondrial DNA here. The markers are mutations (most often transitions) in the DNA of an organelle, a bacteria-like symbiont, present in all human cells and transmitted only via the mothers. This is also pretty basic.

But the markers of L3f* do not exist, that's why the asterisk. L3f is defined only here, and L3f* is L3f that is not described as any specific subclade in that paper. This is again basic.

I do not mind to give you some lessons for free but first you need to acknowledge your knowledge limitations at the moment. My email is at my profile (remove the anti-spam "DELETETHIS"), feel free if you think that is better than discussing all this in public. No hard feelings on my side. You already have a quite decent grasp of population genetics, you have obviously read a bunch of papers and you clearly have the interest, but you seem to lack some very important fundamentals. No offense meant: I'm just being sincere and I must say I appreciate your interest in genetics, especially the much underappreciated African genetics. But you can easily do better as soon as you get all these basics that are getting you confused. Feel free not to publish this if you prefer.

Mystery Solver said...

Maju writes:

Salas is not saying that L3* is underived, what he means is that the abundance of L3* (L3-other) probably hides a wealth of L3 subclades, suggestive of greatest basal diversity and likely origin.

Be advised to take up reading 101. In the context applied in the Salas et al. piece, the L3f* [you can't even get the clade right] is representative of the most phylogenetically-basic clade of the L3f family, since by way of the loci sequenced, it lacks the established downstream substitutions of the established sub-clades of the L3f family. In genetic jargon, which only evades laymen rookies, this is given the descriptor "underived". Take up basic lessons in genetic jargon via acquaintance with, for example, literature on NRYH Tree, literature by the likes of Hammer, Salas, Karafet, et al.


We are talking mitochondrial DNA here.

I stand corrected. I meant to say that the example provided is that of an actual clade; not a theoretical ancestor that has been extrapolated. It doesn't take a genius to figure out that the mistake was a simple "blooper". It happens to human beings.

But the markers of L3f* do not exist, that's why the asterisk. L3f is defined only here, and L3f* is L3f that is not described as any specific subclade in that paper. This is again basic.

...and it's a pity you cannot get "basic" stuff right. Let's watch you mangle this up as well...

"Two mutations (M122 and P198) now define the large O3 clade, which is subsequently divided into a major subclade (O3a) that is defined by five mutations (M324, P93, P197, P199, and P200) and an underived lineage (O3*), which is found at low frequencies in China, Taiwan, and Indonesia." - Karafet et al. 2008

Let me guess, you'll say the "underived" lineage does "not exist" and that the "*" is a reflection of this "non-existence". LOL
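For readers following along, the naming convention in the Karafet et al. quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker names come from the quote, while the function name `classify_o3`, the return labels, and the "unresolved" case for partially typed samples are hypothetical choices of this sketch, not anything from the paper.

```python
# Marker names are taken from the Karafet et al. 2008 quote; everything else
# (function name, return labels) is a hypothetical illustration.
O3_MARKERS = frozenset({"M122", "P198"})
O3A_MARKERS = frozenset({"M324", "P93", "P197", "P199", "P200"})


def classify_o3(derived_markers):
    """Classify a sample from the set of markers at which it is derived."""
    derived = set(derived_markers)
    if not O3_MARKERS <= derived:
        return None  # lacks the mutations that define O3
    hits = derived & O3A_MARKERS
    if not hits:
        return "O3*"  # inside O3, but carries none of the O3a mutations
    if hits == O3A_MARKERS:
        return "O3a"
    return "unresolved"  # only some O3a markers: assumed incomplete typing


print(classify_o3({"M122", "P198"}))  # O3*
print(classify_o3(O3_MARKERS | O3A_MARKERS))  # O3a
```

The point of the sketch is that the "*" label is assigned to an actual sampled lineage by what it lacks downstream, not by the absence of the lineage itself.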

I do not mind to give you some lessons for free but first you need to acknowledge your knowledge limitations at the moment.

You did seem to have trouble "giving me basic" lessons, when you were pressed to produce empirical data and name a generic/ordinary L3 clade that has these mutations: "DdeI site at np 10394 and an AluI site of np 10397". Why is this pretty basic question causing so much trouble for your faculties? I'd like to know.

Maju said...

"Let me guess, you'll say the "underived" lineage does "not exist" and that the "*" is a reflection of this "non-existence". LOL".

Effectively. Karafet is using sloppy language in that sentence. I understand that she usually replies to emails by interested amateurs, so you may try writing to her and asking for a clarification.

"you can't even get the clade right"

I get the impression in that paper that Salas is speaking of abundant L3-other in general in the case of Sudan/East Africa. He uses terminology that is in some cases obsolete (Hadza L3g is now called L4g) or not orthodox (his L3A means L3(xM,N), something I have seen nowhere else). The relative age of this paper (2002) also poses some limitations because some finer distinctions have been made since then in African mtDNA knowledge. It is still a valuable reference but old anyhow.

"You did seem to have trouble "giving me basic" lessons, when you were pressed to produce empirical data and name a generic/ordinary L3 clade that has these mutations: "DdeI site at np 10394 and an AluI site of np 10397"".

I just ignored that question because I thought there were more pressing issues in the conversation and because, I understand, you were asking for impossibles such as "underived lineages".

I have been searching for those sites at PhyloTree and they do not seem to have any relation with any lineage in either L(xM,N) or M. So I can just wonder why you ask about them.

I understand that you can feel offended because I have been forced to challenge your status as know-it-all of African genetics but I had to do it. I do not wish to waste more energies fighting arrogance and prejudice (I can understand arrogance, I can even sympathize with it at times, because I can be quite arrogant myself... but sometimes one has also to step down from any pedestals and face reality with some humility if one really wants to be proud, which is better than mere arrogance, and grow in wisdom).

It seems I got you wrong: I thought you were more knowledgeable than you actually are. I can only hope that you correct your defects and limitations on your own. Best luck.

Mystery Solver said...

Maju writes:

Effectively. Karafet is using sloppy language in that sentence.

Using your unimportant standards, the entire genetics research community is sloppy, and a nobody amateur like you got it right. What a joke. Karafet is simply using jargon that is well understood in genetic research circles.


I understand that she usually replies to emails by interested amateurs, so you may try writing to her and asking for a clarification.

Like everything else you say, you are astonishingly misguided. It is you who is puzzled by the common genetic jargon she's using. Why don't you write her?


I get the impression in that paper that Salas is speaking of abundant L3-other in general in the case of Sudan/East Africa.


This is the quote in question that you couldn't resist mangling up:

"Both L3f (Figure 8a) and L3g (Figure 8b) are rare and also appear to have an East African origin. L3f* and L3g are virtually restricted to East Africa (with some dispersal into Central Africa, southeastern Africa, and the Near East)." - Salas et al. 2002

Did you notice that in the first sentence L3f is not accompanied by "*", and yet in the following sentence, it is accompanied by one? Judging by your wacky translation efforts, I don't think you noticed.

I just ignored that question because I thought there were more pressing issues

A more pressing issue about the M1 clade than its characteristic nucleotide specifics? Are you for real? No, you did not ignore it; it was a cop-out.

and because, I understand, you were asking for impossibles such as "underived lineages".

Since when does requesting you to cite an ordinary L3 clade that has *both* the restriction and insertion nucleotide positions that characterize ONLY the M haplogroup translate into "underived lineages"? This is just another of your wacky translation efforts that you use to cop out of difficult situations.

I have been searching for those sites at PhyloTree and they do not seem to have any relation with any lineage in either L(xM,N) or M.

Let me get this straight:

You think that the M haplogroup restriction [@ np 10394] and insertion [@ np 10397] sites noted in journals are made up, simply because you are too challenged to have known about them and because you cannot locate them on "PhyloTree"? You are tripping.

I understand that you can feel offended because I have been forced to challenge your status as know-it-all of African genetics but I had to do it.

Now you've gone and convinced yourself that the astonishing ignorance that you display in fundamental matters in molecular genetics is supposed to offend someone else other than yourself. Where does your madness end?

What you've done here is nothing short of exposing yourself to unwarranted humiliation. You could have simply asked for guidance, instead of making a total fool of yourself, saying things that are comical at best. Confusing a "basal node" with "haplogroup" is a good example of this. Far from offending, my friend, you provide entertainment: we get to laugh at you.

Mystery Solver said...

Maju writes:

I do not wish to waste more energies fighting arrogance and prejudice (I can understand arrogance, I can even sympathize with it at times, because I can be quite arrogant myself...

Yeah, that is it: when you feel defeated, you try to sugarcoat it with made up excuses of what's wrong with the other person. Nothing new -- it happens all the time in debates, games, competition, etc.

I agree with the self-acknowledgment about your arrogance.

but sometimes one has also to step down from any pedestals and face reality with some humility if one really wants to be proud, which is better than mere arrogance, and grow in wisdom).

I agree; it would be wise of you to actually practice this advice of yours. Talk is cheap.

It seems I got you wrong: I thought you were more knowledgeable than you actually are. I can only hope that you correct your defects and limitations on your own. Best luck.

You must be misguided again, if for a minute, you think that the worthless, opinionated dogma of a nobody amateur like yourself, about my person, matters one bit to me. It does not. Your opinion is just about as correct and important as your lack of ability to distinguish a haplogroup from a basal node. You bet I will correct my "defects" and "limitations" alright, which have nothing to do with genetics: it starts with rectifying your ability to post any more comments here. Thanks for the advice.

imsnow said...

My mother just tested for the M1 haplogroup through a mtDNA test at FamilyTreeDNA.com. Her family has lived in Thailand for as long as she can remember.

Maju said...

@Imsnow: double check to make sure it's not a close relative of M1. M1, we know now, is a branch of M1'20'51 (see: phylotree.org) and both M20 and M51 are SE Asian. I can't say for sure, but it's possible that your DNA testing company has detected a marker that is shared with the sister haplogroups. Alternatively, it's possible that a branch of genuine M1 remained in SE Asia but is rare (so undocumented so far AFAIK) or, of course, that it corresponds to some minor population movement across the historical Indian Ocean trade routes and therefore comes from Africa or Arabia in your mother's case.