Publications

Statistical Techniques for Defining Reference Sets of Accessions and Microsatellite Markers

Odong, T.L.; van Heerwaarden, J.; Jansen, J.; van Hintum, T.J.L.; van Eeuwijk, F.A.

Summary

Exploitation of the available genetic resources around the world requires information about the relationships and genetic diversity present among genebank collections. These relationships can be established by defining, for each crop, a small but informative set of accessions, together with a small set of reliable molecular markers, that can be used as reference material. In this study, various strategies to arrive at small but informative reference sets are discussed. For the selection of accessions, we propose the genetic distance optimization (GDOpt) method, which selects a subset of accessions that optimally represents the accessions not included in the core collection. The performance of GDOpt was compared with that of Core Hunter, an advanced stochastic local search algorithm for selecting core subsets. For the selection of molecular markers, we evaluated (i) the backward elimination (BE) method and (ii) methods based on principal component analysis (PCA). We examined the performance of the proposed methodologies using five real datasets. With respect to representativeness, measured as the average distance between an accession and the nearest selected accession, GDOpt outperformed Core Hunter. However, Core Hunter outperformed GDOpt with respect to allelic richness. BE performed much better than the other methods in selecting subsets of markers. Methods based on PCA showed that, for practical purposes, the inclusion of the first few (two or three) principal components (PCs) was often sufficient. To obtain robust and high-quality reference sets of accessions and markers, we advise a combination of GDOpt (for accessions) and either BE or PCA-based methods using a few PCs (for subsets of markers).
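The representativeness criterion mentioned in the summary (the average distance between an accession and the nearest selected accession) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy one-dimensional "accessions", and the greedy forward search are hypothetical and are not the GDOpt algorithm or Core Hunter as published. Only the criterion itself follows the wording of the summary.

```python
def mean_nearest_distance(dist, selected):
    """Average distance from each non-selected accession to its nearest
    selected accession (lower means a more representative subset)."""
    chosen = set(selected)
    others = [i for i in range(len(dist)) if i not in chosen]
    if not others:
        return 0.0
    return sum(min(dist[i][j] for j in selected) for i in others) / len(others)


def greedy_reference_set(dist, size):
    """Illustrative greedy forward selection (an assumed stand-in, NOT GDOpt):
    repeatedly add the accession that most reduces the criterion."""
    selected = []
    remaining = set(range(len(dist)))
    while len(selected) < size and remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = min(sorted(remaining),
                   key=lambda c: mean_nearest_distance(dist, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): six accessions placed on a line, forming two groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]

subset = greedy_reference_set(dist, 2)
score = mean_nearest_distance(dist, subset)
print(subset, score)
```

On this toy matrix the sketch picks one accession from each group, whereas a subset drawn from a single group (for example indices 0 and 1) scores much worse on the same criterion. In practice the distance matrix would come from marker data, and a real optimizer would search the subset space more thoroughly than a single greedy pass.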