Linkage Analysis in Plant Breeding: A Practical Guide to Genetic Map Construction
Published: June 2026 | Updated: 2026 | AgroSynapsis Articles
Learn how linkage analysis transforms molecular marker data into genetic maps used for QTL mapping, marker-assisted selection, and gene discovery in plant breeding.
Contents
- Why Data Curation Is Critical
- The Main Steps of Linkage Analysis
- Mapping Populations Used in Linkage Analysis
- Linkage Map Construction Is Iterative
- Comparing Genetic and Physical Maps
- Where Genetic Maps Are Useful
- Frequently Asked Questions
- QTL Mapping Workshop
Linkage analysis is the process of using molecular marker data from a mapping population to construct a genetic map. In plant breeding, genetic maps are essential for understanding how markers are inherited, identifying genomic regions associated with important traits, and providing the foundation for QTL mapping, marker-assisted selection, and gene discovery.
In simple terms, linkage analysis identifies molecular markers that are inherited together more frequently than expected by chance because of their physical proximity on the same chromosome. These markers are grouped into linkage groups, ordered along chromosomes, and assigned genetic distances measured in centiMorgans (cM).
Although software packages such as QTL IciMapping, JoinMap, OneMap, ASMap, and R/qtl can automate much of the process, constructing a reliable genetic map requires careful data curation, parameter optimization, and biological interpretation. In this article, we explore the complete linkage analysis workflow, from marker quality control to map validation and practical applications in plant breeding.
Why Data Curation Is the Most Critical Step
Before constructing a genetic map, the marker dataset must be carefully curated. This is especially important when thousands of SNPs or other molecular markers are included. Even a small proportion of genotyping errors, missing data, or wrongly coded markers can distort recombination estimates and inflate genetic distances.
The reason is simple: linkage mapping algorithms interpret changes in marker genotype along the chromosome as recombination events. If a genotype is wrong, the software may interpret this error as a real recombinant. As a result, the map becomes artificially longer, marker order becomes less stable, and downstream QTL mapping may lose accuracy.
1️⃣ Missing Data
Markers or individuals with excessive missing data should be removed or carefully evaluated. Missing values reduce the information available for estimating recombination frequencies and marker order. In high-density marker datasets, some level of missing data is expected, but excessive missingness can make the map unstable.
Depending on the dataset, imputation can be considered. Some R tools and mapping software provide options to infer missing genotypes based on neighboring markers and the expected inheritance pattern. However, imputation should be used carefully, because poor-quality imputation can introduce artificial patterns into the map.
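The missing-data filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular mapping package: the `-` missing code, the dictionary layout, and the 20% default thresholds are all assumptions chosen for the example.

```python
MISSING = "-"  # assumed code for a missing genotype call


def missing_rate(calls):
    """Fraction of genotype calls that are missing."""
    return sum(1 for g in calls if g == MISSING) / len(calls)


def filter_by_missingness(genotypes, max_marker_missing=0.2,
                          max_individual_missing=0.2):
    """genotypes: dict marker -> list of calls, one per individual (same order).

    Markers are filtered first, then individuals are evaluated on the
    remaining markers. Returns (filtered dict, kept individual indices).
    """
    kept_markers = {m: calls for m, calls in genotypes.items()
                    if missing_rate(calls) <= max_marker_missing}
    if not kept_markers:
        return {}, []
    n_ind = len(next(iter(kept_markers.values())))
    kept_individuals = [
        i for i in range(n_ind)
        if missing_rate([calls[i] for calls in kept_markers.values()])
        <= max_individual_missing
    ]
    filtered = {m: [calls[i] for i in kept_individuals]
                for m, calls in kept_markers.items()}
    return filtered, kept_individuals


# Hypothetical toy data: 4 markers scored on 5 individuals.
toy = {
    "m1": ["A", "B", "A", "B", "-"],
    "m2": ["A", "B", "-", "B", "-"],
    "m3": ["-", "-", "-", "B", "-"],
    "m4": ["A", "A", "A", "B", "-"],
}
filtered, kept = filter_by_missingness(toy, 0.5, 0.5)
print(sorted(filtered), kept)
```

Appropriate thresholds depend on the population size and genotyping platform, so the defaults here should be treated as placeholders.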
2️⃣ Segregation Distortion
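Each marker should be tested against the expected Mendelian segregation ratio for the population type, typically with a chi-square goodness-of-fit test. The expectation depends on the population: a codominant marker segregates 1:2:1 in an F2 and 1:1 in a backcross or doubled-haploid population, and approximately 1:1 in recombinant inbred lines, whereas outcrossing populations can show several segregation types depending on the parental genotypes.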
Markers showing strong segregation distortion should not be automatically discarded, because segregation distortion may reflect real biological processes such as selection, gametic competition, zygotic viability, or structural genome differences. However, distorted markers should be evaluated carefully because they may affect linkage group formation, marker ordering, and map length.
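As an illustration, a chi-square goodness-of-fit test for a single marker might look like the sketch below. The genotype codes (`A`, `H`, `B`), the ratio table, and the function name are assumptions made for this example; the backcross entry assumes a cross back to parent A, and the RIL entry ignores residual heterozygosity. Critical values are the standard 5% values for 1 and 2 degrees of freedom.

```python
# Expected segregation ratios for codominant markers (assumed coding:
# A = parent 1 homozygote, H = heterozygote, B = parent 2 homozygote).
EXPECTED_RATIOS = {
    "F2": {"A": 1, "H": 2, "B": 1},
    "BC": {"A": 1, "H": 1},   # backcross to parent A
    "DH": {"A": 1, "B": 1},
    "RIL": {"A": 1, "B": 1},  # residual heterozygotes are ignored
}

# Chi-square critical values at alpha = 0.05, indexed by degrees of freedom.
CHI2_CRITICAL_005 = {1: 3.841, 2: 5.991}


def chi_square_segregation(calls, population):
    """Return (chi2, df, distorted) for one marker.

    Calls outside the expected genotype classes (for example missing data)
    are not counted.
    """
    ratio = EXPECTED_RATIOS[population]
    observed = {cls: calls.count(cls) for cls in ratio}
    n = sum(observed.values())
    if n == 0:
        raise ValueError("no informative genotype calls for this marker")
    parts = sum(ratio.values())
    chi2 = 0.0
    for cls, weight in ratio.items():
        expected = n * weight / parts
        chi2 += (observed[cls] - expected) ** 2 / expected
    df = len(ratio) - 1
    return chi2, df, chi2 > CHI2_CRITICAL_005[df]


# Hypothetical F2 marker with an excess of parent-A homozygotes.
calls = ["A"] * 40 + ["H"] * 50 + ["B"] * 10
print(chi_square_segregation(calls, "F2"))  # chi2 = 18.0, df = 2, distorted
```

With thousands of markers, some will exceed the 5% threshold by chance alone, so a flag from this test is a prompt for inspection rather than an automatic reason to discard the marker.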
3️⃣ Genotyping Errors
Genotyping errors are among the most damaging problems in linkage map construction. A single incorrect genotype can create an apparent double recombination event over a very short genetic interval. Since true double recombination events are rare between closely linked markers, these suspicious patterns can be used to detect potential errors.
Software such as QTL IciMapping can help identify suspicious genotypes and allows users to correct or remove problematic data points. This step is crucial because genotyping errors inflate genetic distances and may cause incorrect marker order. In practice, error checking is one of the most important steps for producing a reliable genetic map.
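The double-recombinant pattern can be screened for with a simple scan along an ordered linkage group. The sketch below is a generic illustration and does not reproduce the error-detection routine of QTL IciMapping or any other package; the function name and data layout are hypothetical. It flags a call as suspicious when it differs from both flanking markers while those two flanking markers agree with each other.

```python
def find_singletons(ordered_markers, genotypes, missing="-"):
    """Flag calls that imply a crossover on each side of a single marker.

    ordered_markers: marker names in map order.
    genotypes: dict marker -> list of calls, one per individual.
    Returns a list of (individual_index, marker) tuples.
    """
    suspects = []
    if len(ordered_markers) < 3:
        return suspects
    n_ind = len(genotypes[ordered_markers[0]])
    for ind in range(n_ind):
        for pos in range(1, len(ordered_markers) - 1):
            left = genotypes[ordered_markers[pos - 1]][ind]
            mid = genotypes[ordered_markers[pos]][ind]
            right = genotypes[ordered_markers[pos + 1]][ind]
            if missing in (left, mid, right):
                continue  # cannot judge without all three calls
            if left == right and mid != left:
                suspects.append((ind, ordered_markers[pos]))
    return suspects


# Hypothetical ordered group of 4 markers scored on 3 individuals.
order = ["m1", "m2", "m3", "m4"]
calls = {
    "m1": ["A", "A", "B"],
    "m2": ["A", "B", "B"],
    "m3": ["A", "A", "B"],
    "m4": ["B", "A", "B"],
}
print(find_singletons(order, calls))  # [(1, 'm2')]
```

This heuristic is only meaningful where markers are closely spaced; across large intervals a genuine double crossover is plausible, so flagged calls should be reviewed rather than deleted automatically.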
4️⃣ Wrong Phasing and Allele Coding
Another important source of error is incorrect phasing or wrong genotype coding according to the parental origin of alleles. In a biparental population, markers must be coded consistently according to which allele comes from each parent. If parental alleles are reversed or incorrectly assigned, the software may interpret the marker as showing a different inheritance pattern.
This is particularly challenging in large datasets containing thousands of markers. Wrong phasing can produce artificial recombination patterns, split markers into incorrect linkage groups, or increase map length. Therefore, parental genotypes, marker coding, and allele origin should be checked before the final analysis.
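One symptom of reversed allele coding is an apparent recombination fraction well above 0.5 against a trusted marker, which independent assortment cannot explain. The sketch below uses that idea for a population coded with two homozygous classes (`A`/`B`, as in doubled haploids). The function names, the choice of reference marker, and the 0.8 threshold are assumptions for illustration, and the check is only informative for markers that are actually linked to the reference.

```python
def recombination_fraction(calls_x, calls_y, missing="-"):
    """Proportion of individuals whose calls differ between two markers."""
    pairs = [(x, y) for x, y in zip(calls_x, calls_y)
             if missing not in (x, y)]
    if not pairs:
        return None
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def swap_alleles(calls):
    """Exchange A and B codes; other codes (e.g. missing) are left as-is."""
    swap = {"A": "B", "B": "A"}
    return [swap.get(g, g) for g in calls]


def flag_reversed_coding(genotypes, reference_marker, threshold=0.8):
    """List markers whose apparent recombination with the reference
    marker is at or above the threshold, suggesting swapped parental codes."""
    ref = genotypes[reference_marker]
    flagged = []
    for marker, calls in genotypes.items():
        if marker == reference_marker:
            continue
        r = recombination_fraction(ref, calls)
        if r is not None and r >= threshold:
            flagged.append(marker)
    return flagged


# Hypothetical data: m2 carries the same information as m1 with codes reversed.
data = {
    "m1": list("AABBABAB"),
    "m2": list("BBAABABA"),
    "m3": list("AABBABBB"),
}
print(flag_reversed_coding(data, "m1"))  # ['m2']
print(recombination_fraction(data["m1"], swap_alleles(data["m2"])))  # 0.0
```

Flagged markers should still be checked against the parental genotypes before recoding, since the swap only restores consistency if the parental assignment was the actual source of the problem.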
The four major stages of linkage analysis are summarized in Figure 1. Although software automates much of the process, each step requires careful biological interpretation and quality control to produce a reliable genetic map.

The Main Steps of Linkage Analysis
Once the marker dataset has been curated, the software typically follows a series of analytical steps. These steps may differ slightly among software platforms, but the logic is generally the same.
1️⃣ Testing Marker Segregation
The first step is to test whether each marker follows the expected segregation ratio. This helps identify markers that behave abnormally and may require further inspection. Some distorted markers may still be useful, but strongly distorted or suspicious markers should be treated with caution.
2️⃣ Grouping Markers into Linkage Groups
Markers are then assigned to linkage groups. Ideally, the number of linkage groups should correspond to the haploid chromosome number of the species. For example, a diploid species with 2n = 24 chromosomes (n = 12) should ideally produce 12 linkage groups, although gaps, low marker density, or problematic markers may split chromosomes into additional small groups.
Different software tools may group markers using slightly different criteria, such as recombination frequency thresholds, LOD score thresholds, distance-based methods, or general grouping parameters. Because of this, small differences among map versions are common. Adjusting the grouping threshold is often necessary to obtain biologically meaningful linkage groups.
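To make the role of the thresholds concrete, here is a minimal sketch of grouping by pairwise LOD score and recombination frequency. It assumes phase-known data with two genotype classes (backcross or doubled-haploid style), for which the two-point LOD is log10 of the likelihood at the estimated recombination fraction over the likelihood at 0.5. The single-linkage clustering, function names, and default thresholds are illustrative assumptions, not the grouping method of any specific software.

```python
import math


def recombination_and_lod(calls_x, calls_y, missing="-"):
    """Two-point recombination fraction and LOD for phase-known 1:1 data."""
    pairs = [(x, y) for x, y in zip(calls_x, calls_y)
             if missing not in (x, y)]
    n = len(pairs)
    if n == 0:
        return 0.5, 0.0
    recombinants = sum(1 for x, y in pairs if x != y)
    r = min(recombinants / n, 0.5)  # estimates above 0.5 are capped
    # LOD = log10[ r^R * (1 - r)^(n - R) / 0.5^n ]
    lod = n * math.log10(2)
    if recombinants > 0:
        lod += recombinants * math.log10(r)
    if n - recombinants > 0:
        lod += (n - recombinants) * math.log10(1 - r)
    return r, lod


def group_markers(genotypes, min_lod=3.0, max_r=0.35):
    """Single-linkage grouping: markers joined by any qualifying pair
    end up in the same group."""
    markers = list(genotypes)
    parent = {m: m for m in markers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for i, a in enumerate(markers):
        for b in markers[i + 1:]:
            r, lod = recombination_and_lod(genotypes[a], genotypes[b])
            if lod >= min_lod and r <= max_r:
                parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical data: 4 markers on 20 individuals, two tightly linked pairs.
data = {
    "m1": list("A" * 10 + "B" * 10),
    "m2": list("A" * 10 + "B" * 9 + "A"),
    "m3": list("AB" * 10),
    "m4": list("AB" * 9 + "AA"),
}
print(group_markers(data, min_lod=3.0))   # two groups of two markers
print(group_markers(data, min_lod=10.0))  # every marker left on its own
```

Running the same data at two LOD thresholds shows why the grouping parameters usually need tuning: a threshold that is too strict fragments true chromosomes, while one that is too lenient can merge unlinked markers.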
3️⃣ Ordering Markers Within Each Linkage Group
After grouping, markers are ordered linearly within each linkage group. Marker ordering is one of the most computationally difficult steps, especially when thousands of markers are used. Closely linked markers may have very similar segregation patterns, making their exact order difficult to determine.
Different algorithms may produce slightly different marker orders. Therefore, marker order should be evaluated using statistical criteria, recombination patterns, and biological expectations. Suspicious local expansions, unexpected double recombination events, or unstable marker positions may indicate errors that need to be corrected.
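One simple statistical criterion for comparing orders is the sum of adjacent recombination fractions (SARF): the order that minimizes it places the most similar markers next to each other. The sketch below evaluates every possible order exhaustively, which is only feasible for a handful of markers; mapping software relies on heuristics for realistic marker numbers. Function names and the toy data are assumptions for this example.

```python
from itertools import permutations


def recombination_fraction(calls_x, calls_y, missing="-"):
    """Proportion of differing calls; 0.5 if no informative individuals."""
    pairs = [(x, y) for x, y in zip(calls_x, calls_y)
             if missing not in (x, y)]
    if not pairs:
        return 0.5
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def order_by_sarf(genotypes):
    """Return (best_order, best_score), minimizing the sum of adjacent
    recombination fractions over all marker orders (small groups only)."""
    markers = list(genotypes)
    rf = {(a, b): recombination_fraction(genotypes[a], genotypes[b])
          for a in markers for b in markers if a != b}
    best_order, best_score = None, float("inf")
    for perm in permutations(markers):
        if perm[0] > perm[-1]:
            continue  # an order and its mirror image are equivalent
        score = sum(rf[(perm[i], perm[i + 1])] for i in range(len(perm) - 1))
        if score < best_score:
            best_order, best_score = perm, score
    return list(best_order), best_score


# Hypothetical group given in shuffled order; 8 individuals, one crossover each
# in six of them, none in the two parental-type individuals.
data = {
    "m3": list("ABBBAAAB"),
    "m1": list("ABAAABBB"),
    "m4": list("ABBBBAAA"),
    "m2": list("ABBAAABB"),
}
print(order_by_sarf(data))  # (['m1', 'm2', 'm3', 'm4'], 0.75)
```

Because the number of possible orders grows factorially, this exhaustive search is best used to check a short, suspicious stretch of markers rather than a whole linkage group.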
4️⃣ Calculating Genetic Distances
Once markers are ordered, recombination frequencies are converted into genetic distances expressed in centiMorgans (cM). The conversion is needed because recombination frequencies are not additive over longer intervals: multiple crossovers between two markers can go undetected. Mapping functions such as Haldane or Kosambi are commonly used for this conversion.
The Haldane function assumes no crossover interference, whereas the Kosambi function assumes positive interference, so for the same recombination frequency it returns a somewhat shorter distance. The choice of mapping function can influence the final length of the map, although the overall structure is usually similar when the data are clean and the marker order is reliable.
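The two mapping functions can be written directly from their standard formulas, with r the recombination frequency (0 ≤ r < 0.5) and the result in cM: Haldane is d = -50 ln(1 - 2r) and Kosambi is d = 25 ln[(1 + 2r) / (1 - 2r)]. The function names below are chosen for this example only.

```python
import math


def _check(r):
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must satisfy 0 <= r < 0.5")


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (assumes positive interference)."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.10, 0.25, 0.40):
    print(f"r = {r:.2f}  Haldane = {haldane_cm(r):6.2f} cM  "
          f"Kosambi = {kosambi_cm(r):6.2f} cM")
```

For small r both functions give values close to 100 × r, and the gap between them widens as r approaches 0.5, which is why the choice matters more for sparse maps than for dense ones.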
If you are unfamiliar with how recombination frequencies arise and why they are used to estimate marker distances, see our article on recombination in plant breeding.
Mapping Populations Used in Linkage Analysis
Linkage analysis can be performed using different mapping populations, including F2 populations, backcrosses, recombinant inbred lines (RILs), doubled haploids (DH), and full-sib families. Each population type differs in segregation ratios, recombination structure, marker coding, and mapping resolution.
Because the choice of mapping population affects the entire analysis, it should be considered before genotyping and map construction.
👉 Learn more in our article on mapping populations used in QTL mapping.
Linkage Map Construction Is an Iterative Process
Linkage analysis is rarely completed in a single run. It is usually an iterative process involving several rounds of filtering, grouping, ordering, error checking, and map comparison.
Different parameter settings may produce slightly different maps. For example, changing the LOD threshold, recombination threshold, or grouping method may alter the number of linkage groups, marker order, or total map length. Because of this, linkage mapping often involves a trial-and-error process.
A useful practical criterion is to compare alternative map versions and select the most biologically plausible map with the shortest reasonable total map length. A very inflated map often indicates genotyping errors, incorrect marker order, or problematic markers. However, the shortest map should not be accepted blindly; it must also be consistent with the expected chromosome number, marker distribution, and biological information available for the species.
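The comparison of map versions can be expressed as a small selection rule: keep only the candidates with the expected number of linkage groups, then prefer the shortest total length. The sketch below is a deliberately simplified illustration; the record fields, run names, and numbers are hypothetical, and a real decision would also weigh marker distribution and agreement with other biological evidence.

```python
def pick_candidate_map(candidates, expected_groups):
    """Among maps with the expected number of linkage groups,
    return the one with the shortest total length (or None)."""
    plausible = [c for c in candidates if c["n_groups"] == expected_groups]
    if not plausible:
        return None
    return min(plausible, key=lambda c: c["total_cm"])


# Hypothetical summaries of four mapping runs for a species with n = 12.
runs = [
    {"name": "LOD3", "n_groups": 10, "total_cm": 1450.0},
    {"name": "LOD5", "n_groups": 12, "total_cm": 1820.0},
    {"name": "LOD5_cleaned", "n_groups": 12, "total_cm": 1310.0},
    {"name": "LOD8", "n_groups": 17, "total_cm": 1190.0},
]
best = pick_candidate_map(runs, expected_groups=12)
print(best["name"])  # LOD5_cleaned
```

Note that the shortest run overall (LOD8 in this made-up example) is rejected because it fragments the genome into too many groups, which mirrors the caution against accepting the shortest map blindly.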
Comparing the Genetic Map with the Physical Genome Assembly
The final linkage map should be compared with the physical genome assembly whenever a reference genome is available. This comparison helps validate marker order, detect possible inconsistencies, and identify regions where the genetic and physical maps disagree.
For a broader explanation of how physical maps, genome assemblies, and marker positions are connected, see our article on physical maps vs genetic maps.
However, disagreement between the genetic map and the physical map does not always mean that the genetic map is wrong. Genome assemblies may contain errors, misplaced scaffolds, or incorrectly oriented regions. In addition, the parental lines used to build the mapping population may carry structural variations, such as inversions, translocations, or large insertions and deletions, compared with the reference genome.
For this reason, the genetic map can sometimes provide more reliable information about the inheritance pattern in the specific mapping population than the reference genome alone. The best interpretation comes from combining both sources of information: the physical position of markers in the genome and the recombination-based order observed in the population.
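A quick numerical check of collinearity for one linkage group is the Spearman rank correlation between genetic positions (cM) and physical positions (bp) of the same markers. The sketch below implements it with the standard library only; the function names and the marker positions are hypothetical. Values near 1 or -1 indicate consistent order (the sign only reflects chromosome orientation), while intermediate values point to rearranged segments worth inspecting in a scatter plot.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant positions")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical positions of five markers on one linkage group.
genetic_cm = [0.0, 5.2, 9.8, 14.1, 20.3]
collinear_bp = [120_000, 850_000, 1_900_000, 2_600_000, 4_100_000]
inverted_bp = [120_000, 2_600_000, 1_900_000, 850_000, 4_100_000]

print(spearman(genetic_cm, collinear_bp))  # 1.0
print(spearman(genetic_cm, inverted_bp))   # about 0.6
```

A reduced correlation does not say which map is wrong; it only locates the disagreement, which then has to be interpreted in light of possible assembly errors or structural variation in the parents.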
Where Genetic Maps Are Useful
Genetic maps are powerful tools in plant breeding and genetics. They provide the foundation for many downstream analyses and practical breeding applications.
QTL Mapping
A genetic map is required for QTL mapping in biparental populations. The map allows researchers to scan the genome and identify regions associated with traits such as yield, disease resistance, flowering time, fruit quality, stress tolerance, or adaptation.
Marker-Assisted Selection
Once QTLs or major genes are detected, linked markers can be used to support marker-assisted selection. A reliable genetic map helps breeders identify markers that are close enough to the target locus to be useful in selection.
Gene Discovery and Candidate Gene Identification
Genetic maps can be combined with physical genome information to narrow down candidate genomic regions. This is especially useful when breeders or researchers want to identify candidate genes underlying important agronomic traits.
Genome Assembly Validation
High-quality genetic maps can also help validate genome assemblies. When the marker order in the genetic map disagrees with the physical genome assembly, this may reveal assembly problems or structural differences between the reference genome and the parental lines.
Key Takeaway
Linkage analysis is much more than a computational step before QTL mapping. It is a process that requires careful data curation, biological interpretation, and repeated validation. Missing data, segregation distortion, genotyping errors, and wrong allele phasing can all affect the final map.
A good genetic map is not simply the one produced automatically by the software. It is the map that combines clean marker data, stable linkage groups, reliable marker order, reasonable genetic distances, and consistency with the biological context of the population.
Frequently Asked Questions
What is linkage analysis?
Linkage analysis is a statistical method used to identify markers that are inherited together and to construct genetic maps based on recombination frequencies.
Why are genetic maps important?
Genetic maps provide the framework for QTL mapping, marker-assisted selection, gene discovery, and genome assembly validation.
Why do genotyping errors affect linkage maps?
Genotyping errors are often interpreted as recombination events, which artificially increase map distances and can lead to incorrect marker ordering.
What software can be used for linkage analysis?
Common tools include QTL IciMapping, JoinMap, OneMap, ASMap, and R/qtl.
How many linkage groups should a genetic map contain?
Ideally, the number of linkage groups should correspond to the haploid chromosome number of the species being studied.
Explore Our Upcoming QTL Mapping Workshop
At AgroSynapsis, we offer practical training workshops and coaching in molecular breeding, QTL mapping, genetic map construction, and marker-assisted selection.
Our upcoming workshop on QTL mapping includes the construction of genetic maps using QTL IciMapping. No coding experience is required. The workshop is designed for breeders, researchers, students, and seed-company professionals who want to understand how to move from molecular marker data to genetic maps and QTL discovery.
👉 If you would like to be informed about our upcoming workshops and receive early access and discounts, fill out our short training interest form here:
Explore our upcoming QTL Mapping Workshop
Selected Publications and Resources
- Hackett CA, Broadfoot LB. Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps. Heredity. 2003.
- Cartwright DA, Troggio M, Velasco R, Gutin A. Genetic mapping in the presence of genotyping errors. Genetics. 2007.
- Meng L, Li H, Zhang L, Wang J. QTL IciMapping: Integrated software for genetic linkage map construction and quantitative trait locus mapping in biparental populations. The Crop Journal. 2015.
- Margarido GRA, Souza AP, Garcia AAF. OneMap: software for genetic mapping in outcrossing species. Hereditas. 2007.

