Introduction to HLA
The human leukocyte antigen (HLA) region is a 3.6 Mb segment on the short arm of chromosome 6 that contains over 200 genes. It is also known as the human major histocompatibility complex (MHC), and it is the most polymorphic region in the human genome, participating in diverse immune reactions. HLA genes are divided into HLA class I (corresponding to MHC class I) and HLA class II (corresponding to MHC class II) according to the types of T cells they interact with. Earlier studies established that HLA genes are involved in recognizing foreign antigens and in determining whether transplanted tissue is a match. Variants in HLA genes are now also thought to increase susceptibility to a variety of immune-mediated diseases, such as rheumatoid arthritis and type 1 diabetes mellitus.
Rapid advances in sequencing technology have changed how the role of HLA is investigated in clinical and basic studies. Examples include high-throughput sequencing, which dramatically increases throughput while retaining high precision, and long-read sequencing, which generates full-length sequences without fragmentation. Currently, HLA profiling is mainly performed with short-read sequencing strategies.
Short-Read Sequencing Platforms
Short-read sequencing platforms, including Ion Torrent and Illumina, produce reads of up to a few hundred bases. The Illumina platform, the most commonly used, supports paired-end sequencing, which improves the ability to identify structural rearrangements. Next-generation sequencing (NGS)-based HLA-targeted methods (e.g., PCR-based target amplification and hybridization capture), whole exome sequencing (WES), and whole genome sequencing (WGS) are currently available for analyzing the complete nucleotide sequences of HLA regions.
Combining NGS-based approaches with suitable bioinformatics pipelines is important for accurate HLA genotyping and for mapping complete HLA regions. The appropriate bioinformatics approach depends on the sample preparation method and the sequencing platform. Bioinformatics solutions for HLA genotyping are designed to provide reliable, accurate, and reproducible results, so that the precise variation in HLA genes can be inferred and the relationship between HLA polymorphisms and the etiopathogenesis of diseases can be understood.
Long-Read Sequencing Platforms
One source of incorrect or ambiguous HLA typing results is the difficulty of correctly inferring the phase relationships between variants across the two HLA alleles. Long-read sequencing offers a promising solution to this problem: long-read sequencers generate reads exceeding 10,000 bases, which can span complete HLA genes. However, both major platforms, PacBio SMRT and Nanopore, are still at a disadvantage because of their high error rates (10-14% per read). For the characterization of novel alleles containing long introns, a combination of short-read and long-read sequencing, paired with an appropriate algorithm, may provide maximum resolution and accuracy.
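The phasing problem described above can be illustrated with a small sketch. It assumes a toy model in which each heterozygous position is just a pair of bases and reads are error-free. The function names and the example positions are hypothetical and are not taken from any HLA typing tool or real allele. With n unphased heterozygous positions there are 2^(n-1) candidate haplotype pairs, whereas a single read spanning all positions leaves only one.

```python
from itertools import product


def possible_haplotype_pairs(het_sites):
    """Enumerate the unordered haplotype pairs consistent with unphased calls.

    het_sites: list of (base_1, base_2) tuples, one per heterozygous position.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        hap1 = "".join(site[c] for site, c in zip(het_sites, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(het_sites, choice))
        # frozenset makes (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(frozenset((hap1, hap2)))
    return pairs


def pairs_consistent_with_read(pairs, read_haplotype):
    """Keep only the pairs in which one haplotype matches a read spanning all sites."""
    return {pair for pair in pairs if read_haplotype in pair}


# Hypothetical heterozygous positions; not taken from any real HLA allele.
sites = [("A", "G"), ("C", "T"), ("G", "T")]
candidates = possible_haplotype_pairs(sites)
print(len(candidates))  # 4 candidate pairs, i.e. 2 ** (3 - 1)

# A single error-free read covering all three positions resolves the phase.
resolved = pairs_consistent_with_read(candidates, "ACG")
print(len(resolved))  # 1
```

Real long reads are not error-free, so in practice a phase call would rest on agreement among many reads and not on a single one.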
Bioinformatics Tools for HLA Genotyping
Obtaining high-resolution HLA typing results from large volumes of sequencing data requires multiple bioinformatics systems and analysis programs. The high degree of HLA polymorphism, the extreme similarity among alleles, and the lack of complete reference sequences are the three main challenges for HLA typing from short-read data. To address them, several bioinformatics tools have been created for fast and accurate large-scale HLA genotyping, such as HISAT-genotype, xHLA, HLA-HD, and HLAscan.
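To make the allele-similarity challenge concrete, the following minimal sketch tiles error-free reads across two toy sequences that differ at a single position and reports the fraction of reads that occur in only one of them. The sequences, the function name, and the exact substring matching are illustrative assumptions. This is not how HISAT-genotype, xHLA, HLA-HD, or HLAscan work internally.

```python
def informative_read_fraction(alleles, read_len):
    """Fraction of error-free reads that occur in exactly one allele.

    alleles: dict mapping allele name to sequence.
    read_len: length of each simulated read.
    """
    total = 0
    informative = 0
    for seq in alleles.values():
        # Slide a window of read_len bases across the allele.
        for start in range(len(seq) - read_len + 1):
            read = seq[start:start + read_len]
            matches = sum(1 for other in alleles.values() if read in other)
            total += 1
            if matches == 1:
                informative += 1
    return informative / total if total else 0.0


# Toy sequences that differ at a single position; not real HLA alleles.
toy_alleles = {
    "allele_1": "ACGTTGCAAGCTTAGGCTAACGGATCCATG",
    "allele_2": "ACGTTGCAAGCTTAGCCTAACGGATCCATG",
}
for read_len in (5, 10, 20):
    print(read_len, round(informative_read_fraction(toy_alleles, read_len), 2))
```

In this toy setting, only reads that overlap the differing position can separate the two alleles. Every other read matches both equally well, which is the ambiguity a genotyping tool has to resolve.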