Synthetic Reads Generator (SyRGen)

Open access edition 31_2: 18th April 2025, refreshed Oct 2025 release notes

  • Where?
    • A web-based version of this application is currently available at GRCh38 Reads Generator.
    • Just click the link; no user name or password is needed.
  • How? Check the Help page for advice on browser settings and application features.
  • What? SyRGen creates reads with known variants that you can put through an informatics pipeline to ensure the variants are being correctly identified.
  • Why? We need standards and this system creates test data that may anticipate data recoverable from rare tissue types.
  • When? – refresh this page in case of maintenance updates
  • Tell us what you think: Email syrgenreads@gmail.com

Contents of the data in this edition:

Genomic, mRNA and cDNA sequence data are available for the following reference genes, amongst others:

AK2 – Mutations in AK2 are known to cause reticular dysgenesis, a severe neonatal condition. Six different Haplotype Variants are presented here, including an SNV, a splice-junction variant; known and imagined variants are included.

NB: The MANE_Select definition for AK2 is absent from Ensembl’s GRCh37 feature data, but it has been edited into the GRCh37 data for SyRGen. To emphasise this, the hyperlink text to Ensembl for this Haplotype includes “GRCh38”, to differentiate it from the links to GRCh37 for the other transcripts, i.e.:

Table of AK2 variants included in SyRGen:

| Haplotype Variant Source Name | HGVS expression (CDS-sequence definition) | Alternative definitions or comment |
|---|---|---|
| AK2 EB0001 | NM_001625.4(AK2):c.453del(p.Tyr152fs), implemented as NM_001625.4(AK2):c.452_453CC>C(p.Tyr152fs) | Single-base deletion clinvar:18254 dbSNP:rs1553151177 |
| AK2 EB0002 | NM_001625.4(AK2):c.336_338del(p.Asp113del) | Three-base deletion clinvar:840748 |
| AK2 EB0003 | NM_001625.4(AK2):c.331-1G>A | Splicing-site deletion clinvar:18253 dbSNP:rs1192619329 |
| AK2 EB0004 | NM_001625.4(AK2):c.698_699del(p.Lys233fs) | Deprecated clinvar:1034623 |
| AK2 EB0005 | NM_001625.4(AK2):c.350_402del(p.Lys117Thrfs*32) | Theoretical; 53-base-long deletion |
| AK2 EB0006 | NM_001625.4(AK2):c.406_425+3dup dup=ATCCGAAGAATCACAGGAAGGTA | Theoretical; 23-base duplication crossing a splice-boundary. |
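
The base counts quoted in the comments (53 bases for c.350_402del, 23 bases for c.406_425+3dup) follow from the coordinate ranges. The sketch below shows that arithmetic. The helper name `span_length` is hypothetical and not part of SyRGen, and it assumes only the simple form c.START_END with an optional +OFFSET that extends past the exon end into the intron.

```python
import re

# Hypothetical helper, not SyRGen code. Handles only c.START_END[+OFFSET].
_RANGE = re.compile(r"c\.(\d+)_(\d+)(?:\+(\d+))?")

def span_length(hgvs: str) -> int:
    """Count the bases covered by a simple CDS range in an HGVS expression."""
    match = _RANGE.search(hgvs)
    if match is None:
        raise ValueError(f"no simple c.START_END range in {hgvs!r}")
    start, end = int(match.group(1)), int(match.group(2))
    intron_offset = int(match.group(3) or 0)  # bases past the exon end
    return end - start + 1 + intron_offset

if __name__ == "__main__":
    for expr in (
        "NM_001625.4(AK2):c.350_402del(p.Lys117Thrfs*32)",
        "NM_001625.4(AK2):c.406_425+3dup",
    ):
        print(expr, span_length(expr))
```

For EB0006 the result can be cross-checked against the length of the quoted duplicated sequence, ATCCGAAGAATCACAGGAAGGTA.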

ATM – A long gene with multiple alternate mRNA transcripts and a few example variants.

BRCA1 – A demonstration, in the GRCh38 set, that the Sequence Reads Simulator accurately excludes variant features within introns when selecting an mRNA Template, compared to a genomic Template. The output data files for BRCA1 are listed in detail, as examples, with some annotation.
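
The intron exclusion described for BRCA1 can be pictured as a filter on variant positions. This is only an illustrative sketch, not SyRGen's implementation: the function name, the template labels "genomic" and "mrna", and all coordinates are made up for the example.

```python
from typing import List, Sequence, Tuple

Exon = Tuple[int, int]  # 1-based inclusive start and end (hypothetical coordinates)

def variants_for_template(
    variant_positions: Sequence[int],
    exons: Sequence[Exon],
    template: str,
) -> List[int]:
    """Return the variant positions that apply to the chosen template type."""
    if template == "genomic":
        # A genomic Template spans exons and introns, so every variant is kept.
        return list(variant_positions)
    if template == "mrna":
        # An mRNA Template has no introns, so intronic variants are excluded.
        return [
            pos for pos in variant_positions
            if any(start <= pos <= end for start, end in exons)
        ]
    raise ValueError(f"unknown template type: {template!r}")

if __name__ == "__main__":
    toy_exons = [(100, 200), (500, 650)]
    toy_variants = [150, 320, 600]  # 320 falls in the intron
    print(variants_for_template(toy_variants, toy_exons, "genomic"))
    print(variants_for_template(toy_variants, toy_exons, "mrna"))
```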

CIITA – A gene associated with bare lymphocyte syndrome (BLS). Seven Haplotype variants are listed; known and imagined variants are included.

Table of CIITA variants included in SyRGen:

| Haplotype Variant Source Name | HGVS expression (CDS-sequence definition) | Alternative definitions or comment |
|---|---|---|
| CIITA EB0101 | NM_000246.4(CIITA):c.36C>A (p.Tyr12Ter) | SNV clinvar:1076860 dbSNP:rs367628451 |
| CIITA EB0102 | NM_000246.4(CIITA):c.359-2A>G | clinvar:1068098 |
| CIITA EB0103 | NM_000246.4(CIITA):c.2890_2969+1del81;NP_000237.2: p.(Leu964Profs*6) | HGMDPro:P_000237.2 dbSNP:rs1555507411 |
| CIITA EB0104 | NM_000246.4:c.3229_3233+7delATGGAGTGAGTG | HGMDPro:unknown |
| CIITA EB0105 | NM_000246.4(CIITA):c.1820_1848dup dup=ACAGCCACAGCCCTACTTTGTGCCGGGCA | Theoretical duplication. 1820_1848 defines the sub-sequence to duplicate, but it needs two of these in a delins, so another way to define this in “Create a Haplotype Variant” is to use a delins: c.1819C>CACAGCCACAGCCCTACTTTGTGCCGGGCA NB: the CIGARs are different |
| CIITA EB0106 | NM_000246.4(CIITA):c.975_976insCTTTTGGAATA | Theoretical insert between these two positions. In “Create a Haplotype Variant”, use a delins. C.975_976AGACTTTTGGAATAG |
| CIITA EB0107 | NM_000246.4(CIITA):c.975_976insCTTTTGGAATA | As EB0106; but EB0107 uses an insert definition that is not currently definable in “Create a Haplotype Variant” |
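
The comments for EB0105 and EB0106 describe rewriting a duplication or an insertion as a delins. The sketch below uses a short made-up sequence, not the CIITA CDS, to show that both forms give the same variant sequence. The function names are hypothetical and are not SyRGen's API; positions are 1-based and inclusive.

```python
def apply_delins(cds: str, start: int, end: int, replacement: str) -> str:
    """Replace the 1-based inclusive range start..end with `replacement`."""
    return cds[:start - 1] + replacement + cds[end:]

def apply_dup(cds: str, start: int, end: int) -> str:
    """Duplicate the 1-based inclusive range start..end directly after itself."""
    segment = cds[start - 1:end]
    return cds[:end] + segment + cds[end:]

def apply_ins(cds: str, left: int, inserted: str) -> str:
    """Insert `inserted` between positions left and left+1."""
    return cds[:left] + inserted + cds[left:]

if __name__ == "__main__":
    toy_cds = "ATGCCAGTTGAC"  # made-up sequence for illustration only

    # Duplication of 5..8, and the same change written as a delins that
    # replaces the base before the range with that base plus the segment.
    dup_result = apply_dup(toy_cds, 5, 8)
    as_delins = apply_delins(toy_cds, 4, 4, toy_cds[3] + toy_cds[4:8])
    print(dup_result, dup_result == as_delins)

    # Insertion between 6 and 7, and the same change written as a delins
    # that replaces both flanking bases.
    ins_result = apply_ins(toy_cds, 6, "TT")
    ins_as_delins = apply_delins(toy_cds, 6, 7, toy_cds[5] + "TT" + toy_cds[6])
    print(ins_result, ins_result == ins_as_delins)
```

The resulting sequences match, but, as the EB0105 comment notes, the CIGARs produced for the two definitions can differ, because the anchoring position of the change is different.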

EGFR – In the GRCh38 set: includes exon-19 deletions not detected in available tests, as identified in “Molecular characteristics and clinical outcomes of EGFR exon 19 indel subtypes to EGFR TKIs in NSCLC patients” by Su et al., Oncotarget v.8(67); 2017 Dec 19.

When curated by Replicon Genetics, some of these types were not defined in public domain databases. The help section on EGFR gives more information.

KRAS – In the GRCh38 set: includes a demonstration of the KRAS G12C variant in KRAS_som1. This is dbSNP:rs121913530, where a C->A on the + strand is a G->T on the reverse, coding, strand. In HGVS nomenclature: (KRAS):c.34G>T (p.Gly12Cys).

A clustered set of variants is shown in KRAS_hap2, for comparison with the KRAS_minus equivalent.
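
The strand relationship in the KRAS G12C description (C->A on the + strand equals G->T on the coding strand) is a reverse complement of both alleles. A minimal sketch, using a hypothetical helper name that is not part of SyRGen:

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def on_opposite_strand(ref: str, alt: str) -> Tuple[str, str]:
    """Express a ref/alt allele pair as read from the opposite strand."""
    # Complement each base, then reverse so multi-base alleles stay 5'->3'.
    return (
        ref.translate(_COMPLEMENT)[::-1],
        alt.translate(_COMPLEMENT)[::-1],
    )

if __name__ == "__main__":
    print(on_opposite_strand("C", "A"))
```

For single-base alleles the reversal has no effect; it only matters for multi-base alleles such as those in the clustered KRAS_hap2 set.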

KRAS_minus – In the GRCh38 set: a demonstration of the same data as KRAS, but on the reverse (-) coding strand. KRAS_hap2 shows the subtle difference between the CIGARs created when reading on the opposite strand, and that the application successfully calculates the same global position for the same variant on the opposite strand.

NCF1 – NCF1 variants are involved in autoimmune disease. Five Haplotype variants are listed; known and imagined variants are included.

Table of NCF1 variants included in SyRGen:

| Haplotype Variant Source Name | HGVS expression (CDS-sequence definition) | Alternative definitions or comment |
|---|---|---|
| NCF1 EB0201 | NM_000265.6(NCF1):c.502del(p.Glu168fs) | G is deleted dbSNP:rs1563003964 HGMD-PUBLIC:CD931024 |
| NCF1 EB0202 | NM_000265.6(NCF1):c.73_74GT[1] (p.Tyr26fs) seems to be a synonym for c.73_74delGT. The [1] seems to mean “one GT” where variant is GTGT > GT | GT deletion Clinvar:2249 dbSNP:rs4029402 |
| NCF1 EB0203 | NM_000265.6(NCF1):c.574G>A(p.Gly192Ser) | Clinvar:2255 dbSNP:rs119103273 HGMD-PUBLIC:CS014967 HGMD-PUBLIC:CM104214 |
| NCF1 EB0204 | NM_000265.6(NCF1):c.331_339delTGTCCCCAC | Theoretical deletion |
| NCF1 EB0205 | NM_000265.6(NCF1):c.765_800+2del38 | Theoretical long deletion into intron |
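
The reading of the repeat notation given for EB0202 (GT[1] meaning one copy of GT where the reference has GTGT) can be sketched as below. The function name is hypothetical, and the reference copy number of two is taken from the comment in the table, not checked against the NCF1 sequence.

```python
from typing import Tuple

def repeat_change(unit: str, ref_copies: int, alt_copies: int) -> Tuple[str, str, str]:
    """Return (reference repeat, variant repeat, equivalent del/dup description)."""
    ref = unit * ref_copies
    alt = unit * alt_copies
    if alt_copies < ref_copies:
        change = "del" + unit * (ref_copies - alt_copies)
    elif alt_copies > ref_copies:
        change = "dup" + unit * (alt_copies - ref_copies)
    else:
        change = "="  # no change in copy number
    return ref, alt, change

if __name__ == "__main__":
    # Assumes two GT copies in the reference, as stated for EB0202.
    print(repeat_change("GT", 2, 1))
```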

With thanks to Dr Eleanor Baker at North West Genomic Laboratory Hub for defining the AK2, CIITA and NCF1 variants.

© Copyright Replicon Genetics & Cary O’Donnell 2021-2025 & available under the terms of the GNU Affero General Public License version 3 (AGPL-3.0 license)