Abstract

The piggyBac DNA transposon is used widely in genome engineering applications. Unlike other transposons, its excision site can be precisely repaired without leaving footprints and it integrates specifically at TTAA tetranucleotides. We present cryo-EM structures of piggyBac transpososomes: a synaptic complex with hairpin DNA intermediates and a strand transfer complex capturing the integration step. The results show that the excised TTAA hairpin intermediate and the TTAA target adopt essentially identical conformations, providing a mechanistic link connecting the two unique properties of piggyBac. The transposase forms an asymmetric dimer in which the two central domains synapse the ends while two C-terminal domains form a separate dimer that contacts only one transposon end. In the strand transfer structure, target DNA is severely bent and the TTAA target is unpaired. In-cell data suggest that asymmetry promotes synaptic complex formation, and modifying ends with additional transposase binding sites stimulates activity.
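The two properties named in the abstract, TTAA-specific integration and footprint-free excision, can be illustrated at the sequence level. The following is a minimal sketch only, assuming a single-strand string view of the DNA; the function names and the toy sequences are hypothetical and are not taken from the study, and the chemistry (hairpin formation, strand transfer) is not modelled.

```python
TARGET = "TTAA"


def find_ttaa_sites(seq):
    """Return 0-based start positions of every TTAA in seq."""
    return [i for i in range(len(seq) - len(TARGET) + 1)
            if seq[i:i + len(TARGET)] == TARGET]


def integrate(genome, cargo, site):
    """Insert cargo at a TTAA site so that TTAA ends up on both flanks."""
    if genome[site:site + len(TARGET)] != TARGET:
        raise ValueError("integration site must be a TTAA tetranucleotide")
    left = genome[:site + len(TARGET)]   # keeps the original TTAA on the left
    right = genome[site:]                # begins with the duplicated TTAA
    return left + cargo + right


def excise(genome, cargo):
    """Remove cargo and one flanking TTAA copy, leaving a single TTAA."""
    flanked = TARGET + cargo + TARGET
    start = genome.find(flanked)
    if start < 0:
        raise ValueError("TTAA-flanked cargo not found")
    return genome[:start] + TARGET + genome[start + len(flanked):]


if __name__ == "__main__":
    genome = "GCGTTAACCGGATTAAGC"   # hypothetical toy target sequence
    cargo = "GGCCATGCATGGCC"        # hypothetical stand-in for a transposon
    sites = find_ttaa_sites(genome)
    inserted = integrate(genome, cargo, sites[0])
    restored = excise(inserted, cargo)
    print(sites, restored == genome)
```

In this toy model, excision after integration returns the starting sequence exactly, which is the string-level analogue of the seamless repair described above.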
Introduction

Transposons are mobile genetic elements that can move from one position to another in the genome or between host genome and foreign DNA1. They make up a large percentage of most eukaryotic genomes1,2,3, where they have played important roles in genome evolution and the establishment of novel cellular functions and pathways4. Although their movement has been severely constrained in humans, largely through inactivation, transposition has been linked to the development of specific diseases5,6.

DNA transposons, most notably Sleeping Beauty (SB) and piggyBac (pB), have been intensively exploited as tools for genome engineering and therapeutic applications7,8,9,10. Among eukaryotic DNA transposons, only pB is known to specifically integrate at TTAA sites and to exhibit the property of seamless excision whereby the genomic gap produced when the transposon is excised can be repaired precisely, without need for any DNA synthesis (Fig. 1a)11,12. pB has proved to be extremely versatile and the lack of a DNA footprint left behind after its transposition is a unique and valuable property13,14,15,16. It is used in non-viral vectors for transgenesis17,18, gene therapy7,19, insertional mutagenesis20, and genetic screens21,22,23. It has also found application in novel therapeutic strategies including CAR T-cell engineering24,25,26, CRISPR/Cas-mediated gene therapy27,28,29, and human induced pluripotent stem cell (iPSC) engineering30,31,32. Useful variants have been developed through random mutation including a hyperactive pB transposase called hyPBase33 as well as one that can excise pB but cannot integrate it34.

Fig. 1: Overview of piggyBac transposition. a Mechanism of piggyBac (pB) transposition. Hydrolysis by the piggyBac transposase (PB) liberates the 3′-OH on the DNA strand that will be integrated. This is followed by transesterification in which this 3′-OH attacks flanking DNA four nt from the transposon end forming a DNA hairpin.
For subsequent integration, PB opens the hairpin at each end in another hydrolytic step, leaving a four nt TTAA overhang; this is followed by a second transesterification step that joins each end to target DNA. At the empty donor site (inset), complementary DNA strands allow for seamless repair. LE left end, RE right end, TIR terminal inverted repeat, TSD target site duplication. b Schematic of pB transposon flanked by TTAA, and sequence and organization of the LE and RE TIRs. NTS non-transferred strand, TS transferred strand. c Domain organization of PB. The catalytic domain contains the conserved DDD motif (in red). NTD N-terminal domain. Dimerization and DNA-binding domain (DDBD). CRD C-terminal cysteine-rich domain. Gray indicates disordered regions in the structure. d Overall structures of the synaptic hairpin DNA (SNHP) complex and strand transfer complex (STC).

Although pB (Fig. 1b) was isolated from the cabbage looper moth Trichoplusia ni over 30 years ago11, no structural information is available that would explain its excision and targeting properties or provide a rational basis to understand how mutations affect its activities. pB contains a single open-reading frame (ORF) encoding its 594 amino-acid transposase (PB) (Fig. 1c) flanked by terminal inverted repeats (TIRs) (Fig. 1b). The pB TIRs are asymmetric; the 35 bp left end TIR (LE TIR or LE35) and the 63 bp right end TIR (RE TIR or RE63) are sufficient for certain in vitro and in vivo activities35. Although LE35 and RE63 have short sequences and repeats in common (shown in blue, green, and purple in Fig. 1b), the relative organization of these differs and RE63 has a 17 bp insertion and a sequence duplication between segments relative to LE35 (Fig. 1b, in white and green, respectively), a puzzling arrangement.

PB is a cut-and-paste DNA transposase, proposed to be a member of the RNaseH-like or retroviral integrase superfamily36.
PB-catalyzed transposition proceeds through a series of hydrolysis and transesterification steps35 that generate an excised intermediate in which DNA hairpins protect the transposon ends (Fig. 1a). The reaction sequence differs from those of other eukaryotic DNA transposases, such as members of the Tc1/mariner and hAT superfamilies, or the Transib precursor to RAG1 of the V(D)J recombination system37,38,39,40, but is identical to that of the prokaryotic IS4 family of insertion sequences and transposons such as Tn5 (ref. 41). Despite this shared reaction pathway, however, only pB precisely targets TTAA sequences36.

Motivated by pB’s unique properties and its importance in established and emerging applications, we have determined two ab initio single-particle cryo-electron microscopy (cryo-EM) structures of PB in complex with DNA: a synaptic complex that contains the hairpinned DNA intermediates (synaptic hairpin complex (SNHP)) and a strand transfer complex (STC) in which two TIRs have been integrated into target DNA. The structures explain the basis of seamless excision and targeting, and together with biochemical and in-cell transposition data, suggest a transposition model that requires asymmetric TIRs in order to form the active synaptic complex in vivo.

Results

Mammalian-expressed PB is active in vitro

PB expressed and purified from mammalian cells forms dimers as judged by SEC-MALS (Supplementary Fig. 1a)42. In vitro, as previously demonstrated for PB expressed in E. coli35, it catalyzed hairpin formation and resolution with oligonucleotide TIR substrates (Supplementary Fig. 1b). It also catalyzed double-ended integration of pre-cleaved transposon ends into a supercoiled (SC) pUC19 target plasmid (Supplementary Fig. 1c). For example, incubation with a pre-cleaved 35 bp LE TIR oligonucleotide (LE35) and pUC19 generated products consistent with single-end (SE) and double-end (DE) integration (Supplementary Fig. 1c).
Under assay conditions with a protein:TIR ratio of 1:2, plasmid integration was less efficient when a pre-cleaved 63 bp RE TIR (RE63) was used, or when LE35 and RE63 were mixed at equal molar ratio (Supplementary Fig. 1c).

Present in both LE35 and RE63 is a palindromic-like 19-bp internal repeat (the purple box sequence in Fig. 1b) that is the binding site for the C-terminal cysteine-rich domain (CRD; residues 553–594) of PB43. The CRD is required for PB activity and has been proposed to be the driver of TIR binding. When we probed the effect of sequentially truncating RE63 to remove the CRD-binding site (Supplementary Fig. 1d), we were surprised to observe increasing stimulation of integration into either linearized or SC pUC19 as the RE was shortened to RE24. Further deletion to RE16 largely abolished integration.

To confirm that mammalian-expressed PB integrated with its expected target site specificity, we generated a linear mini-transposon in which LE35 and RE63 TIRs flanked a Kan resistance gene, and used this as a substrate for in vitro integration into SC pUC19. The reaction products were purified and transformed into E. coli, allowing us to select for Amp+Kan+ colonies corresponding to integrated mini-transposons. Sequencing confirmed that only TTAA tetranucleotides had been targeted: from 10 sequenced colonies, we detected four mini-transposon integration events corresponding to insertion at TTAA sequences at bp 635–638, bp 1568–1571, bp 1582–1585, and bp 2646–2649 of pUC19.

Cryo-EM structure determination

For structural studies, we investigated the biophysical properties of a variety of PB/TIR complexes. In most cases, analytical size-exclusion chromatography (SEC) revealed multidisperse elution profiles with overlapping peaks. However, when PB was complexed with LE35 hairpin intermediates (Supplementary Fig. 2a) in the presence of Ca2+ (which does not catalyze hairpin opening, Supplementary Fig.
1b), it was possible to isolate a monodisperse population containing SNHP (Supplementary Fig. 2b). We also assembled an STC that contained PB, Ca2+, and 37 bp LE TIRs covalently joined to target DNA (Supplementary Fig. 3). As DNase I footprinting data suggested that PB interacts with 4–5 bp of flanking DNA beyond its TTAA target43, target DNA included 11 bp on either side of the central TTAA (Supplementary Fig. 3a). To avoid base-pairing during substrate annealing between the target sequence and the four nt TTAA overhang derived from hairpin opening (Fig. 1a), we substituted the overhang with the mutated sequence CCGG (Supplementary Fig. 3a). It has been shown that the four nt overhang is not required for TTAA target site selection in vitro35 nor do mutated flanking sequences affect specific integration into TTAA sites in vivo44. Our attempts to prepare stable monodisperse complexes containing one LE TIR and one RE TIR have not yet been successful.

We collected single-particle cryo-EM data on both complexes. For each, particle projections showed a wide distribution of different orientations in the micrographs and a number of distinct two-dimensional (2D) classes (Supplementary Fig. 4a, b). For the SNHP complex, we applied a mask around the CRD for masked three-dimensional (3D) classification, which improved the resolution of the reconstructed density map to 3.66 Å (Supplementary Fig. 4a, c). For the STC, we used two rounds of 3D classification and one round of masked 3D classification, resulting in a 3.47 Å density map (Supplementary Fig. 4b, e) with better map quality when compared with the SNHP complex, most likely due to the additional stabilizing effect of the target DNA. While parts of the maps were clearly two-fold symmetric, the maps as a whole had no rotational symmetry (Supplementary Fig. 4a, b).
Therefore, all processing was done without applying symmetry.

An NMR structure has been determined for the CRD (PDB 5LME)43, but as there was no known homologous structure available for the rest of PB, ab initio atomic models were built into Phenix-sharpened45 potential density maps. For the SNHP complex, we used Phenix’s map to model module46 and fragments were connected manually. The NMR model of the CRD dimer was consistent with the potential density in the region and was used as a starting model. We used density-guided rebuilding tools of Rosetta47 to complete and verify the trace and register of the model that was subsequently refined using Rosetta and Phenix. The STC model was built based on the SNHP model. Representative regions of the potential density maps and model fits are shown in Supplementary Figs. 5 and 6. The final models contain residues 117–594, consistent with the prediction by the PONDR server48 that the N-terminal 110 residues are largely intrinsically disordered.

Overall structures of PB transposase complexes

In both the SNHP and STC complexes, an asymmetric PB dimer synapses two approximately parallel TIRs (Fig. 1d). As predicted35, the catalytic domain (residues 263–457, interrupted by an insertion domain spanning residues 372–433) possesses the fold of the RNaseH-like superfamily of transposases. After the fifth β-strand of the catalytic domain, a three β-stranded insertion domain interrupts the RNaseH-like fold (Fig. 2a and Supplementary Fig. 7), the same topological location as in all other DDE/D transposases featuring insertion domains49. The convergence of residues 117–263 and 457–535 forms a structurally unique50 all-α-helical domain (denoted Dimerization and DNA-binding domain, DDBD) that knits the protein together and interacts with TIR bp 7–16 (Fig. 2b). The C-terminal end of the DDBD is connected to the CRD through an extended linker that exhibits weak density.

Fig.
2: Comparison of PB and Tn5 and details of PB TIR recognition. a Comparison of domain organization in PB SNHP and Tn5 transpososomes. RNaseH-like catalytic domains are in green, with active site residues highlighted in red. Insertion domains are colored in violet. The insertion domain of PB contains fewer β-strands than that of Tn5. b The DDBD interacts with LE TIR in trans. The α-helices comprising DDBD widen the major groove of the repeat shown in green. The red sphere is a Ca2+ ion in the DDBD/DNA interface. Other parts are colored as in Fig. 1d. c Comparison of TIR binding by PB (TIR DNAs are parallel; shown for the SNHP complex) and Tn5 transposase (anti-parallel TIRs; PDB 1MUS). Both transposases demonstrate cis and trans protein/DNA interactions. pB DNA colors are as in Fig. 1d. Tn5 TIR DNA is brown. Individual monomers are shown in yellow and pink. Active site locations are indicated (red).

In both complexes, the DDBD, catalytic and insertion domains collaborate to synapse two LE TIRs and direct the scissile phosphates to the active sites composed of D268, D346, and D447, where a Ca2+ ion is bound by the carboxylates of D268 and D346 (Supplementary Fig. 6a, b). A bridging oxygen from the scissile phosphate also forms part of the Ca2+ ligand sphere, suggesting that PB follows the paradigm of the two metal-ion-dependent chemical mechanism51,52.

In the SNHP complex, the horseshoe-shaped dimer formed by the DDBD, catalytic and insertion domains creates a wide channel whose sides are lined with β-hairpin loops largely contributed by the insertion domain (Fig. 1d). In the STC, this channel is filled with the target DNA (Fig. 1d). The SNHP and STC complexes are very similar and can be superimposed at 1.22 Å rmsd over 933 α-carbon positions.
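For readers less familiar with the rmsd figure quoted for the superposition, the quantity is the root-mean-square distance between paired α-carbon positions. The sketch below is illustrative only: it assumes the two coordinate sets have already been superimposed (the optimal rigid-body fit itself is not performed here), and the function name and the three-atom coordinates are hypothetical rather than taken from the deposited models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical, already-superimposed C-alpha positions in angstroms.
    model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model_b = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 0.0, 0.0)]
    print(f"{rmsd(model_a, model_b):.2f} A over {len(model_a)} positions")
```

In practice the superposition and rmsd are computed together by a structural alignment tool; this fragment only shows what the reported number measures.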
Upon target DNA binding, the channel formed by the catalytic and insertion domains narrows by about 3 Å as the two halves of the transpososome close down upon it, although we cannot rule out the possibility that this observation was influenced by the selection of the 3D classes.

CRD binding generates asymmetry and bends one TIR

The DDBDs, catalytic and insertion domains, and the first 16 bp of the LE TIRs bound by them obey two-fold symmetry (Fig. 1d and Supplementary Fig. 4) (extending also to the target DNA in the STC), yet overall the complexes are markedly asymmetric as the two CRDs form a dimer (Fig. 3) that binds bp 17–33 of only one of the LE TIRs (corresponding to the 19-bp palindromic internal repeat, in purple in Fig. 1b). There is no detectable density around the equivalent palindromic internal repeat sequence of the other LE TIR and, despite the weaker DNA density (presumably due to the lack of the stabilizing effect of bound CRDs), it is clear that it is not bent (Supplementary Fig. 4). The CRD monomers refined against the cryo-EM maps are consistent with the cross-brace Zn finger structure determined by NMR43, although not identical due to side chain rearrangements at the hydrophobic CRD dimer interface (Fig. 3c).

Fig. 3: C-terminal CRD binding generates asymmetry. a Cysteine-rich domain (CRD) dimer binds the palindromic 19-bp internal repeat of only one LE TIR, resulting in a ~40° kink of the DNA. DNA is colored as in Fig. 1d. Individual monomers are shown in yellow and pink. Bound Zn2+ ions are rendered as spheres. b Close-up view of the CRD dimer–DNA interaction. Top, sequence and numbering of hairpin DNA in the SNHP complex. The palindrome within bp 17–33 (purple) is underlined, and the two-fold symmetry axis is indicated. Bottom, orthogonal views of the CRD dimer. DNA binding by the CRDs narrows the minor groove centered at ~bp 25. NTS non-transferred strand, TS transferred strand.
c Hydrophobic residues at the CRD dimer interface.

CRD binding causes a ~40° bend in the TIR due to the insertion of the CRDs into two adjacent widened major grooves, accompanied by a significant narrowing of the intervening minor groove centered around bp 25 (Fig. 3a, b). The purple box of LE TIR DNA contains a canonical A-tract (bp 22–25) that appears to display propeller twist and has a narrowed minor groove, features previously noted for A-tracts53. The CRD dimer is approximately two-fold symmetric with a rotation axis perpendicular to the DNA and centered at the minor groove in the middle of the CRD-binding site (Fig. 3b). At the level of the CRD monomer, the observed mode of DNA binding is broadly consistent with the model derived from NMR including the identification of Y558, R567, and K569 as crucial residues for specific recognition43, but the NMR model did not predict the severe DNA bend induced by CRD dimerization.

TIR recognition close to the transposon end

Although the DDBDs interact with both TIRs, most of the protein/DNA interactions are in trans, such that the interactions of one monomer with a TIR direct its end into the active site of the other monomer (Fig. 2b). Such trans arrangements are common in transpososome assemblies40,54,55,56. The parallel TIR configuration is in sharp contrast with the arrangement within the Tn5 transpososome54 (Fig. 2c) that catalyzes the same reaction sequence. Remarkably, the two transpososomes are organized in completely different ways, reflecting two very different modes of transposase dimerization, with the Tn5 TIRs almost antiparallel and essentially straight (Fig. 2c). However, the two transposases do share the feature of an entirely β-stranded insertion domain (Supplementary Fig. 7).
While the DD(E/D) catalytic and parts of the insertion domains of PB and the Tn5 transposase can be aligned (1.99 Å rmsd over 182 α-carbons), the sequence identity in this region is only 9%.

In both PB complexes, the catalytic domain interacts with the minor groove closest to the transposon end (bp 2–6) through a rich set of interactions, including two α-helices linked by G444 and G445 (α9–α10 in Supplementary Fig. 7) that turn and follow the curve of the minor groove. The α10 helix carries D447, which together with D268 and D346 forms the catalytic triad (Supplementary Fig. 6a, b). One turn away from the tip of the transposon, the DDBD binds the TIR in trans, widening the major groove (bp 9–14) through the insertion of three different parts of the protein (i.e., α4, α5, and a loop between α2 and α3) (Fig. 2b). In addition, the two adjacent, narrowed minor grooves (bp 5–8 and bp 15–18) are also recognized by elements of the DDBD (Fig. 2b). There is also a curious arrangement in which three carboxylates (D197, D215, and D218) coordinate a divalent metal ion (Ca2+) that is also coordinated by several bases in the widened major groove (Fig. 2b).

Compared to the PB transposase, that of Tn5 has additional β-strands, and β7 and β8 of Tn5 form a β-hairpin that lies deep in the major groove just adjacent to the DNA hairpin (Fig. 2a and Supplementary Fig. 7). For Tn5, mutations in this β-hairpin affect all steps of transposition57. The same role in PB is served by an omega loop between the first and the second β-strands of the catalytic domain that interacts with the hairpin and hairpin-proximal major groove through R275, Y283, and K290 in trans (Fig. 4b).

Fig. 4: SNHP and STC of PB reveal the mechanistic link between seamless excision and TTAA targeting. a Numbering system used for SNHP and STC DNA substrates. b, c Detailed protein–DNA interactions of TTAA tetranucleotides in the SNHP and STC.
The interactions involve the omega loop (brown), the catalytic domain (green), and the insertion domain (purple). In STC, A3T (equivalent to the flipped base A-1 in SNHP) is hydrogen bonded to T0T, shown as a dashed line. For clarity, the NTS is not shown. Protein domains and DNA are colored as in Fig. 1d. d Superposition of the TTAA tetranucleotides in SNHP and STC. Except A-1 in SNHP and A3T in STC, the other bases have similar conformations in both complexes. In the SNHP, A-1 is flipped out whereas, in the STC, A3T is hydrogen-bonded to T0T, represented as a black dashed line. e, f Comparison of target DNA integration of PB and PFV. PB binds and integrates into target DNA through the minor groove. Top, red triangles indicate staggered sites for integration (labeled as nucleotide T for PB and C for PFV) and the relative positions of the catalytic domains. The side view cartoons show how a PB dimer approaches target DNA towards the minor groove while PFV integrates through the major groove. Bottom, structure of the catalytic domains (green) interacting with target DNA. Due to severe distortion of the DNA backbone, the PB target site bases are unpaired, with two bases flipped out. Unlike in PB STC, the 4-bp PFV target site remains base-paired. The omega loops in PFV have no interactions with the target DNA. PFV prototype foamy virus (PDB ID: 4E7L).

SNHP complex and DNA hairpin recognition

In the SNHP complex, all the nts of the transposon end are base-paired whereas none of the four nts of the hairpin loop (numbered as A-1, A-2, T-3, T-4), derived from the flanking TSD DNA, are base-paired (Supplementary Fig. 5c). Multiple elements contribute to the stabilization of the hairpin. Backbone phosphates of the hairpin loop interact with Y439 from the catalytic domain in cis, and Y283 from the omega loop in trans (Figs. 4b and 5). There is also a base-specific interaction between N440 and A-2 in cis (Figs. 4b and 5).
In addition, the methyl group of T-3 is in a hydrophobic pocket formed by V414/L416/Y439 of the insertion domain in cis.

Fig. 5: Schematic diagram of protein–DNA interactions. Numbers in pentagons represent the positions of the nucleotides. Letters in the boxes are types of nucleobases. In the LE TIR hairpin DNA, the nucleotides of the TIR portion (NTS and TS) are from 1 to 35 (5′–3′). The hairpin TTAA (orange) is numbered −4 to −1 (5′–3′). In the strand transfer complex DNA, the TIR portion is numbered the same as in the LE TIR hairpin DNA. Upon integration, the TS is covalently joined to the target DNA starting from position 0 of the target DNA (0T); thus, the bottom strand of the target DNA (pink) is from 0T to 11T (5′–3′). The top strand of target DNA (yellow) is indicated as −8T to −1T (5′–3′). The flap donor DNA (orange, CCGG) is numbered −4 to −1 (5′–3′), consistent with the hairpin DNA scheme.

Although A-1 is completely flipped out, there do not appear to be aromatic stacking interactions with PB side chains. This is in contrast to the hairpin loop of Tn5 in which the collaboration of two tryptophan residues is needed to form the tight hairpin on each transposon end, one pushing out a thymine base from the non-transferred strand (NTS) and preventing its return to the helix and another stacking against it once it is flipped out54,58,59. This difference is likely because in Tn5, the tightest possible hairpin is formed by the attack of the transferred strand (TS) terminal 3′-OH on the terminal nt of the NTS directly opposite. In pB, the TS terminal 3′-OH instead attacks flanking DNA, four nt from the transposon end of the opposite strand (Fig. 1a), and the structure suggests that this is long enough that a hairpin can readily form.
The active site area of PB is more open when compared to that of Tn5, consistent with the ability to accommodate a longer hairpin loop and to allow the conformational change of the NTS that might be required to bring the scissile phosphate to the active site. The 4 nt long hairpin loop is stabilized by a set of interactions with the omega loop in trans, which are absent in Tn5.

STC complex reveals target DNA recognition and integration

In the STC, all nts of the transposon DNA are base-paired as are the target nts except for the bps that comprise the specific TTAA tetranucleotide (numbered as T0T, T1T, A2T, A3T), all four of which are unpaired (Supplementary Figs. 5d and 6c). It appears that specificity for the TTAA target sequence arises from its conformation as an ssDNA segment. The two ssDNA segments are stabilized by an elaborate network of protein/DNA and DNA/DNA interactions (Figs. 4c and 5) organized by the omega loop. Remarkably, relative to the hairpin loop seen in the SNHP complex, many of the same protein residues are involved and the conformations of T0T, T1T, and A2T are essentially superimposable on T-4, T-3, and A-2 of the hairpin DNA, respectively (Fig. 4d). The major change is with A3T (equivalent to A-1 in the hairpin loop), which is not flipped out but, rather, turned back towards the other target bases, H-bonded through its N1 to N3 of T0T (Figs. 4c, d and 5) and stacked against Y283. To allow the reaction to proceed from the hairpin-bound state to one poised to capture target, it appears that hairpin resolution is followed by the movement of the resulting flap out of the active site. Then, a drastically distorted and unpaired target TTAA tetranucleotide can be bound. As the structures reveal, the key to pB transposition is that the backbones of the TTAA tetranucleotide in both the hairpin and target DNA adopt a very similar conformation (Fig. 4d).
The role of the PB protein is therefore to enforce this conformation at both steps of the reaction.

The density is poor for the tetranucleotide flap CCGG after the first two nucleotides, presumably due to flexibility. Minimal interactions with the flap are consistent with observations that the flap is not necessary for PB reactions in vitro35 (Supplementary Fig. 1c, d), and that the exact 4 bp sequence of flanking DNA (which becomes the flap) is not important in vivo35,44. We observe interactions between several amino acid residues and backbone phosphates for five bp beyond the target TTAA, and the side chains of R372 and R376 are inserted into the minor groove of the flanking region (A5T–G8T, Fig. 5b).

The STC structure suggests that PB achieves target specificity by stabilizing the strand transfer product through the ssDNA form of TTAA. This unusual mode of transposon target selection is in line with the lack of a region of PB recognizable in the structure as a potential dsDNA TTAA target recognition domain. Although the role of the disordered N-terminal region remains to be established, the predicted isoelectric point of residues 1–117 is 4.7, making it highly unlikely that this region contributes to DNA binding.

PB integration occurs symmetrically, and the configuration of the dsDNA regions of the target that flank TTAA suggests that it must be severely bent before integration (Fig. 4e). Bent target DNA is a common feature of most DNA transpososome and intasome structures40,49,56,60,61,62,63,64. Strikingly, a unique aspect of the PB STC is that integration occurs at staggered target DNA phosphates across the minor groove (Fig. 4e). In all other known transpososome structures, integration has invariably been observed to occur across the major groove, as for example in the PFV intasome that also integrates with a four bp TSD (Fig. 4f, PDB 4E7L; ref. 60). In PFV, the strand transfer sites (C, Fig.
4f, top) are easily accessed by bending the target DNA to widen the major groove to match the distance between the active sites. In the PB STC, however, the strand transfer sites (T, Fig. 4e, top) are opposite the bound PB dimer. Simply bending the target DNA is not sufficient to allow the target scissile phosphodiester bonds to reach the active sites that are located deep within the catalytic domains. Instead, to fit the orientation of the active site DDD motif, coordinated metals, and the 3′-OH of the TS during the strand transfer reaction, TTAA tetranucleotides are melted and drastically distorted (Fig. 4e). Interestingly, the omega loop of PB interacts with target DNA, facilitating the melting of TTAA (Fig. 4e, bottom). The equivalent loops in the PFV intasome (Fig. 4f, bottom) and STC structures of other transposases and integrases have no contact with target DNA. Altogether, the unique aspects of PB target DNA recognition and integration are driven by the location of the active sites within the dimer, the interactions of the omega loops, and the relative positions of the target scissile bonds.

pB transposition in cells

We used colony count and excision assays in cultured human cells using PB and pB transposon derivatives as a proxy for in vivo transposition65. Excision analysis uses PCR to amplify re-joined transposon plasmid ends recovered from transfected cells, indicating that transposon excision has occurred. The colony count assay involves excision of a neomycin resistance transposon (pTpB) from a transposon plasmid followed by integration into the genomes of cells. Cells that have undergone transposition grow and form colonies in the presence of G418, which allows selection for the neomycin resistance gene, thereby providing a quantitative readout of not only excision but also subsequent integration. To correlate with our structural data, we evaluated pB transposition using shortened TIRs of LE35/RE63.
Experiments in HT-1080 cells transfected with pB transposon derivatives and a helper plasmid expressing PB indicated that only pB containing asymmetrical LE35/RE63 TIRs is active for both excision and integration; LE35/LE35 is inactive for both activities whereas RE63/RE63 can excise but not integrate using native PB (Fig. 6a). Evaluation of excision and integration with hyPBase33 revealed relaxed stringency for the LE35/RE63 pair as symmetrical RE63/RE63 was capable of both excision and integration whereas LE35/LE35 was also capable of excision and integration though to a lesser degree (Fig. 6b).

Fig. 6: Proposed transpososome with LE/RE TIR bound and in vivo transposition assays. a Excision (left) and colony count transposition analysis (right) in cultured HT-1080 cells using wild type PB. Excision was assayed by PCR detection of repaired excision sites. Excision and subsequent integration are assessed by colony count, which represents events in which the resistance gene (located between pB TIRs) is integrated into the chromosome by pB transposition. The resulting cells form colonies in the presence of G418, indicating the full transposition pathway from excision to integration. Excision results are representative of three independent experiments. Colony counts, n = 3 independent experiments ± SD. b Excision (left) and Excision+integration (colony count) transposition analysis (right) in cultured HT-1080 cells using hyPBase. Excision and Excision+integration (colony count) transposition analysis in cultured HT-1080 cells is as in Fig. 6a. Colony counts, n = 4 independent experiments ± SD. c Proposed model for the asymmetrical binding of LE and RE TIRs by two PB dimers. Colored boxes in TIRs are the same as in Fig. 1b. d Excision and colony count transposition analysis in cultured HT-1080 cells of LE and RE TIR variants.
RE63mut has a mutated green repeat, RE68 has five additional bp inserted as indicated in red, and LEtandem is lengthened by repeating a region of the RE TIR as shown in e. Excision assays shown are representative of three independent experiments. Colony counts, n = 3 independent experiments ± SD. e Proposed model representing assembly of PB with LEtandem and RE TIRs in a transpososome. The LEtandem TIR combines sequences of the LE TIR and RE TIR. The synaptic model includes three dimers of PB. Colored boxes in TIRs are the same as in Fig. 1b. Statistical analysis statement of a, b, and d: Data were analyzed using one-way ANOVA followed by Dunnett’s multiple comparison post-test comparing each column to the LE35/RE63 without control. All error bars show the standard deviation. Statistically significant differences were considered as follows: p ≥ 0.05 (ns) and p