Flamingo Matrix


We present flamingo matrix (FLAMINGO), a high-resolution 3D genome structure reconstruction algorithm that uses low-rank matrix completion (LRMC) to address the challenging task of predicting long-range interactions from Hi-C data. The model is scalable: it requires no tuning of free parameters and converges quickly on both simulated and experimental Hi-C data, and it is robust across chromosomes and cell types. Moreover, the reconstructed structures show clear loop structures for TADs and predict short inter-TAD distances, both of which are consistent with observations from Capture-C, ChIA-PET, and SPRITE experiments.

The model is computationally efficient because it uses a recursive procedure with sparse weighting, which requires only O(N^2 log N) operations on the large DNA fragments from which it constructs the distance matrix. Furthermore, a hierarchical strategy is employed to partition each chromosome into an inter-domain hierarchy consisting of 1 Mb domain-level fragments and an intra-domain hierarchy of 5 kb DNA fragments. FLAMINGO applies the same low-rank matrix completion algorithm to both hierarchies and computes the resulting distance matrix, which can be used directly for structural reconstructions at chromosome-wide resolution (Methods).

To assess the accuracy of our method, we conducted a comprehensive comparative study against state-of-the-art algorithms, namely GEM-FISH, ShRec3D, Hierarchical3DGenome, RPR, and SuperRec, using the same benchmark dataset. We found that FLAMINGO is the most accurate approach for reconstructing both intra-domain structures at 5 kb resolution and whole-chromosome structures at 10 kb resolution. This superior performance does not depend on the choice of target DNA fragment size or domain partitioning, and it is not affected by the chosen down-sampling rate (Supplementary Fig. 1).

Moreover, FLAMINGO can interpret distal eQTLs by placing them in spatial proximity to their target chromatin regions, a relationship that other methods typically do not capture (Supplementary Fig. 4). For example, in the reconstructed GM12878 structure, the SNP rs77725975 and the FERMT3 promoter are correctly assigned a short distance between the two anchors, whereas the Hi-C contact map fails to provide a strong signal of this interaction.

We also demonstrate that FLAMINGO can effectively predict 3-way interactions involving distal anchors separated by very long genomic distances, as shown by the example of a fertility-associated QTL in GM12878 (Fig. 4g). FLAMINGO predicts this interaction to be a loop structure located in close proximity to the target H3K4me1 peak, whereas the distance matrix based on the Hi-C contact map shows no such signal of long-range interaction (p = 2.0 × 10^-5, one-sided Wilcoxon test). Thus, our results suggest that a key advantage of FLAMINGO is its ability to reliably reconstruct structural information that can be used to understand distal eQTLs. We further developed an integrative variant of FLAMINGO, iFLAMINGO, to incorporate chromatin interactions from cell-type-specific DNase-seq datasets, and this additional input improves the performance of FLAMINGO for both whole-chromosome and intra-domain structures at 5 kb resolution (Supplementary Fig. 3). When benchmarked against the other methods, iFLAMINGO retains performance comparable to that of FLAMINGO for both whole-chromosome and intra-domain reconstructions at 5 kb resolution.