We present FLAMINGO, a high-resolution 3D genome structure reconstruction algorithm that uses low-rank matrix completion (LRMC) to address the challenging task of predicting long-range interactions from Hi-C data. The model requires no tuning of free parameters and converges quickly on both simulated and experimental Hi-C data, while remaining robust across chromosomes and cell types. Moreover, the reconstructed structures show clear loop structures for TADs and predict short inter-TAD distances, consistent with observations from Capture-C, ChIA-PET, and SPRITE experiments.
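To make the idea of low-rank matrix completion concrete, the following is a minimal, generic sketch and not FLAMINGO's actual formulation. It assumes that the matrix to be completed is a Gram (inner-product) matrix of 3D coordinates, which has rank at most 3, and it fills the unobserved entries by alternating a truncated SVD with re-imposing the observed entries. The function name `complete_low_rank`, the toy data, and all parameter values are hypothetical.

```python
import numpy as np


def complete_low_rank(observed, mask, rank, n_iter=500):
    """Fill unobserved entries by iterating a rank-`rank` SVD projection.

    observed : 2D array; values are only trusted where mask is True.
    mask     : boolean array of the same shape, True for observed entries.
    """
    # Start with zeros in the unobserved positions.
    filled = np.where(mask, observed, 0.0)
    for _ in range(n_iter):
        # Project the current guess onto the set of rank-`rank` matrices.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed entries fixed, update only the missing ones.
        filled = np.where(mask, observed, low_rank)
    return low_rank


# Toy data (assumption): 30 points in 3D, so the Gram matrix has rank 3.
rng = np.random.default_rng(0)
coords = rng.normal(size=(30, 3))
gram = coords @ coords.T

# Hide a random subset of entries, keeping the mask symmetric.
mask = rng.random(gram.shape) < 0.7
mask = mask | mask.T
np.fill_diagonal(mask, True)

estimate = complete_low_rank(gram, mask, rank=3)
rel_err = np.linalg.norm(estimate - gram) / np.linalg.norm(gram)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The toy problem works because the number of observed entries far exceeds the degrees of freedom of a rank-3 matrix. Real Hi-C data are noisier and much sparser at long range, which is where weighting and regularization choices become important.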
The model is computationally efficient because it uses a recursive procedure with sparse weighting, which requires only O(N
To assess the accuracy of our method, we compared it with state-of-the-art algorithms, including GEM-FISH, ShRec3D, Hierarchical3DGenome, RPR, and SuperRec, on the same benchmark dataset. FLAMINGO was the most accurate approach for reconstructing both intra-domain structures at 5 kb resolution and whole-chromosome structures at 10 kb resolution. This performance advantage does not depend on the target DNA fragment size, the domain partitioning, or the down-sampling rate (Supplementary Fig. 1).
Moreover, FLAMINGO can help interpret distal eQTLs by placing them in spatial proximity to their target chromatin regions, a relationship that other methods typically do not capture (Supplementary Fig. 4). For example, in the structure reconstructed for GM12878, the SNP rs77725975 and the FERMT3 promoter are separated by a short distance, whereas the Hi-C contact map provides no strong signal for this interaction.
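As an illustration of how such a proximity check could be carried out on any reconstructed structure, the sketch below maps two genomic positions to bins and measures the Euclidean distance between their 3D coordinates. It assumes the structure is stored as an array with one row of (x, y, z) per fixed-size bin; the function `anchor_distance`, the circular toy structure, and the positions used are hypothetical and do not correspond to the loci discussed above.

```python
import numpy as np


def anchor_distance(coords, resolution_bp, pos_a_bp, pos_b_bp):
    """Euclidean distance between the bins containing two genomic positions.

    coords        : array of shape (n_bins, 3), one 3D point per bin.
    resolution_bp : bin size in base pairs (e.g. 5000 for 5 kb).
    """
    bin_a = pos_a_bp // resolution_bp
    bin_b = pos_b_bp // resolution_bp
    return float(np.linalg.norm(coords[bin_a] - coords[bin_b]))


# Toy structure (assumption): 20 bins arranged on a unit circle, so the first
# and last bins are far apart along the sequence but close in space.
n_bins = 20
angles = 2 * np.pi * np.arange(n_bins) / n_bins
toy_coords = np.column_stack([np.cos(angles), np.sin(angles), np.zeros(n_bins)])

resolution = 5000  # 5 kb bins
loop_anchors = anchor_distance(toy_coords, resolution, 0, 19 * resolution)
opposite_bins = anchor_distance(toy_coords, resolution, 0, 10 * resolution)
print(f"loop anchors: {loop_anchors:.3f}, opposite side: {opposite_bins:.3f}")
```

In this toy loop the two anchors are 95 kb apart along the sequence yet closer in space than bins only 50 kb apart, which is the kind of relationship a contact map with weak long-range signal can miss.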
We also demonstrate that FLAMINGO can predict 3-way interactions involving distal anchors separated by very long genomic distances, as shown by the example of a fertility-associated QTL in GM12878 (Fig. 4g). FLAMINGO predicts a loop structure for this interaction and places the QTL in close proximity to the target H3K4me1 peak, whereas the distance matrix based on the Hi-C contact map shows no such signal of long-range interaction (p-value