Distributed algorithms and statistical inference for multi-site analyses: unfolding the complexity of heterogeneity in real-world data