Hybrid Dataset Fuser & Extrapolator
Combine relationships learned from raw academic baseline data with synthetic scaling to expand datasets up to 100x while preserving Pearson correlations. Runs 100% offline, powered by copula algorithms.
Interactive Pearson Correlation Heatmap
Hover over a matrix cell to inspect the Pearson $r$ value for that pair of variables. Blue cells indicate positive correlation ($r > 0$), while red cells indicate negative (inverse) correlation ($r < 0$).
Scientific Guide: Hybrid Copula Extrapolation & Trend Preservation
The Concept of Hybrid Intelligence
Purely synthetic datasets avoid exposing real individuals. However, they often lack the authentic nuance of real-world patterns needed to train high-fidelity machine learning (ML) or deep learning models. Conversely, relying solely on real, unaugmented academic datasets leads to data scarcity and bottlenecks. The gold standard is **Hybrid Dataset Expansion** (Tier 3).
We fuse a small, carefully vetted, compliant academic baseline with a synthetic generator. First, we extract the genuine multivariate relationships (the correlation structure) from the real data. Then, using client-side copula algorithms, we scale the dataset by 10x to 100x. The resulting records are designed to mirror the original patterns without reproducing sensitive individual profiles.
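The page does not say which copula family it uses. The sketch below assumes a **Gaussian copula** with empirical marginals, one common way to implement the extract-then-scale step. The function name `gaussian_copula_expand` and its parameters are hypothetical.

The sketch works in four steps:

1. Rank-transform each baseline column.
2. Map the ranks to normal scores and estimate their correlation matrix.
3. Draw correlated normals.
4. Map the draws back through each column's empirical quantiles.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1) from the standard library
_to_normal = np.vectorize(_STD_NORMAL.inv_cdf)  # Phi^{-1}: (0, 1) -> R
_to_uniform = np.vectorize(_STD_NORMAL.cdf)     # Phi: R -> [0, 1]


def gaussian_copula_expand(baseline: np.ndarray, factor: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: return factor * n synthetic rows from an (n, d) baseline.

    Assumes a Gaussian copula with empirical marginals; d must be at least 2.
    """
    rng = np.random.default_rng(seed)
    n, d = baseline.shape

    # 1. Rank-transform each column to pseudo-observations strictly inside (0, 1).
    ranks = baseline.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1)

    # 2. Map to normal scores and estimate their correlation (the copula parameter).
    z = _to_normal(u)
    corr = np.corrcoef(z, rowvar=False)

    # 3. Draw correlated standard normals and map them back to [0, 1].
    m = factor * n
    latent = rng.multivariate_normal(np.zeros(d), corr, size=m)
    u_new = _to_uniform(latent)

    # 4. Invert each column's empirical marginal by quantile interpolation.
    synthetic = np.empty((m, d))
    for j in range(d):
        synthetic[:, j] = np.quantile(baseline[:, j], u_new[:, j])
    return synthetic
```

Two properties follow from this design:

- Step 4 interpolates the baseline's own quantiles, so synthetic values stay within each column's observed range.
- A Gaussian copula reproduces rank dependence, so Pearson $r$ is preserved closely but not exactly, especially for strongly skewed columns.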
Pearson Correlations & Mathematical Fit
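To confirm that the hybrid expansion does not drift away from the genuine patterns, we evaluate the pairwise linear relationships between variables using the **Pearson Correlation Coefficient ($r$)**:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

Here $(x_i, y_i)$ are the $n$ paired observations of two variables, and $\bar{x}$, $\bar{y}$ are their sample means. $r$ ranges from $-1$ (perfect inverse relationship) to $+1$ (perfect positive relationship).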
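As a minimal sketch, the Pearson coefficient can be computed directly from its definition: the sum of centered cross-products divided by the product of the centered norms. It is then applied to every column pair to build the matrix that a correlation heatmap displays. The helper names `pearson_r` and `correlation_matrix` are hypothetical. NumPy's `np.corrcoef` computes the same matrix.

```python
import numpy as np


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient of two equal-length 1-D samples."""
    dx = x - x.mean()  # centered x values
    dy = y - y.mean()  # centered y values
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def correlation_matrix(data: np.ndarray) -> np.ndarray:
    """Pairwise Pearson r for every column pair of an (n, d) array."""
    d = data.shape[1]
    r = np.eye(d)  # each variable correlates perfectly with itself
    for i in range(d):
        for j in range(i + 1, d):
            r[i, j] = r[j, i] = pearson_r(data[:, i], data[:, j])
    return r
```

For example, `pearson_r` on `[1, 2, 3, 4, 5]` and `[2, 4, 6, 8, 10]` gives $1.0$, and reversing the second sequence gives $-1.0$. In the heatmap's convention, positive entries would render blue and negative entries red.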
Our client-side copula simulator uses these extracted coefficients to preserve the structural relationships of the baseline. We validate the fit of the synthetic data against the baseline using two metrics:

- the **Wasserstein Distance** (also known as the Earth Mover's Distance);
- the **Kullback-Leibler (KL) Divergence**.

The Wasserstein distance is measured in the units of the variable, so it can only be compared against a fixed threshold once the variables are normalized. We treat fit scores below 0.05 as the acceptance threshold at which the synthetic outputs are considered model-ready for downstream AI/ML tasks.
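A minimal per-column validation sketch follows. It checks each marginal distribution separately; it does not compare the full joint distribution. Pairwise correlations can be compared separately.

- **Wasserstein-1 distance.** Computed as the area between the two empirical CDFs. This is the same definition that `scipy.stats.wasserstein_distance` uses.
- **KL divergence.** Estimated from histograms on shared bin edges.

Several choices are assumptions not stated on the page:

- min-max scaling by the baseline's range;
- 20 histogram bins;
- the `eps` smoothing constant;
- applying the 0.05 threshold to both scores.

The helpers `wasserstein_1d`, `kl_divergence`, and `fit_report` are hypothetical.

```python
import numpy as np


def wasserstein_1d(a: np.ndarray, b: np.ndarray) -> float:
    """Earth Mover's (Wasserstein-1) distance between two 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    a, b = np.sort(a), np.sort(b)
    support = np.sort(np.concatenate([a, b]))
    widths = np.diff(support)  # lengths of the intervals between support points
    cdf_a = np.searchsorted(a, support[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, support[:-1], side="right") / b.size
    return float(np.sum(np.abs(cdf_a - cdf_b) * widths))


def kl_divergence(p_samples: np.ndarray, q_samples: np.ndarray,
                  bins: int = 20, eps: float = 1e-10) -> float:
    """KL(P || Q) estimated from histograms on shared bin edges (smoothed by eps)."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    if hi == lo:
        return 0.0  # both samples are the same constant
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(p_samples, bins=edges)[0] + eps  # eps avoids log(0) and division by 0
    q = np.histogram(q_samples, bins=edges)[0] + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def fit_report(baseline_col: np.ndarray, synthetic_col: np.ndarray,
               threshold: float = 0.05) -> dict:
    """Hypothetical check: min-max scale both columns by the baseline range, then score."""
    lo, hi = baseline_col.min(), baseline_col.max()
    span = (hi - lo) or 1.0  # avoid division by zero on constant columns
    base = (baseline_col - lo) / span
    synth = (synthetic_col - lo) / span
    w = wasserstein_1d(base, synth)
    kl = kl_divergence(base, synth)
    return {"wasserstein": w, "kl": kl, "passes": w < threshold and kl < threshold}
```

Histogram-based KL estimates are sensitive to the bin count and sample size. With small baselines, the estimate can exceed a tight threshold even when both samples come from the same distribution.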