Runtime for large datasets. #84

Open
sdsilva10 opened this issue Sep 29, 2021 · 4 comments

Comments

@sdsilva10

Hi,

I am trying to generate cohort metrics for QC with peddy. My sample size is about 187,000. I provided the gzip-compressed VCF and the fam (PLINK format) file for these samples as input. When I run the command for the QC plots, all sample IDs are listed and the terminal prints "ped_check". There is no progress beyond this stage, and the process is still running past the 24-hour mark.

I ran this on an HPC node:
Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
RAM: 180 GB

Is there a limitation on the input sample size?

@brentp
Owner

brentp commented Sep 29, 2021

187 thousand samples!?
That is too big for peddy. You might try somalier on batches of ~20 thousand at a time.
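As a rough illustration of the batching idea, here is a minimal Python sketch that splits a list of sample IDs into chunks of at most 20,000. The `batches` helper and the synthetic `S000000`-style IDs are hypothetical stand-ins for IDs read from the .fam file; nothing here calls peddy or somalier.

```python
from typing import Iterator, List, Sequence


def batches(sample_ids: Sequence[str], batch_size: int = 20_000) -> Iterator[List[str]]:
    """Yield consecutive slices of sample_ids, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(sample_ids), batch_size):
        yield list(sample_ids[start:start + batch_size])


# Hypothetical IDs standing in for the sample column of the .fam file.
sample_ids = [f"S{i:06d}" for i in range(187_000)]
chunks = list(batches(sample_ids))

# 187,000 samples in batches of 20,000: nine full batches plus one of 7,000.
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # prints: 10 20000 7000
```

Note that batching this way only compares samples within the same batch; pairs that fall in different batches are never compared.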

@sdsilva10
Author

Ok, I'll give that a try. Thank you.

@sdsilva10 sdsilva10 reopened this Oct 1, 2021
@sdsilva10
Author

Is there a way to merge the intermediate files from the sample subsets so as to generate results for the whole sample set?

@brentp
Owner

brentp commented Oct 1, 2021

Do you mean for peddy? No.

I would just use somalier for 20K at a time. To compare all pairwise combinations of your samples, you'd need 187K × 187K = 34,969,000,000 pairwise comparisons, plus multiple matrices with that many entries. You might be able to run all samples at once with somalier on a machine with 1 TB of memory.
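The arithmetic can be checked with a short Python sketch. The bytes-per-entry values are assumptions chosen for illustration only; the thread does not say how somalier or peddy actually store their matrices, so treat the memory figures as order-of-magnitude estimates.

```python
def pair_counts(n: int) -> tuple[int, int]:
    """Return (entries in a full n x n matrix, unique unordered pairs)."""
    return n * n, n * (n - 1) // 2


def matrix_gib(entries: int, bytes_per_entry: int) -> float:
    """Size in GiB of one dense matrix with the given number of entries."""
    return entries * bytes_per_entry / 2**30


n = 187_000
full, unique = pair_counts(n)
print(f"full matrix entries: {full:,}")    # 34,969,000,000
print(f"unique sample pairs: {unique:,}")  # 17,484,406,500

# Assumed entry widths (2, 4, 8 bytes); the real storage layout is not stated in the thread.
for width in (2, 4, 8):
    print(f"{width} bytes/entry -> {matrix_gib(full, width):.1f} GiB per matrix")
```

Even at an assumed 4 bytes per entry, a single full matrix is on the order of 130 GiB, so holding several of them at once is consistent with the 1 TB suggestion.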
