fastq naming pattern hard coded? #16

Open
flashton2003 opened this issue May 2, 2023 · 4 comments

Comments

@flashton2003

Hello,

Thanks for developing this tool for the community. I'm running BiG-MAP.map on some data and I had a problem getting the fastq sample names from the sample sheet to match up with the fastq paths given on the command line.

I eventually solved this problem by looking at the source code and seeing that you were getting the sample name from the fastq file name by splitting on underscores and taking the zeroth item. Unfortunately, my fastqs have underscores in the name, like this: 32580_8#24_R2.fastq.gz. I could get BiG-MAP.map to run by renaming my fastqs to be like 32580-8#24_R2.fastq.gz, but perhaps some other users might not dig in this far?
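The behaviour described above can be reproduced with a minimal sketch. This is not BiG-MAP's actual source; the function name `sample_name_from_fastq` is hypothetical, and the only assumption is the one stated in this comment: the sample name is whatever precedes the first underscore in the file name.

```python
def sample_name_from_fastq(filename: str) -> str:
    """Keep only the text before the first underscore (assumed behaviour)."""
    return filename.split("_")[0]


# The original file name is cut at its first underscore, so the derived
# name no longer matches the sample sheet entry; the renamed file survives.
for fastq in ("32580_8#24_R2.fastq.gz", "32580-8#24_R2.fastq.gz"):
    print(fastq, "->", sample_name_from_fastq(fastq))
# 32580_8#24_R2.fastq.gz -> 32580
# 32580-8#24_R2.fastq.gz -> 32580-8#24
```

Under that assumption, any underscore inside the sample identifier itself truncates the derived name, which is why swapping it for a hyphen works.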

I know it's a nightmare dealing with all the different ways that people write the sample name in their fastq file names. Perhaps you could include the path to the fastqs as an entry in the sample sheet?

Best,

Phil

@Lachlan1991

Hi Phil,
I'm running into what seems to be a very similar issue, but I wasn't able to solve the error by manually renaming my fastq files.
The error states that the metadata files do not match those of the fastq files.
I've tried shortening the (previously very long) file names, keeping the _R1.fastq suffix, etc.
Am I missing something?

@flashton2003
Author

flashton2003 commented Oct 25, 2023

Hi Lachlan,

It's a bit fiddly. This is what I've got working. Command line:

python3 ~/programs/BiG-MAP/src/BiG-MAP.map.py -b /data/fast/core/strataa_microbiome/big-map/2023.05.01/2023.05.01.big_map_input.tsv -I1 32972-3#1_R1.fastq.gz 32972-3#2_R1.fastq.gz 32972-3#3_R1.fastq.gz 32972-3#4_R1.fastq.gz 32972-3#5_R1.fastq.gz 32972-3#6_R1.fastq.gz 32972-3#7_R1.fastq.gz 32972-3#8_R1.fastq.gz 32972-3#9_R1.fastq.gz 32972-4#16_R1.fastq.gz 32972-4#17_R1.fastq.gz 32972-4#18_R1.fastq.gz 32972-4#19_R1.fastq.gz 32972-4#20_R1.fastq.gz 32972-4#21_R1.fastq.gz 32972-4#22_R1.fastq.gz 32972-4#23_R1.fastq.gz 32972-5#1_R1.fastq.gz 32972-5#10_R1.fastq.gz 32972-5#11_R1.fastq.gz 32972-5#12_R1.fastq.gz 32972-5#13_R1.fastq.gz 32972-5#14_R1.fastq.gz 32972-5#15_R1.fastq.gz 32972-5#16_R1.fastq.gz 32972-5#17_R1.fastq.gz 32972-5#18_R1.fastq.gz 32972-5#19_R1.fastq.gz 32972-5#2_R1.fastq.gz 32972-5#20_R1.fastq.gz 32972-5#21_R1.fastq.gz 32972-5#22_R1.fastq.gz 32972-5#23_R1.fastq.gz 32972-5#3_R1.fastq.gz 32972-5#4_R1.fastq.gz 32972-5#5_R1.fastq.gz 32972-5#6_R1.fastq.gz 32972-5#7_R1.fastq.gz 32972-5#8_R1.fastq.gz 32972-5#9_R1.fastq.gz 37273-1#1_R1.fastq.gz 37273-1#10_R1.fastq.gz 37273-1#11_R1.fastq.gz 37273-1#12_R1.fastq.gz 37273-1#13_R1.fastq.gz 37273-1#14_R1.fastq.gz 37273-1#15_R1.fastq.gz 37273-1#16_R1.fastq.gz 37273-1#17_R1.fastq.gz 37273-1#18_R1.fastq.gz 37273-1#19_R1.fastq.gz 37273-1#2_R1.fastq.gz 37273-1#20_R1.fastq.gz 37273-1#3_R1.fastq.gz 37273-1#4_R1.fastq.gz 37273-1#5_R1.fastq.gz 37273-1#6_R1.fastq.gz 37273-1#7_R1.fastq.gz 37273-1#8_R1.fastq.gz 37273-1#9_R1.fastq.gz 37273-2#1_R1.fastq.gz 37273-2#10_R1.fastq.gz 37273-2#11_R1.fastq.gz 37273-2#12_R1.fastq.gz 37273-2#13_R1.fastq.gz 37273-2#14_R1.fastq.gz 37273-2#15_R1.fastq.gz 37273-2#16_R1.fastq.gz 37273-2#17_R1.fastq.gz 37273-2#18_R1.fastq.gz 37273-2#19_R1.fastq.gz 37273-2#2_R1.fastq.gz 37273-2#20_R1.fastq.gz 37273-2#3_R1.fastq.gz 37273-2#4_R1.fastq.gz 37273-2#5_R1.fastq.gz 37273-2#6_R1.fastq.gz 37273-2#7_R1.fastq.gz 37273-2#8_R1.fastq.gz 37273-2#9_R1.fastq.gz 32580-6#20_R1.fastq.gz 32580-6#21_R1.fastq.gz 
32580-6#22_R1.fastq.gz 32580-6#23_R1.fastq.gz 32580-6#24_R1.fastq.gz 32580-7#14_R1.fastq.gz 32580-7#15_R1.fastq.gz 32580-7#2_R1.fastq.gz 32580-7#21_R1.fastq.gz 32580-7#4_R1.fastq.gz 32580-7#5_R1.fastq.gz 32580-7#6_R1.fastq.gz 32580-8#1_R1.fastq.gz 32580-8#13_R1.fastq.gz 32580-8#23_R1.fastq.gz 32580-8#8_R1.fastq.gz 32712-2#11_R1.fastq.gz 32712-2#24_R1.fastq.gz 32712-2#5_R1.fastq.gz 32712-3#13_R1.fastq.gz 32712-3#17_R1.fastq.gz 32712-3#22_R1.fastq.gz 32712-3#6_R1.fastq.gz 32712-3#9_R1.fastq.gz 32712-5#11_R1.fastq.gz 32712-5#15_R1.fastq.gz 32712-5#16_R1.fastq.gz 32712-5#2_R1.fastq.gz 32712-5#3_R1.fastq.gz 32712-5#7_R1.fastq.gz 32580-8#24_R1.fastq.gz 32712-2#12_R1.fastq.gz 32712-2#13_R1.fastq.gz 32712-2#14_R1.fastq.gz 32712-2#15_R1.fastq.gz 32712-2#17_R1.fastq.gz 32712-2#21_R1.fastq.gz 32712-2#22_R1.fastq.gz 32712-2#23_R1.fastq.gz 32712-2#6_R1.fastq.gz 32712-3#11_R1.fastq.gz 32712-3#12_R1.fastq.gz 32712-3#14_R1.fastq.gz 32712-3#15_R1.fastq.gz 32712-3#16_R1.fastq.gz 32712-3#18_R1.fastq.gz 32712-3#19_R1.fastq.gz 32712-3#20_R1.fastq.gz 32712-3#24_R1.fastq.gz 32712-3#4_R1.fastq.gz 32712-3#5_R1.fastq.gz 32712-3#7_R1.fastq.gz 32712-3#8_R1.fastq.gz 32712-5#1_R1.fastq.gz 32712-5#12_R1.fastq.gz 32712-5#4_R1.fastq.gz 32712-5#6_R1.fastq.gz 32712-5#8_R1.fastq.gz 32712-5#9_R1.fastq.gz -I2 32972-3#1_R2.fastq.gz 32972-3#2_R2.fastq.gz 32972-3#3_R2.fastq.gz 32972-3#4_R2.fastq.gz 32972-3#5_R2.fastq.gz 32972-3#6_R2.fastq.gz 32972-3#7_R2.fastq.gz 32972-3#8_R2.fastq.gz 32972-3#9_R2.fastq.gz 32972-4#16_R2.fastq.gz 32972-4#17_R2.fastq.gz 32972-4#18_R2.fastq.gz 32972-4#19_R2.fastq.gz 32972-4#20_R2.fastq.gz 32972-4#21_R2.fastq.gz 32972-4#22_R2.fastq.gz 32972-4#23_R2.fastq.gz 32972-5#1_R2.fastq.gz 32972-5#10_R2.fastq.gz 32972-5#11_R2.fastq.gz 32972-5#12_R2.fastq.gz 32972-5#13_R2.fastq.gz 32972-5#14_R2.fastq.gz 32972-5#15_R2.fastq.gz 32972-5#16_R2.fastq.gz 32972-5#17_R2.fastq.gz 32972-5#18_R2.fastq.gz 32972-5#19_R2.fastq.gz 32972-5#2_R2.fastq.gz 32972-5#20_R2.fastq.gz 32972-5#21_R2.fastq.gz 
32972-5#22_R2.fastq.gz 32972-5#23_R2.fastq.gz 32972-5#3_R2.fastq.gz 32972-5#4_R2.fastq.gz 32972-5#5_R2.fastq.gz 32972-5#6_R2.fastq.gz 32972-5#7_R2.fastq.gz 32972-5#8_R2.fastq.gz 32972-5#9_R2.fastq.gz 37273-1#1_R2.fastq.gz 37273-1#10_R2.fastq.gz 37273-1#11_R2.fastq.gz 37273-1#12_R2.fastq.gz 37273-1#13_R2.fastq.gz 37273-1#14_R2.fastq.gz 37273-1#15_R2.fastq.gz 37273-1#16_R2.fastq.gz 37273-1#17_R2.fastq.gz 37273-1#18_R2.fastq.gz 37273-1#19_R2.fastq.gz 37273-1#2_R2.fastq.gz 37273-1#20_R2.fastq.gz 37273-1#3_R2.fastq.gz 37273-1#4_R2.fastq.gz 37273-1#5_R2.fastq.gz 37273-1#6_R2.fastq.gz 37273-1#7_R2.fastq.gz 37273-1#8_R2.fastq.gz 37273-1#9_R2.fastq.gz 37273-2#1_R2.fastq.gz 37273-2#10_R2.fastq.gz 37273-2#11_R2.fastq.gz 37273-2#12_R2.fastq.gz 37273-2#13_R2.fastq.gz 37273-2#14_R2.fastq.gz 37273-2#15_R2.fastq.gz 37273-2#16_R2.fastq.gz 37273-2#17_R2.fastq.gz 37273-2#18_R2.fastq.gz 37273-2#19_R2.fastq.gz 37273-2#2_R2.fastq.gz 37273-2#20_R2.fastq.gz 37273-2#3_R2.fastq.gz 37273-2#4_R2.fastq.gz 37273-2#5_R2.fastq.gz 37273-2#6_R2.fastq.gz 37273-2#7_R2.fastq.gz 37273-2#8_R2.fastq.gz 37273-2#9_R2.fastq.gz 32580-6#20_R2.fastq.gz 32580-6#21_R2.fastq.gz 32580-6#22_R2.fastq.gz 32580-6#23_R2.fastq.gz 32580-6#24_R2.fastq.gz 32580-7#14_R2.fastq.gz 32580-7#15_R2.fastq.gz 32580-7#2_R2.fastq.gz 32580-7#21_R2.fastq.gz 32580-7#4_R2.fastq.gz 32580-7#5_R2.fastq.gz 32580-7#6_R2.fastq.gz 32580-8#1_R2.fastq.gz 32580-8#13_R2.fastq.gz 32580-8#23_R2.fastq.gz 32580-8#8_R2.fastq.gz 32712-2#11_R2.fastq.gz 32712-2#24_R2.fastq.gz 32712-2#5_R2.fastq.gz 32712-3#13_R2.fastq.gz 32712-3#17_R2.fastq.gz 32712-3#22_R2.fastq.gz 32712-3#6_R2.fastq.gz 32712-3#9_R2.fastq.gz 32712-5#11_R2.fastq.gz 32712-5#15_R2.fastq.gz 32712-5#16_R2.fastq.gz 32712-5#2_R2.fastq.gz 32712-5#3_R2.fastq.gz 32712-5#7_R2.fastq.gz 32580-8#24_R2.fastq.gz 32712-2#12_R2.fastq.gz 32712-2#13_R2.fastq.gz 32712-2#14_R2.fastq.gz 32712-2#15_R2.fastq.gz 32712-2#17_R2.fastq.gz 32712-2#21_R2.fastq.gz 32712-2#22_R2.fastq.gz 32712-2#23_R2.fastq.gz 
32712-2#6_R2.fastq.gz 32712-3#11_R2.fastq.gz 32712-3#12_R2.fastq.gz 32712-3#14_R2.fastq.gz 32712-3#15_R2.fastq.gz 32712-3#16_R2.fastq.gz 32712-3#18_R2.fastq.gz 32712-3#19_R2.fastq.gz 32712-3#20_R2.fastq.gz 32712-3#24_R2.fastq.gz 32712-3#4_R2.fastq.gz 32712-3#5_R2.fastq.gz 32712-3#7_R2.fastq.gz 32712-3#8_R2.fastq.gz 32712-5#1_R2.fastq.gz 32712-5#12_R2.fastq.gz 32712-5#4_R2.fastq.gz 32712-5#6_R2.fastq.gz 32712-5#8_R2.fastq.gz 32712-5#9_R2.fastq.gz -O /data/fast/core/strataa_microbiome/big-map/database/2023.05.01 -P /data/fast/core/strataa_microbiome/big-map/database/2023.05.01/BiG-MAP_mg.pickle

metadata file:

32580-6#18	4227STDY7528926	METAGENOMICS	Acute_Typhi
32580-7#1	4227STDY7529032	METAGENOMICS	Acute_Typhi
32580-7#10	4227STDY7528895	METAGENOMICS	Acute_Typhi
32580-7#3	4227STDY7529053	METAGENOMICS	Acute_Typhi
32580-8#18	4227STDY7528960	METAGENOMICS	Control_HealthySerosurvey
32712-2#16	4227STDY7528990	METAGENOMICS	Acute_Typhi
32712-2#18	4227STDY7528994	METAGENOMICS	Control_HealthySerosurvey
32712-2#19	4227STDY7528996	METAGENOMICS	Control_HealthySerosurvey
32712-2#20	4227STDY7529001	METAGENOMICS	Control_HealthySerosurvey
32712-3#10	4227STDY7529025	METAGENOMICS	Control_HealthySerosurvey
32712-3#23	4227STDY7529049	METAGENOMICS	Control_HealthySerosurvey
32712-3#3	4227STDY7529009	METAGENOMICS	Control_HealthySerosurvey
32712-5#10	4227STDY7529067	METAGENOMICS	Control_HealthySerosurvey
32712-5#13	4227STDY7529074	METAGENOMICS	Control_HealthySerosurvey
32712-5#14	4227STDY7529075	METAGENOMICS	Control_HealthySerosurvey
32712-5#5	4227STDY7529058	METAGENOMICS	Control_HealthySerosurvey
32580-6#24	4227STDY7529029	METAGENOMICS	Acute_Typhi
32712-5#4	4227STDY7529057	METAGENOMICS	Control_HealthySerosurvey

Then run it from within the directory that contains your fastqs.

@Lachlan1991

Thanks Dr Flashington the 2003rd,
Your info was super helpful. I needed to remove the .fastq suffix from the metadata file, which is a pretty rookie error, but then again I am a rookie!
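A quick pre-flight check along these lines might catch this kind of mismatch before launching a run. It is only a sketch under stated assumptions: the metadata file is tab-separated with the sample name in the first column and no header row, and names are derived from fastq file names by keeping the text before the first underscore, as described earlier in the thread. The helper names `derived_name` and `check_names` are hypothetical, and the second metadata row has been deliberately given a stray .fastq suffix to illustrate the mistake.

```python
import csv


def derived_name(fastq: str) -> str:
    """Assumed rule: the sample name is the text before the first underscore."""
    return fastq.split("_")[0]


def check_names(metadata_lines, fastqs):
    """Return (names only in the metadata, names only derived from fastqs)."""
    meta = {row[0] for row in csv.reader(metadata_lines, delimiter="\t") if row}
    derived = {derived_name(f) for f in fastqs}
    return sorted(meta - derived), sorted(derived - meta)


metadata = [
    "32580-6#24\t4227STDY7529029\tMETAGENOMICS\tAcute_Typhi",
    # Hypothetical faulty row: a stray ".fastq" suffix on the sample name.
    "32712-5#4.fastq\t4227STDY7529057\tMETAGENOMICS\tControl_HealthySerosurvey",
]
fastqs = ["32580-6#24_R1.fastq.gz", "32712-5#4_R1.fastq.gz"]

only_meta, only_fastq = check_names(metadata, fastqs)
print("In metadata only:", only_meta)   # ['32712-5#4.fastq']
print("In fastqs only:  ", only_fastq)  # ['32712-5#4']
```

Both lists being empty would suggest that the names line up. To check a real file, pass an open file handle as `metadata_lines`.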

@flashton2003
Author

Great, glad it helped!
