Naming Files

This document describes best practices for naming files.

I highly recommend you read this Data Carpentary document on file naming.

Be descriptive

Good filenames should be precise. For example, a raw FASTQ filename received from AGRF may look something like this:

X54321-1_AB1XXXXXX_GAATTCGT-ATAGAGGC_L007_R2.fastq.gz

The filename contains a lot of information which is descriptive to the user and can be easily extracted programatically.

Just by looking at the filename and knowing what each field represents, we have the following information:

Storing metadata in this fashion allows us to avoid mixing up samples and losing expensive sequencing data due to lost information.

Be consistent

If you’re storing metadata in filenames, it’s good practice to be consistent with what each field represents. Being consisent also means filenames are sorted in a logical manner when listed alphabetically.

Good consistency:

2017-02-07_cg_meeting_notes_with_alice.md
2017-02-14_cg_meeting_notes_with_bob.md
2017-02-21_cg_meeting_notes_with_alice.md
2017-02-28_cg_meeting_notes_with_carol.md

Bad consistency:

14.02.17_bob_cg_meeting.md
21-02-alice-meeting-cg.md
carol_cg_meeting_notes_2017-02-28.md
cg_meeting_with_alice_2017-02-07.md

Allowed characters

Avoid special characters. The only characters you should be using in your filenames are the alphanumeric characters (A-Z, a-z, 0-9), underscores (_), periods (.), and dashes (-). Always start your filename with an alphanumeric character.

Avoid using spaces

Using spaces in filenames makes things difficult when working with UNIX. Therefore, use underscores _ and dashes - to separate words.

For example:

sample-data-from-bob.txt
2017-03-03_raw_data_ID_A721BW.tar.gz

You can also use periods . for additional metadata information.

sample-1.AaegL3.bam
sample-1.AaegL3.sorted.bam
sample-1.AaegL3.sorted.filtered.bam

YYYY-MM-DD

The correct way to write numeric dates is YYYY-MM-DD, (i.e. ISO 8601).

Good names:

2017-08-01_results/
2016-04-22_meeting_notes.txt
2018-11-29_raw-data-from-john-doe.tar.gz

Lowercase and uppercase

Using purely lowecase letters, or a mixture of lower and uppercase letters in filenames are both ok. It’s a matter of personal preference.

Avoid having files that differ only by case in the same directory. e.g.

notes.txt
Notes.txt