Set Up Pathogen Reference Database

The Set Up Pathogen Reference Database tool adds metadata to individual sequences in a sequence list so that microbial isolate samples can be typed in the context of this metadata. In general, the tool is used to associate a user's own metadata with reference genomes in a pathogen reference genome database (such as the one downloaded by the Download Pathogen Reference Genomes tool).

The tool imports pathogen information into an existing database and either bundles the new metadata or, if the overwrite option is selected, overwrites the existing metadata reference by reference. This means that references that already have metadata and for which no new metadata is imported keep their metadata; references with metadata for which new metadata is imported are updated; and references with no metadata acquire the new metadata.
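The three per-reference rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior with the overwrite option selected, not the tool's actual implementation. The function name `merge_reference_metadata`, the reference names, and the metadata values are hypothetical, and the sketch assumes that an imported row replaces the old metadata as a whole.

```python
from typing import Dict, Optional

Metadata = Dict[str, str]


def merge_reference_metadata(
    existing: Dict[str, Optional[Metadata]],
    imported: Dict[str, Metadata],
) -> Dict[str, Optional[Metadata]]:
    """Apply the per-reference rules, assuming the overwrite option is selected.

    `existing` maps each reference in the database to its current metadata
    (None if it has none). `imported` maps reference names to new metadata.
    """
    merged: Dict[str, Optional[Metadata]] = {}
    # Only references already present in the database appear in the result.
    for name, old in existing.items():
        new = imported.get(name)
        if new is not None:
            # New metadata was imported: a reference with metadata is updated,
            # a reference without metadata acquires it.
            # Assumption: the imported row replaces the old one wholesale.
            merged[name] = dict(new)
        else:
            # No new metadata was imported: keep what was there (possibly None).
            merged[name] = old
    return merged


# Hypothetical example data.
existing = {
    "ref_A": {"Taxonomy": "old lineage A"},  # has metadata, nothing imported
    "ref_B": {"Taxonomy": "old lineage B"},  # has metadata, new row imported
    "ref_C": None,                           # no metadata, new row imported
}
imported = {
    "ref_B": {"Taxonomy": "new lineage B"},
    "ref_C": {"Taxonomy": "new lineage C"},
}
result = merge_reference_metadata(existing, imported)
```

In this example, `ref_A` keeps its metadata, `ref_B` is updated, and `ref_C` acquires metadata.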

To run the tool, go to:

        Toolbox | Microbial Genomics Module | Typing and Epidemiology (beta) | Set Up Pathogen Reference Database

In the first window (Sequence List), you can choose a new sequence list or an existing database (figure 10.3).

Figure 10.3: Select the existing database you want to edit.

In the second window called "Select input file and map columns to attributes" (figure 10.4), select an Excel or a csv file saved on your computer.

Figure 10.4: Select the file containing the metadata you wish to bundle with your existing database.

The input file has two required columns: Name and Taxonomy. The names are used to link each sequence to its metadata. Taxonomy must follow the QIIME or common 7-step format.

You can add as many columns of metadata as you wish. The names given in the first row will be used as metadata categories. Once imported, the information from the spreadsheet (or csv file) will fill in the table included in the wizard window (see figure 10.4), and the headers will take the names of the first row. It is still possible to edit the first row data at this point, thereby changing the names of the metadata categories. Leaving a first row field blank means that the metadata in that column will not be imported.
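Before selecting a csv file in the wizard, it can be useful to check that it has the expected layout. The following is a minimal sketch using Python's standard `csv` module. It is not part of the tool: the function name `read_metadata_table` and the sample data are hypothetical, the column names are assumed to be matched exactly as written, and the taxonomy values are placeholders whose format is not validated.

```python
import csv
import io

REQUIRED_COLUMNS = ("Name", "Taxonomy")


def read_metadata_table(handle):
    """Read a metadata csv and return {sequence name: {category: value}}."""
    reader = csv.reader(handle)
    # The first row supplies the metadata category names.
    header = [cell.strip() for cell in next(reader)]

    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("Missing required column(s): " + ", ".join(missing))

    # Columns whose first-row field is blank are skipped.
    kept = [(i, name) for i, name in enumerate(header) if name]

    rows = {}
    for record in reader:
        if not record:
            continue  # ignore empty lines
        row = {
            name: record[i].strip() if i < len(record) else ""
            for i, name in kept
        }
        # The Name column is the key that links metadata to a sequence.
        rows[row["Name"]] = {k: v for k, v in row.items() if k != "Name"}
    return rows


# Hypothetical file content: the fourth column has a blank header.
sample = (
    "Name,Taxonomy,Country,\n"
    "ref_A,lineage A,Denmark,ignored\n"
    "ref_B,lineage B,,ignored\n"
)
table = read_metadata_table(io.StringIO(sample))
```

Here the fourth column is dropped because its header field is blank, while the optional Country column is kept as an additional metadata category.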

The tool outputs a single sequence list containing both the sequence data and the metadata for each sequence initially present in the input sequence list. If the "Overwrite old metadata" option is selected, references that already have metadata and for which new metadata is imported will be updated.

In the last wizard window, the option to generate a report is selected by default. The report contains the following summary information: