README
created 12.Jan.2023 
by Tanja Krueger

# Overview

The fasta_cleaning folder contains nine Python scripts for cleaning up FASTA files.

The files are:

- 02_analysis_animal_toxins/Code/python/fasta_cleaning/exotoxinDB.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/extractionFromTaxDF.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/filterFastaSeq.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/findFiles.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/multiFastaClean.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/myprint.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/navigatingDir.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/pandasPlayground.py
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/README
- 02_analysis_animal_toxins/Code/python/fasta_cleaning/verifyingFasta.py

The main file is exotoxinDB.py. All other files are subprograms that are
called from within exotoxinDB.py.


# Purpose of each file
## exotoxinDB.py
This is the main script; all other scripts are subprograms called from within it.
It takes raw data in the form of tagged FASTA files and performs all analyses of
the sequences, such as transforming the FASTA files into a unified format.

## extractionFromTaxDF.py
This file uses an input of species and

## filterFastaSeq.py
This script opens a FASTA file and keeps only the sequences whose identifiers appear in a given identifier list.
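
A minimal sketch of this kind of filter in plain Python, not the code of filterFastaSeq.py itself. The names parse_fasta and filter_fasta are hypothetical, and the sketch assumes the identifier is the first whitespace-separated token of each header line.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {identifier: sequence}.

    Assumption: the identifier is the first whitespace-separated token
    of the header, without the leading '>'.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:
            # Multi-line sequences are concatenated; text before the
            # first header is ignored.
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def filter_fasta(records: Dict[str, str], keep_ids: Iterable[str]) -> Dict[str, str]:
    """Keep only the records whose identifier appears in keep_ids."""
    wanted = set(keep_ids)  # set membership makes each lookup O(1)
    return {rid: seq for rid, seq in records.items() if rid in wanted}
```

For a file on disk, pass an open handle, e.g. `with open(path) as fh: records = parse_fasta(fh)`, and then call `filter_fasta(records, id_list)`.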

## findFiles.py
Finds files within a folder whose names contain a given partial name.
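
A minimal sketch of such a search using pathlib. The function find_files and its recursive flag are hypothetical and not taken from findFiles.py; matching is a plain, case-sensitive substring test on the file name.

```python
from pathlib import Path
from typing import List


def find_files(folder: str, partial_name: str, recursive: bool = False) -> List[Path]:
    """Return the files in `folder` whose name contains `partial_name`.

    With recursive=True, subfolders are searched as well.
    """
    base = Path(folder)
    candidates = base.rglob("*") if recursive else base.iterdir()
    # Keep regular files only (no directories) and sort for a stable order.
    return sorted(p for p in candidates if p.is_file() and partial_name in p.name)
```

For example, `find_files("raw_data", ".fasta")` would list the FASTA files directly inside a hypothetical raw_data folder.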

## verifyingFasta.py
Checks whether the input to muliFastaRead is valid.
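
The exact checks in verifyingFasta.py are not described here, so the following is only a sketch of typical FASTA validity checks: a header must come first, every header needs at least one sequence line, and sequence lines may contain only protein letters. The function name fasta_errors and the accepted alphabet are assumptions.

```python
import re
from typing import Iterable, List

# Assumed alphabet: the 20 standard amino acids, the ambiguity/rare codes
# B, Z, X, U, O, J, the stop symbol '*' and the gap symbol '-'.
_PROTEIN_LINE = re.compile(r"^[ACDEFGHIKLMNPQRSTVWYBZXUOJ*\-]+$", re.IGNORECASE)


def fasta_errors(lines: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors: List[str] = []
    seen_header = False
    has_sequence = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            if seen_header and not has_sequence:
                errors.append(f"line {number}: previous record has no sequence")
            if len(line) == 1:
                errors.append(f"line {number}: empty header")
            seen_header = True
            has_sequence = False
        elif not seen_header:
            errors.append(f"line {number}: sequence data before the first header")
        else:
            has_sequence = True
            if not _PROTEIN_LINE.match(line):
                errors.append(f"line {number}: unexpected characters in sequence")
    if seen_header and not has_sequence:
        errors.append("end of input: last record has no sequence")
    if not seen_header:
        errors.append("no FASTA header found")
    return errors
```

Returning a list of messages instead of a single boolean makes it easier to report every problem in a file at once.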

## multiFastaClean.py
This script translates FASTA files from different protein sources, such as
UniProt and NCBI, into a unified form. It removes database-specific header
prefixes such as >sp (which indicate the database of origin but obscure the
unique accession number) and unifies the species names.
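
A sketch of the header unification, assuming UniProtKB headers of the form >sp|ACCESSION|ENTRY_NAME description OS=Species name OX=... and NCBI protein headers that end in [Species name]. The output layout >ACCESSION Genus_species and the function names are hypothetical; the real script may use a different unified form.

```python
import re
from typing import Optional, Tuple

# UniProtKB: >sp|ACCESSION|ENTRY_NAME description OS=Species name OX=...
_UNIPROT = re.compile(r"^>(?:sp|tr)\|([^|]+)\|\S+.*?\bOS=(.+?)(?:\s+[A-Z]{2}=|$)")
# NCBI protein: >ACCESSION description [Species name]
_NCBI = re.compile(r"^>(\S+).*\[([^\]]+)\]\s*$")


def unify_species(name: str) -> str:
    """Collapse whitespace and join words with '_' (hypothetical convention)."""
    return "_".join(name.split())


def parse_header(header: str) -> Optional[Tuple[str, str]]:
    """Return (accession, unified species name), or None if the format is unknown."""
    header = header.strip()
    for pattern in (_UNIPROT, _NCBI):
        match = pattern.match(header)
        if match:
            accession, species = match.groups()
            return accession, unify_species(species)
    return None


def unified_header(header: str) -> str:
    """Rewrite a UniProt or NCBI header as '>ACCESSION Genus_species'."""
    parsed = parse_header(header)
    if parsed is None:
        raise ValueError(f"unrecognised FASTA header: {header!r}")
    accession, species = parsed
    return f">{accession} {species}"
```

The UniProt pattern is tried first because its sp|/tr| prefix is unambiguous; unknown formats raise an error instead of passing through silently.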

## myprint.py
Contains multiple printing functions that generate different output formats,
for example FASTA files and certain dictionary layouts.
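
As an illustration of one such output option, here is a minimal FASTA writer for a {identifier: sequence} dictionary. write_fasta and the default line width of 60 characters are assumptions, not the functions in myprint.py.

```python
from typing import Dict, TextIO


def write_fasta(records: Dict[str, str], handle: TextIO, width: int = 60) -> None:
    """Write {identifier: sequence} as FASTA, wrapping sequence lines at `width`."""
    for identifier, sequence in records.items():
        handle.write(f">{identifier}\n")
        # Slice the sequence into chunks of at most `width` characters.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
```

Pass sys.stdout to print to the terminal, or an open file handle to write a file.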
