# Training
Once you have preprocessed the dataset, you can train the model.
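Before launching a run, you can sanity-check that preprocessing produced the files the trainer expects. This is a minimal sketch: the filenames mirror the arguments used in the commands below, and `LAYERS` is the same variable used there.

```bash
LAYERS=6
# Check for the memmap, memmap index, and seqtree files produced by preprocessing.
missing=0
for f in preprocessed/esm${LAYERS}.npy \
         preprocessed/esm${LAYERS}.txt \
         preprocessed/esm${LAYERS}.st; do
    if [ -e "$f" ]; then
        echo "found: $f"
    else
        echo "missing: $f"
        missing=1
    fi
done
```

If any file is reported missing, rerun the preprocessing step before training.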
## Small Model

Here is an example of training with the default parameters, which produces the small Bloodhound model:
```bash
LAYERS=6
bloodhound-tools train \
    --memmap preprocessed/esm${LAYERS}.npy \
    --memmap-index preprocessed/esm${LAYERS}.txt \
    --seqtree preprocessed/esm${LAYERS}.st \
    --max-learning-rate 0.0002 \
    --max-epochs 70 \
    --train-all \
    --embedding-model ESM${LAYERS} \
    --run-name "Bloodhound-ESM${LAYERS}-small"
```
This will create a model in the `logs/Bloodhound-ESM6-small` directory. The checkpoints are saved in `logs/Bloodhound-ESM6-small/version_0/checkpoints/`. Use the smaller checkpoint with the `weights` prefix. The larger checkpoint with the `checkpoint` prefix also includes the optimizer state; you can delete this file once training is finished.
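Deleting the optimizer-state checkpoint can be sketched as below. The filenames here are illustrative (the block creates them itself so the commands run as-is); on a real run, list the directory first and adjust the pattern to the names you actually see.

```bash
CKPT_DIR=logs/Bloodhound-ESM6-small/version_0/checkpoints

# Simulate a finished run (illustration only -- training creates these files itself,
# and the exact filenames below are hypothetical).
mkdir -p "$CKPT_DIR"
touch "$CKPT_DIR/weights-final.ckpt" "$CKPT_DIR/checkpoint-final.ckpt"

ls "$CKPT_DIR"                    # inspect before deleting anything
rm "$CKPT_DIR"/checkpoint*.ckpt   # remove the larger optimizer-state checkpoint
ls "$CKPT_DIR"                    # only the weights-prefixed checkpoint remains
```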
If you want to use Weights and Biases for logging, add the `--wandb` option to the command.
## Large Model
If you want to train the large Bloodhound model, you can use the following command:
```bash
LAYERS=6
bloodhound-tools train \
    --memmap preprocessed/esm${LAYERS}.npy \
    --memmap-index preprocessed/esm${LAYERS}.txt \
    --seqtree preprocessed/esm${LAYERS}.st \
    --features 1536 \
    --max-learning-rate 0.0002 \
    --max-epochs 70 \
    --train-all \
    --embedding-model ESM${LAYERS} \
    --run-name "Bloodhound-ESM${LAYERS}-large"
```
## Advanced Training
See more options for training with the command:
```bash
bloodhound-tools train --help
```