Note that, in total, running the experiments in the paper will cost around $250 in compute credits. You'll get $300 of compute credits when you make a GCP account.
First, create a bucket called `pretrain-on-test-accuracies`.
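
If you'd rather use the CLI than the console, here's a minimal sketch. The `us-west4` location just matches the default region mentioned below; pick whatever region you like.

```bash
# Create the bucket that the experiment scripts upload accuracies to.
gcloud storage buckets create gs://pretrain-on-test-accuracies --location=us-west4
```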
Also consider:
- Adding an alert to notify you if any errors pop up during a cloud run.
- Increasing your quotas for `GPUS_ALL_REGIONS` and `NVIDIA_T4_GPUS` to 1 (or `q > 1` for running experiments in parallel) and `SSD_TOTAL_GB` to `250 * q`. The default region is `us-west4`. Or make quota requests according to error messages. FYI, I couldn't get my quota past 4 GPUs.
- Adding a secret, `HF_TOKEN`, which holds your Hugging Face login token, if you need to use Mistral or other models which require this authorization before downloading weights. Then give your service account permission to access this secret. (A CLI sketch for checking quotas and creating the secret is below this list.)
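
If you'd rather handle the last two items from the terminal, here's a rough sketch. The service account email is a placeholder, and the Secret Manager API must be enabled in your project.

```bash
# Inspect current quota values: look for NVIDIA_T4_GPUS and SSD_TOTAL_GB in the
# regional output, and GPUS_ALL_REGIONS in the project-wide output.
gcloud compute regions describe us-west4 --format="yaml(quotas)"
gcloud compute project-info describe --format="yaml(quotas)"

# Create the HF_TOKEN secret from an environment variable holding your Hugging
# Face token, then let your service account read it.
# YOUR_SERVICE_ACCOUNT_EMAIL is a placeholder.
printf '%s' "$HF_TOKEN" | gcloud secrets create HF_TOKEN --data-file=-
gcloud secrets add-iam-policy-binding HF_TOKEN \
  --member="serviceAccount:YOUR_SERVICE_ACCOUNT_EMAIL" \
  --role="roles/secretmanager.secretAccessor"
```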
### Consider locally testing that cloud logging and storage works
Run a mini experiment on your computer and check that data was uploaded to GCP.
- Install the `gcp` requirements (at the repo root): `python -m pip install ".[gcp]"`
- From the repo root, run the mini CPU test (after ensuring your `gcloud` is set to whatever project hosts the bucket):

  ```bash
  PRETRAIN_ON_TEST_CLOUD_PROVIDER="gcp" \
  PRETRAIN_ON_TEST_BUCKET_NAME="pretrain-on-test-accuracies" \
  ./experiment_mini.sh
  ```
- Check that stuff was logged (search for the latest log group with the name `run-`) and that data was uploaded to the bucket `pretrain-on-test-accuracies`.
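
To do both checks from the terminal rather than the console, a rough sketch (the `run-` log-name prefix and the bucket layout are this repo's conventions, so adjust as needed):

```bash
# List Cloud Logging log names created by the mini experiment.
gcloud logging logs list | grep run-

# List what was uploaded to the bucket.
gsutil ls gs://pretrain-on-test-accuracies/
```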
### Test that cloud launches work
Launch a cloud instance which will run a mini experiment, and check that data was uploaded to GCP.
- Run the mini CPU test (after ensuring your `gcloud` is set to whatever project hosts the bucket): `python launch.py --run_type cpu-test`
- Check that stuff was logged (search for the latest log group with the name `run-`) and that data was uploaded to the bucket `pretrain-on-test-accuracies`.
- Consider deleting these logs: `python delete_old_test_logs.py`
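
If you want to keep an eye on the launched instance itself, here's a hedged sketch; `INSTANCE_NAME` and `ZONE` are placeholders you'd read off of the `list` output:

```bash
# See which instances exist and whether they're still running.
gcloud compute instances list

# Tail the instance's serial console output to watch progress.
# INSTANCE_NAME and ZONE are placeholders.
gcloud compute instances get-serial-port-output INSTANCE_NAME --zone=ZONE

# A stopped instance still bills for its disk, so if launch.py leaves stopped
# instances around, delete them once you've confirmed the data was uploaded.
gcloud compute instances delete INSTANCE_NAME --zone=ZONE
```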
Launch a cloud GPU instance which will run the full experiment, and check that data was uploaded to GCP. Note that the instance will stop even if there's an error.
- Run the full experiment (after ensuring your `gcloud` is set to whatever project hosts the bucket): `python launch.py`

  To run multiple experiments in parallel / on multiple instances, put some bash files in a directory (e.g., `./experiments/m100/n500/bert/`) and run: `python launch.py --sh_dir_or_filename experiments/m100/n500/bert/`

  If you're getting an error with code `ZONE_RESOURCE_POOL_EXHAUSTED` (because there aren't any T4 GPUs available in the requested zone), then consider adding the flag `--any_zone` to the `launch.py` command. This flag causes the script to automatically try to find a zone with availability; see the example after this list.
Check that stuff was logged (search for the latest log group with the name
run-
) and that data was uploaded to the bucketpretrain-on-test-accuracies
.
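
For example, combining the two flags mentioned above (this just composes the documented options; check `launch.py`'s help for the exact interface):

```bash
# Launch one instance per bash file in the directory, and let launch.py search
# for a zone that actually has T4 capacity.
python launch.py --sh_dir_or_filename experiments/m100/n500/bert/ --any_zone
```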
After running all of the experiments, merge their data into a single directory which can be used for analysis.
- cd to: `cd ../../analysis/dirty_file_processing`
- Copy data from GCP storage to `runs`, for example:

  ```bash
  mkdir -p runs
  cd runs
  gsutil -m cp -r \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_06-52-44-m50_n100_gpt2_4" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_06-52-52-m50_n100_gpt2_2" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_06-52-52-m50_n100_gpt2_5" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_06-53-09-m50_n100_gpt2_7" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_14-23-58-m50_n100_gpt2_6" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_14-24-05-m50_n100_gpt2_3" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_14-24-48-m50_n100_gpt2_1" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_19-59-41-m50_n100_bert_2" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_20-18-05-m50_n100_bert_4" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_20-19-25-m50_n100_bert_6" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_20-21-02-m50_n100_bert_5" \
      "gs://pretrain-on-test-accuracies/run-2024-06-17_21-46-23-m50_n100_bert_7" \
      "gs://pretrain-on-test-accuracies/run-2024-06-18_00-25-45-m50_n100_bert_1" \
      "gs://pretrain-on-test-accuracies/run-2024-06-18_03-00-59-m50_n100_bert_3" \
      .
  cd ..
  ```
- Merge them into a new directory, `accuracies`: `python merge_runs.py --runs_dir runs --destination_dir accuracies`
- Verify that the datasets are the same: `diff <(ls accuracies/m50/n100/bert) <(ls accuracies/m50/n100/gpt2)` (a sketch for sweeping this check over every m/n pair is below this list).
- When you're ready to analyze this data, copy-paste (or move, whatever you prefer) `accuracies` into the analysis dir: `cp -a accuracies ../`
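
If you end up with more than one `m`/`n` combination, here's a small hedged sweep of the same `diff` check, assuming every `accuracies/m*/n*` directory contains both a `bert` and a `gpt2` subdirectory:

```bash
# Compare dataset listings for every m/n pair; prints the differing filenames
# on a mismatch, and OK otherwise.
for dir in accuracies/m*/n*; do
    diff <(ls "$dir/bert") <(ls "$dir/gpt2") && echo "OK: $dir"
done
```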
All of the analyses can be run locally, but I was hitting performance issues for some of them.
Launch a high-memory, 4-core CPU instance which will run the analyses, e.g., those in `./analyses/m100`:

```bash
python launch.py \
    --run_type analysis \
    --sh_dir_or_filename analyses/m100
```