
Distributed training with 🤗 Accelerate[[distributed-training-with-accelerate]]

As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and for speeding up training several times over. At Hugging Face, we created the 🤗 Accelerate library to help users easily train a 🤗 Transformers model on any type of distributed setup, whether that is multiple GPUs on one machine or multiple GPUs across several machines. In this tutorial, learn how to customize a native PyTorch training loop so that it can run in a distributed environment.

Setup[[setup]]

Get started by installing 🤗 Accelerate:

pip install accelerate

Then import and create an [~accelerate.Accelerator] object. The [~accelerate.Accelerator] automatically detects your type of distributed setup and initializes all the components needed for training. You don't need to explicitly place your model on a device.

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()

Prepare to accelerate[[prepare-to-accelerate]]

The next step is to pass all the relevant training objects to the [~accelerate.Accelerator.prepare] method. This includes your training and evaluation dataloaders, the model, and the optimizer:

>>> train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
...     train_dataloader, eval_dataloader, model, optimizer
... )
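To build some intuition for what preparing a dataloader means in a multi-process run, here is a minimal pure-Python sketch. It assumes a simplified round-robin scheme in which each process keeps every `num_processes`-th batch. The helper `shard_batches` and the toy data are hypothetical and only illustrate the idea. They are not 🤗 Accelerate's actual implementation, which supports further options.

```python
def shard_batches(batches, num_processes, process_index):
    """Return the batches one process would see under round-robin sharding.

    Simplified illustration only; not the library's real sharding code.
    """
    return [b for i, b in enumerate(batches) if i % num_processes == process_index]


# Hypothetical toy data: 8 samples grouped into batches of 2.
samples = list(range(8))
batch_size = 2
batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

num_processes = 2  # for example, two GPUs on one machine
per_process = [
    shard_batches(batches, num_processes, rank) for rank in range(num_processes)
]

for rank, shard in enumerate(per_process):
    print(f"process {rank}: {shard}")
```

Under this assumption each process works on a disjoint subset of the batches, which is why the training loop itself does not need to know how many devices are in use.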

Backward[[backward]]

Finally, replace the usual loss.backward() call in your training loop with 🤗 Accelerate's [~accelerate.Accelerator.backward] method:

>>> for epoch in range(num_epochs):
...     for batch in train_dataloader:
...         outputs = model(**batch)
...         loss = outputs.loss
...         accelerator.backward(loss)

...         optimizer.step()
...         lr_scheduler.step()
...         optimizer.zero_grad()
...         progress_bar.update(1)

As the following code shows, you only need to add four lines of code to your training loop to enable distributed training!

+ from accelerate import Accelerator
  from torch.optim import AdamW
  from transformers import AutoModelForSequenceClassification, get_scheduler

+ accelerator = Accelerator()

  model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
  optimizer = AdamW(model.parameters(), lr=3e-5)

- device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
- model.to(device)

+ train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
+     train_dataloader, eval_dataloader, model, optimizer
+ )

  num_epochs = 3
  num_training_steps = num_epochs * len(train_dataloader)
  lr_scheduler = get_scheduler(
      "linear",
      optimizer=optimizer,
      num_warmup_steps=0,
      num_training_steps=num_training_steps
  )

  progress_bar = tqdm(range(num_training_steps))

  model.train()
  for epoch in range(num_epochs):
      for batch in train_dataloader:
-         batch = {k: v.to(device) for k, v in batch.items()}
          outputs = model(**batch)
          loss = outputs.loss
-         loss.backward()
+         accelerator.backward(loss)

          optimizer.step()
          lr_scheduler.step()
          optimizer.zero_grad()
          progress_bar.update(1)
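One detail in the script above is that num_training_steps is computed from len(train_dataloader) after the prepare call. The sketch below shows the arithmetic under one assumption: batches are divided across processes, and the last round is padded so that every process runs the same number of steps. The helper `steps_per_process` and the batch count of 1000 are hypothetical values chosen for illustration.

```python
import math


def steps_per_process(num_batches, num_processes):
    # Assumption: batches are split across processes and the final round is
    # padded, so every process performs the same number of optimizer steps.
    return math.ceil(num_batches / num_processes)


num_epochs = 3
num_batches = 1000  # hypothetical len(train_dataloader) before prepare

for num_processes in (1, 2, 8):
    per_epoch = steps_per_process(num_batches, num_processes)
    total = num_epochs * per_epoch
    print(f"{num_processes} process(es): {per_epoch} steps/epoch, {total} total")
```

If the scheduler were sized from the unprepared dataloader instead, a linear schedule would be set up for more steps than each process actually runs, so the learning rate would not decay all the way by the end of training.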

Train[[train]]

Once you've added the relevant lines of code, launch your training from a script or from a notebook such as Colaboratory.

Train with a script[[train-with-a-script]]

If you are running your training from a script, run the following command to create and save a configuration file:

accelerate config

Then launch your training with:

accelerate launch train.py

Train with a notebook[[train-with-a-notebook]]

🤗 Accelerate can also run in a notebook if you're planning on using Colaboratory's TPUs. Wrap all the code responsible for training in a function, and pass it to [~accelerate.notebook_launcher]:

>>> from accelerate import notebook_launcher

>>> notebook_launcher(training_function)
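A minimal sketch of what such a wrapper function could look like is shown below. The parameters `num_epochs` and `learning_rate`, the helper `launch_in_notebook`, and the choice of `num_processes=8` (assuming an 8-core TPU) are illustrative assumptions, and the model and data setup is deliberately left out.

```python
def training_function(num_epochs=3, learning_rate=3e-5):
    # Imported inside the function so the sketch only needs accelerate
    # when it actually runs.
    from accelerate import Accelerator

    # Create the Accelerator inside the function so that each launched
    # process builds its own instance.
    accelerator = Accelerator()

    # Build the model, optimizer and dataloaders here (omitted), then reuse
    # the prepare / accelerator.backward loop shown earlier in this tutorial.
    accelerator.print(f"training for {num_epochs} epochs at lr={learning_rate}")


def launch_in_notebook():
    from accelerate import notebook_launcher

    # args is forwarded to training_function. num_processes=8 assumes a TPU
    # with 8 cores; adjust it to match the available hardware.
    notebook_launcher(training_function, args=(3, 3e-5), num_processes=8)
```

Calling `launch_in_notebook()` from a notebook cell would then start the training. Keeping all setup inside `training_function` avoids initializing device state in the notebook process before the workers are launched.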

For more information about 🤗 Accelerate and its many features, refer to the documentation.