Deployment on Azure 4 x A100 (80 GB) hangs
#23 · opened by hugging-face-infrax
Hi abacusai team,
We're currently trying to deploy the abacusai/Smaug-72B-v0.1 model on Azure with the following specifications:
GPU: 4 x A100 (80 GB)
CPU: 96 vCPU
Memory: 880 GB
Nvidia CUDA: 12.3
Nvidia Driver Version: v545.23.08
Text Generation Inference: v1.4.2
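For reference, here is a sketch of how we verify the environment above before starting the service (the `--entrypoint` override is an assumption that the NVIDIA container toolkit exposes `nvidia-smi` inside the TGI image, which it normally does when `--gpus` is set):

```shell
# On the host: driver 545.23.08 and all four A100 80GB cards should be listed.
nvidia-smi

# From inside the TGI container: confirm the container runtime can see the GPUs.
docker run --rm --gpus all \
  --entrypoint nvidia-smi \
  ghcr.io/huggingface/text-generation-inference:1.4.2
```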
Our docker-compose file:
```yaml
services:
  text-generation-inference:
    image: ghcr.io/huggingface/text-generation-inference:1.4.2
    container_name: text-generation-inference
    command: >
      --model-id abacusai/Smaug-72B-v0.1
      --max-total-tokens 16384
      --max-input-length 8192
      --num-shard 4
      --sharded true
      --max-top-n-tokens 1
      --max-best-of 1
      --disable-custom-kernels
      --trust-remote-code
      --max-stop-sequences 1
      --validation-workers 1
      --waiting-served-ratio 0
      --max-batch-total-tokens 16384
      --max-waiting-tokens 8192
      --cuda-memory-fraction 0.8
      --max-concurrent-requests 512
      --max-batch-prefill-tokens 16384
      --json-output
    volumes:
      - ./data:/data
    ports:
      - 8080:80
    shm_size: '1gb'
    restart: always
    env_file:
      - .env
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/health"]
      interval: 30s
      timeout: 45s
      start_period: 180s
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
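In case it helps, this is a sketch of the diagnostics we can run against the compose file above (`NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS` are standard NCCL environment variables; the service name matches the compose file):

```shell
# Follow the launcher logs to see where startup stalls
# (weight download, shard loading, or NCCL warmup).
docker compose logs -f text-generation-inference

# Check whether the shards are busy on GPU or stuck at 0% utilization;
# refresh every 5 seconds.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5

# Re-running with NCCL debug output often shows which collective hangs.
# Add these to the .env file referenced in the compose file:
#   NCCL_DEBUG=INFO
#   NCCL_DEBUG_SUBSYS=ALL
```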
We expected the model to run smoothly with these specs, but the deployment hangs. Could you please recommend some troubleshooting steps?