Henk717 committed on
Commit
86d84ef
1 Parent(s): 1c97e32

Overhaul to use precompiled binaries


Llamacpp has been adding a lot of CUDA kernels, which drives up compile times on the HF Spaces.
To keep everything working smoothly, the space now uses our precompiled binary; this ensures maximum compatibility between the different GPUs and also severely reduces the time it takes to build the space.

Files changed (1)
  1. Dockerfile +9 -10
Dockerfile CHANGED
@@ -1,4 +1,4 @@
-FROM nvidia/cuda:12.1.1-devel-ubuntu22.04
+FROM ubuntu
 ARG MODEL
 ARG IMGMODEL
 ARG WHISPERMODEL
@@ -6,14 +6,13 @@ ARG MMPROJ
 ARG MODEL_NAME
 ARG ADDITIONAL
 RUN mkdir /opt/koboldcpp
-RUN apt update && apt install git build-essential libopenblas-dev wget python3-pip -y
-RUN git clone https://github.com/lostruins/koboldcpp /opt/koboldcpp
+RUN apt update && apt install curl -y
 WORKDIR /opt/koboldcpp
 COPY default.json /opt/koboldcpp/default.json
-RUN make -j$(nproc) LLAMA_OPENBLAS=1 LLAMA_CUBLAS=1 LLAMA_PORTABLE=1 LLAMA_COLAB=1
-RUN wget -O model.ggml $MODEL || true
-RUN wget -O imgmodel.ggml $IMGMODEL || true
-RUN wget -O mmproj.ggml $MMPROJ || true
-RUN wget -O whispermodel.ggml $WHISPERMODEL || true
-CMD /bin/python3 ./koboldcpp.py --model model.ggml --whispermodel whispermodel.ggml --sdmodel imgmodel.ggml --sdthreads 4 --sdquant --sdclamped --mmproj mmproj.ggml $ADDITIONAL --port 7860 --hordemodelname $MODEL_NAME --hordemaxctx 1 --hordegenlen 1 --preloadstory default.json --ignoremissing
-
+RUN curl -fLo koboldcpp https://koboldai.org/cpplinuxcu12
+RUN chmod +x ./koboldcpp
+RUN curl -fLo model.ggml $MODEL || true
+RUN curl -fLo imgmodel.ggml $IMGMODEL || true
+RUN curl -fLo mmproj.ggml $MMPROJ || true
+RUN curl -fLo whispermodel.ggml $WHISPERMODEL || true
+CMD ./koboldcpp --model model.ggml --whispermodel whispermodel.ggml --sdmodel imgmodel.ggml --sdthreads 4 --sdquant --sdclamped --mmproj mmproj.ggml $ADDITIONAL --port 7860 --hordemodelname $MODEL_NAME --hordemaxctx 1 --hordegenlen 1 --preloadstory default.json --ignoremissing
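The download steps above rely on one small pattern worth noting: `curl -f` exits non-zero on HTTP errors (unlike plain `wget -O`, which can leave an HTML error page on disk), and `|| true` swallows that failure so the image still builds when an optional model build-arg is empty or unreachable. A minimal sketch of that guard, with a hypothetical `fetch_optional` helper name:

```shell
# Sketch of the optional-download guard used in the Dockerfile above.
# -f : fail with a non-zero exit on HTTP errors (no error page saved)
# -L : follow redirects to the actual file
# -o : write the response to the named file
fetch_optional() {
    # $1 = output file, $2 = URL (may be empty when the build-arg is unset)
    curl -fLo "$1" "$2" || true
}

# The guard in isolation, without the network: a failing command followed
# by "|| true" leaves the exit status at 0, so a RUN step does not abort.
false || true
echo "exit status: $?"   # prints "exit status: 0"
```

This is why a space can be built with only `$MODEL` set: the image, mmproj, and whisper downloads fail quietly and `--ignoremissing` tells the binary to skip them at startup.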