sync master #15

Merged
merged 132 commits on Jun 24, 2024

Commits (132)
6d16169
ggml : prevent builds with -ffinite-math-only (#7726)
ggerganov Jun 4, 2024
3b38d48
Per token attributes (#7685)
jaime-m-p Jun 4, 2024
b226c12
refine .gitignore (#7688)
zhouwg Jun 4, 2024
987d743
Improve hipBLAS support in CMake (#7696)
daniandtheweb Jun 4, 2024
adc9ff3
llama-bench : allow using a different printer for stderr with -oe (#7…
slaren Jun 4, 2024
5ca0944
readme : remove obsolete Zig instructions (#7471)
ggerganov Jun 4, 2024
0cd6bd3
llama : remove beam search (#7736)
ggerganov Jun 4, 2024
554c247
ggml : remove OpenCL (#7735)
ggerganov Jun 4, 2024
1442677
common : refactor cli arg parsing (#7675)
ggerganov Jun 4, 2024
b90dc56
Allow number of nodes in CUDA graph to change (#7738)
agray3 Jun 4, 2024
c90dbe0
Fix per token atrributes bits (#7749)
jaime-m-p Jun 4, 2024
9973e81
readme : remove -ins (#7759)
arch-btw Jun 5, 2024
2b33896
ggml : refactor rope norm/neox (#7634)
ggerganov Jun 5, 2024
7d1a378
CUDA: refactor mmq, dmmv, mmvq (#7716)
JohannesGaessler Jun 5, 2024
7672ade
Fix encoding in python scripts (#7733)
Galunid Jun 5, 2024
d67caea
docker : add openmp lib (#7780)
slaren Jun 6, 2024
2d08b7f
docker : build only main and server in their images (#7782)
slaren Jun 6, 2024
f5d7b26
llama : add jina v2 base code (#7596)
JoanFM Jun 6, 2024
55b2d08
grammars: x{min,max} repetition operator (#6640)
ochafik Jun 6, 2024
a143c04
README minor fixes (#7798) [no ci]
Chediak Jun 6, 2024
ad675e1
Added support for . (any character) token in grammar engine. (#6467)
HanClinto Jun 6, 2024
f83351f
imatrix : migrate to gpt_params (#7771)
ggerganov Jun 6, 2024
ee459f4
server : fix --threads-http arg (#7801)
ggerganov Jun 6, 2024
c9ee711
check for nans in imatrix and quantize (#7807)
slaren Jun 7, 2024
d5c938c
[SYCL] fix softmax r2r result wrong issue (#7811)
pengxin99 Jun 7, 2024
a5cabd7
server : do not get prompt in infill mode (#7286)
woodx9 Jun 7, 2024
7027b27
server: update cache_prompt documentation [no ci] (#7745)
JohannesGaessler Jun 7, 2024
27615f5
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
intelmatt Jun 7, 2024
c00fad7
gguf-split : change binary multi-byte units to decimal (#7803)
christianazinn Jun 7, 2024
da799b4
vulkan : reuse parent extra for views (#7806)
slaren Jun 7, 2024
7a16ce7
server : smart slot selection using Longest Common Prefix (#7728)
sasha0552 Jun 8, 2024
d4d915d
url: save -mu downloads to new cache location (#7826)
ochafik Jun 8, 2024
fe1e391
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)"…
slaren Jun 8, 2024
ed9f252
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
compilade Jun 9, 2024
5795b94
convert-hf : match model part name prefix and suffix (#7687)
compilade Jun 9, 2024
2decf57
convert-hf : set the model name based on cli arg, if present (#7693)
sasha0552 Jun 9, 2024
42b53d1
CUDA: revise q8_1 data layout for mul_mat_q (#7824)
JohannesGaessler Jun 9, 2024
3e2ee44
server: do not remove whitespace at the start of a completion chunk (…
mgroeber9110 Jun 9, 2024
57bf62c
docs: Added initial PR template with directions for doc only changes …
nicolasperez19 Jun 9, 2024
e95beeb
imatrix : handle partial entries (#7833)
ggerganov Jun 9, 2024
10ceba3
flake.lock: Update (#7838)
ggerganov Jun 9, 2024
af4ae50
use the correct SYCL context for host USM allocations (#7777)
bashbaug Jun 10, 2024
1f0dabd
CUDA: use tensor cores for MMQ (#7676)
JohannesGaessler Jun 10, 2024
d9da0e4
server : improve "prompt" handling (#7847)
ggerganov Jun 10, 2024
c28a839
examples : remove --instruct remnants (#7846)
ggerganov Jun 10, 2024
fd5ea0f
ci : try win-2019 on server windows test (#7854)
slaren Jun 10, 2024
864a99e
cmake : fix CMake requirement for CUDA (#7821)
cebtenzzre Jun 10, 2024
396b18d
`json`: document schema conversion in GBNF readme, align manual gramm…
ochafik Jun 11, 2024
b61eb96
json: refine constraint for whitespace to avoid runaways yet allow pr…
ochafik Jun 11, 2024
c2ce6c4
fix CUDA CI by using a windows-2019 image (#7861)
slaren Jun 11, 2024
bdcb8f4
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860)
JohannesGaessler Jun 11, 2024
4bfe50f
tests : check the Python version (#7872)
ggerganov Jun 11, 2024
148995e
llama-bench: more compact markdown tables (#7879)
JohannesGaessler Jun 11, 2024
6fe42d0
github: move PR template to .github/ root (#7868)
mofosyne Jun 11, 2024
14f8352
fix broken link in pr template (#7880) [no ci]
deven367 Jun 11, 2024
ef52d1d
Update Vulkan RoPE implementation (#7818)
0cc4m Jun 11, 2024
73bac2b
vulkan: select only one device for single gpu with multiple drivers (…
Adriankhl Jun 11, 2024
f2b5764
Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [n…
metal3d Jun 12, 2024
dcf7527
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894)
airMeng Jun 12, 2024
704a35b
server : restore numeric prompts (#7883)
ggerganov Jun 12, 2024
bfaa676
ggml : improve ggml_is_contiguous logic (#7856)
ggerganov Jun 12, 2024
a9cae48
tests : add non-cont unary tests (#7857)
ggerganov Jun 12, 2024
9635529
CUDA: fix broken oob check for FA vec f32 kernel (#7904)
JohannesGaessler Jun 12, 2024
1c641e6
`build`: rename main → llama-cli, server → llama-server, llava-cli → …
ochafik Jun 12, 2024
f578b86
move BLAS to a separate backend (#6210)
slaren Jun 13, 2024
a55eb1b
readme : Remove outdated instructions from README.md (#7914) [no ci]
Galunid Jun 13, 2024
172c825
rpc : fix ggml_backend_rpc_supports_buft() (#7918)
rgerganov Jun 13, 2024
41b9260
convert : add Poro-34B-chat tokenizer support (#7713)
ezosa Jun 14, 2024
6fcd133
llama : more checks before assuming FIM tokens (#7644)
CISC Jun 14, 2024
e65bbf6
llama-bench : fix RPC indication (#7936)
rgerganov Jun 14, 2024
66ef1ce
metal : utilize max shared memory for mul_mat_id (#7935)
ggerganov Jun 14, 2024
76d66ee
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)
JohannesGaessler Jun 14, 2024
f8ec887
ci : fix macos x86 build (#7940)
olexiyb Jun 14, 2024
7b2f4a7
[SYCL] remove global variables (#7710)
airMeng Jun 15, 2024
0c7b359
Add `cvector-generator` example (#7514)
ngxson Jun 15, 2024
7c7836d
Vulkan Shader Refactor, Memory Debugging Option (#7947)
0cc4m Jun 16, 2024
c8a8219
github : update pr template
ggerganov Jun 16, 2024
cddaf02
ggml : fix handling of zero blocks in IQ quants (#7955)
ggerganov Jun 16, 2024
6fe1c62
readme : update UI list [no ci] (#7958)
hopkins385 Jun 16, 2024
5239925
unicode : avoid char32_t (#7957)
ggerganov Jun 16, 2024
bc6c457
flake.lock: Update (#7951)
ggerganov Jun 16, 2024
398105f
ggml : remove duplicate include of ggml-common.h (ggml/853)
danbev Jun 16, 2024
b5fcf8e
ggml : fix and optimize ppc64le (ggml/849)
penghongbo Jun 16, 2024
19b7a83
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)
ggerganov Jun 11, 2024
43b35e3
Add support for sqrt on CUDA (#7953)
calvin-laurenson Jun 16, 2024
df68d4f
[SYCL] Update README-sycl.md for Chapter "Recommended release" and "N…
arthw Jun 17, 2024
006167a
gguf-dump.py: add --markdown dump output (#7853)
mofosyne Jun 17, 2024
21be9ca
rpc : fix load/store misaligned addresses (#7948)
ggerganov Jun 17, 2024
6a2f0b3
Implement non-mapped async IO for CUDA on Windows. (#7896)
mtavenrath Jun 17, 2024
c637fcd
fix: divide 0 exception in mamba (#7932)
thxCode Jun 17, 2024
99052cd
sched : offload_op also requires supports_op (#7977)
slaren Jun 17, 2024
b473e95
Add Nix and Flox install instructions (#7899)
bryanhonof Jun 17, 2024
7c26775
llama : disable FA if KV head size do not match (#7982)
ggerganov Jun 17, 2024
5b6da18
Make updates to type cast based on compiler instead of OS (#7851)
Srihari-mcw Jun 17, 2024
a94e6ff
update: support Qwen2-57B-A14B (#7835)
legraphista Jun 17, 2024
e6ecc2b
whisper : use ggml_backend_sched (whisper/2239)
ggerganov Jun 18, 2024
5326bcc
ggml : sync
ggerganov Jun 18, 2024
1193778
readme : update UI list (#7943)
abgulati Jun 18, 2024
b96f9af
chore: clean useless beam search param (#7985)
thxCode Jun 18, 2024
6166527
Allow compiling with CUDA without CUDA runtime installed (#7989)
drepper Jun 18, 2024
84f6de1
Fix no gcc pragma on Windows (#7751)
jojorne Jun 18, 2024
91c188d
Only use FIM middle token if it exists (#7648)
CISC Jun 18, 2024
37bef89
tokenizer : BPE fixes (#7530)
jaime-m-p Jun 18, 2024
623494a
[SYCL] refactor (#6408)
airMeng Jun 19, 2024
a04a953
codecov : remove (#8004)
ggerganov Jun 19, 2024
9c77ec1
ggml : synchronize threads using barriers (#7993)
slaren Jun 19, 2024
a785474
un-ignore `build-info.cmake` and `build-info.sh` (#7996)
mdegans Jun 19, 2024
ba58993
server : fix smart slot selection (#8020)
sasha0552 Jun 19, 2024
2075a66
metal : fix `ggml_metal_supports_op` for BF16 (#8021)
mdegans Jun 20, 2024
d50f889
CUDA: stream-k decomposition for MMQ (#8018)
JohannesGaessler Jun 20, 2024
de391e4
[SYCL] Fix windows build and inference (#8003)
luoyu-intel Jun 20, 2024
abd894a
common: fix warning (#8036)
JohannesGaessler Jun 20, 2024
17b291a
convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8…
hamdoudhakem Jun 20, 2024
b1ef562
requirements : Bump torch and numpy for python3.12 (#8041)
hamdoudhakem Jun 20, 2024
0e64591
swiftui : enable stream updating (#7754)
shu223 Jun 21, 2024
80ea089
llama : allow pooled embeddings on any model (#7477)
iamlemec Jun 21, 2024
a927b0f
llama : optimize long word tokenization with WPM (#8034)
ggerganov Jun 21, 2024
7d5e877
ggml : AVX IQ quants (#7845)
netrunnereve Jun 21, 2024
557b653
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8…
Adriankhl Jun 21, 2024
c5a8d4b
JSON Schema to GBNF integration tests (#7790)
HanClinto Jun 22, 2024
5b48cd5
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 v…
ddh0 Jun 22, 2024
3aa184a
convert-hf : change assert to exception (#8015)
0xspringtime Jun 22, 2024
adf480c
cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)…
HatsuneMikuUwU33 Jun 22, 2024
3e58b0e
cvector: fix CI + correct help message (#8064)
ngxson Jun 22, 2024
b5a5f34
Removing extra blank lines that were breaking Lint. (#8067)
HanClinto Jun 22, 2024
45c0e2e
Refactor Vulkan backend to allow multiple contexts (#7961)
0cc4m Jun 23, 2024
b6b9a8e
fix CI failures (#8066)
slaren Jun 23, 2024
11318d9
Fix typo in llama_set_embeddings comment (#8077)
danbev Jun 23, 2024
6a2f298
server : fix JSON-Scheme typo (#7975)
akx Jun 23, 2024
e112b61
llama : add support for BitnetForCausalLM (#7931)
Eddie-Wang1120 Jun 23, 2024
95f57bb
ggml : remove ggml_task_type and GGML_PERF (#8017)
slaren Jun 24, 2024
77beb4d
Merge branch 'prepare-PR-of-minicpm-v2.5' into master
tc-mb Jun 24, 2024
2 changes: 1 addition & 1 deletion .devops/cloud-v-pipeline
@@ -15,7 +15,7 @@ node('x86_runner1'){ // Running on x86 runner containing latest vecto
 stage('Running llama.cpp'){
 sh'''#!/bin/bash
 module load gnu-bin2/0.1 # loading latest versions of vector qemu and vector gcc
-qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./main -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
+qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./llama-cli -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
 cat llama_log.txt # Printing results
 '''
 }
2 changes: 1 addition & 1 deletion .devops/full-cuda.Dockerfile
@@ -12,7 +12,7 @@ FROM ${BASE_CUDA_DEV_CONTAINER} as build
 ARG CUDA_DOCKER_ARCH=all
 
 RUN apt-get update && \
-    apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev
+    apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev libgomp1
 
 COPY requirements.txt requirements.txt
 COPY requirements requirements
2 changes: 1 addition & 1 deletion .devops/full.Dockerfile
@@ -3,7 +3,7 @@ ARG UBUNTU_VERSION=22.04
 FROM ubuntu:$UBUNTU_VERSION as build
 
 RUN apt-get update && \
-    apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev
+    apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev libgomp1
 
 COPY requirements.txt requirements.txt
 COPY requirements requirements

@@ -23,10 +23,13 @@ ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
 ENV LLAMA_CUDA=1
 
-RUN make -j$(nproc)
+RUN make -j$(nproc) llama-cli
 
 FROM ${BASE_CUDA_RUN_CONTAINER} as runtime
 
-COPY --from=build /app/main /main
+RUN apt-get update && \
+    apt-get install -y libgomp1
+
+COPY --from=build /app/llama-cli /llama-cli
 
-ENTRYPOINT [ "/main" ]
+ENTRYPOINT [ "/llama-cli" ]
26 changes: 26 additions & 0 deletions .devops/llama-cli-intel.Dockerfile
@@ -0,0 +1,26 @@
+ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04
+
+FROM intel/oneapi-basekit:$ONEAPI_VERSION as build
+
+ARG LLAMA_SYCL_F16=OFF
+RUN apt-get update && \
+    apt-get install -y git
+
+WORKDIR /app
+
+COPY . .
+
+RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
+    echo "LLAMA_SYCL_F16 is set" && \
+    export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
+    fi && \
+    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
+    cmake --build build --config Release --target llama-cli
+
+FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime
+
+COPY --from=build /app/build/bin/llama-cli /llama-cli
+
+ENV LC_ALL=C.utf8
+
+ENTRYPOINT [ "/llama-cli" ]

@@ -40,6 +40,6 @@ ENV LLAMA_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++
 
-RUN make -j$(nproc)
+RUN make -j$(nproc) llama-cli
 
-ENTRYPOINT [ "/app/main" ]
+ENTRYPOINT [ "/app/llama-cli" ]

@@ -3,7 +3,7 @@ ARG UBUNTU_VERSION=jammy
 FROM ubuntu:$UBUNTU_VERSION as build
 
 # Install build tools
-RUN apt update && apt install -y git build-essential cmake wget
+RUN apt update && apt install -y git build-essential cmake wget libgomp1
 
 # Install Vulkan SDK
 RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
@@ -15,13 +15,13 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
 WORKDIR /app
 COPY . .
 RUN cmake -B build -DLLAMA_VULKAN=1 && \
-    cmake --build build --config Release --target main
+    cmake --build build --config Release --target llama-cli
 
 # Clean up
 WORKDIR /
-RUN cp /app/build/bin/main /main && \
+RUN cp /app/build/bin/llama-cli /llama-cli && \
     rm -rf /app
 
 ENV LC_ALL=C.utf8
 
-ENTRYPOINT [ "/main" ]
+ENTRYPOINT [ "/llama-cli" ]
9 changes: 6 additions & 3 deletions .devops/main.Dockerfile → .devops/llama-cli.Dockerfile
@@ -9,12 +9,15 @@ WORKDIR /app
 
 COPY . .
 
-RUN make -j$(nproc)
+RUN make -j$(nproc) llama-cli
 
 FROM ubuntu:$UBUNTU_VERSION as runtime
 
-COPY --from=build /app/main /main
+RUN apt-get update && \
+    apt-get install -y libgomp1
+
+COPY --from=build /app/llama-cli /llama-cli
 
 ENV LC_ALL=C.utf8
 
-ENTRYPOINT [ "/main" ]
+ENTRYPOINT [ "/llama-cli" ]
14 changes: 7 additions & 7 deletions .devops/llama-cpp-clblast.srpm.spec
@@ -36,9 +36,9 @@ make -j LLAMA_CLBLAST=1
 
 %install
 mkdir -p %{buildroot}%{_bindir}/
-cp -p main %{buildroot}%{_bindir}/llamaclblast
-cp -p server %{buildroot}%{_bindir}/llamaclblastserver
-cp -p simple %{buildroot}%{_bindir}/llamaclblastsimple
+cp -p llama-cli %{buildroot}%{_bindir}/llama-clblast-cli
+cp -p llama-server %{buildroot}%{_bindir}/llama-clblast-server
+cp -p llama-simple %{buildroot}%{_bindir}/llama-clblast-simple
 
 mkdir -p %{buildroot}/usr/lib/systemd/system
 %{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamaclblast.service
@@ -49,7 +49,7 @@ After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.t
 [Service]
 Type=simple
 EnvironmentFile=/etc/sysconfig/llama
-ExecStart=/usr/bin/llamaclblastserver $LLAMA_ARGS
+ExecStart=/usr/bin/llama-clblast-server $LLAMA_ARGS
 ExecReload=/bin/kill -s HUP $MAINPID
 Restart=never
 
@@ -67,9 +67,9 @@ rm -rf %{buildroot}
 rm -rf %{_builddir}/*
 
 %files
-%{_bindir}/llamaclblast
-%{_bindir}/llamaclblastserver
-%{_bindir}/llamaclblastsimple
+%{_bindir}/llama-clblast-cli
+%{_bindir}/llama-clblast-server
+%{_bindir}/llama-clblast-simple
 /usr/lib/systemd/system/llamaclblast.service
 %config /etc/sysconfig/llama
 
14 changes: 7 additions & 7 deletions .devops/llama-cpp-cuda.srpm.spec
@@ -36,9 +36,9 @@ make -j LLAMA_CUDA=1
 
 %install
 mkdir -p %{buildroot}%{_bindir}/
-cp -p main %{buildroot}%{_bindir}/llamacppcuda
-cp -p server %{buildroot}%{_bindir}/llamacppcudaserver
-cp -p simple %{buildroot}%{_bindir}/llamacppcudasimple
+cp -p llama-cli %{buildroot}%{_bindir}/llama-cuda-cli
+cp -p llama-server %{buildroot}%{_bindir}/llama-cuda-server
+cp -p llama-simple %{buildroot}%{_bindir}/llama-cuda-simple
 
 mkdir -p %{buildroot}/usr/lib/systemd/system
 %{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacuda.service
@@ -49,7 +49,7 @@ After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.t
 [Service]
 Type=simple
 EnvironmentFile=/etc/sysconfig/llama
-ExecStart=/usr/bin/llamacppcudaserver $LLAMA_ARGS
+ExecStart=/usr/bin/llama-cuda-server $LLAMA_ARGS
 ExecReload=/bin/kill -s HUP $MAINPID
 Restart=never
 
@@ -67,9 +67,9 @@ rm -rf %{buildroot}
 rm -rf %{_builddir}/*
 
 %files
-%{_bindir}/llamacppcuda
-%{_bindir}/llamacppcudaserver
-%{_bindir}/llamacppcudasimple
+%{_bindir}/llama-cuda-cli
+%{_bindir}/llama-cuda-server
+%{_bindir}/llama-cuda-simple
 /usr/lib/systemd/system/llamacuda.service
 %config /etc/sysconfig/llama
 
14 changes: 7 additions & 7 deletions .devops/llama-cpp.srpm.spec
@@ -38,9 +38,9 @@ make -j
 
 %install
 mkdir -p %{buildroot}%{_bindir}/
-cp -p main %{buildroot}%{_bindir}/llama
-cp -p server %{buildroot}%{_bindir}/llamaserver
-cp -p simple %{buildroot}%{_bindir}/llamasimple
+cp -p llama-cli %{buildroot}%{_bindir}/llama-cli
+cp -p llama-server %{buildroot}%{_bindir}/llama-server
+cp -p llama-simple %{buildroot}%{_bindir}/llama-simple
 
 mkdir -p %{buildroot}/usr/lib/systemd/system
 %{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llama.service
@@ -51,7 +51,7 @@ After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.t
 [Service]
 Type=simple
 EnvironmentFile=/etc/sysconfig/llama
-ExecStart=/usr/bin/llamaserver $LLAMA_ARGS
+ExecStart=/usr/bin/llama-server $LLAMA_ARGS
 ExecReload=/bin/kill -s HUP $MAINPID
 Restart=never
 
@@ -69,9 +69,9 @@ rm -rf %{buildroot}
 rm -rf %{_builddir}/*
 
 %files
-%{_bindir}/llama
-%{_bindir}/llamaserver
-%{_bindir}/llamasimple
+%{_bindir}/llama-cli
+%{_bindir}/llama-server
+%{_bindir}/llama-simple
 /usr/lib/systemd/system/llama.service
 %config /etc/sysconfig/llama
 

@@ -25,13 +25,13 @@ ENV LLAMA_CUDA=1
 # Enable cURL
 ENV LLAMA_CURL=1
 
-RUN make -j$(nproc)
+RUN make -j$(nproc) llama-server
 
 FROM ${BASE_CUDA_RUN_CONTAINER} as runtime
 
 RUN apt-get update && \
-    apt-get install -y libcurl4-openssl-dev
+    apt-get install -y libcurl4-openssl-dev libgomp1
 
-COPY --from=build /app/server /server
+COPY --from=build /app/llama-server /llama-server
 
-ENTRYPOINT [ "/server" ]
+ENTRYPOINT [ "/llama-server" ]
29 changes: 29 additions & 0 deletions .devops/llama-server-intel.Dockerfile
@@ -0,0 +1,29 @@
+ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04
+
+FROM intel/oneapi-basekit:$ONEAPI_VERSION as build
+
+ARG LLAMA_SYCL_F16=OFF
+RUN apt-get update && \
+    apt-get install -y git libcurl4-openssl-dev
+
+WORKDIR /app
+
+COPY . .
+
+RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
+    echo "LLAMA_SYCL_F16 is set" && \
+    export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
+    fi && \
+    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
+    cmake --build build --config Release --target llama-server
+
+FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime
+
+RUN apt-get update && \
+    apt-get install -y libcurl4-openssl-dev
+
+COPY --from=build /app/build/bin/llama-server /llama-server
+
+ENV LC_ALL=C.utf8
+
+ENTRYPOINT [ "/llama-server" ]

@@ -45,6 +45,6 @@ ENV LLAMA_CURL=1
 RUN apt-get update && \
     apt-get install -y libcurl4-openssl-dev
 
-RUN make -j$(nproc)
+RUN make -j$(nproc) llama-server
 
-ENTRYPOINT [ "/app/server" ]
+ENTRYPOINT [ "/app/llama-server" ]

@@ -19,13 +19,13 @@ RUN apt-get update && \
 WORKDIR /app
 COPY . .
 RUN cmake -B build -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \
-    cmake --build build --config Release --target server
+    cmake --build build --config Release --target llama-server
 
 # Clean up
 WORKDIR /
-RUN cp /app/build/bin/server /server && \
+RUN cp /app/build/bin/llama-server /llama-server && \
     rm -rf /app
 
 ENV LC_ALL=C.utf8
 
-ENTRYPOINT [ "/server" ]
+ENTRYPOINT [ "/llama-server" ]
8 changes: 4 additions & 4 deletions .devops/server.Dockerfile → .devops/llama-server.Dockerfile
@@ -11,15 +11,15 @@ COPY . .
 
 ENV LLAMA_CURL=1
 
-RUN make -j$(nproc)
+RUN make -j$(nproc) llama-server
 
 FROM ubuntu:$UBUNTU_VERSION as runtime
 
 RUN apt-get update && \
-    apt-get install -y libcurl4-openssl-dev
+    apt-get install -y libcurl4-openssl-dev libgomp1
 
-COPY --from=build /app/server /server
+COPY --from=build /app/llama-server /llama-server
 
 ENV LC_ALL=C.utf8
 
-ENTRYPOINT [ "/server" ]
+ENTRYPOINT [ "/llama-server" ]
34 changes: 0 additions & 34 deletions .devops/main-intel.Dockerfile

This file was deleted.

6 changes: 3 additions & 3 deletions .devops/nix/apps.nix
@@ -6,11 +6,11 @@
 let
   inherit (config.packages) default;
   binaries = [
-    "llama"
+    "llama-cli"
     "llama-embedding"
     "llama-server"
-    "quantize"
-    "train-text-from-scratch"
+    "llama-quantize"
+    "llama-train-text-from-scratch"
   ];
   mkApp = name: {
     type = "app";
4 changes: 1 addition & 3 deletions .devops/nix/package.nix
@@ -243,8 +243,6 @@ effectiveStdenv.mkDerivation (
   # TODO(SomeoneSerge): It's better to add proper install targets at the CMake level,
   # if they haven't been added yet.
   postInstall = ''
-    mv $out/bin/main${executableSuffix} $out/bin/llama${executableSuffix}
-    mv $out/bin/server${executableSuffix} $out/bin/llama-server${executableSuffix}
     mkdir -p $out/include
     cp $src/llama.h $out/include/
   '';
@@ -294,7 +292,7 @@ effectiveStdenv.mkDerivation (
   license = lib.licenses.mit;
 
   # Accommodates `nix run` and `lib.getExe`
-  mainProgram = "llama";
+  mainProgram = "llama-cli";
 
   # These people might respond, on the best effort basis, if you ping them
   # in case of Nix-specific regressions or for reviewing Nix-specific PRs.