
EmbeddedLLM

EmbeddedLLM is the creator behind JamAI Base, a platform designed to orchestrate AI with spreadsheet-like simplicity.

Pinned

  1. JamAIBase Public

    The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.

    Python · 504 stars · 17 forks

  2. vllm Public

    Forked from vllm-project/vllm

    vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 89 stars · 5 forks

  3. embeddedllm Public

    EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA, OpenVINO, IpexLLM, DirectML, and CPU backends.

    Python · 22 stars

Repositories

Showing 10 of 41 repositories
  • LMCache-ROCm Public Forked from LMCache/LMCache

    ROCm support of Ultra-Fast and Cheaper Long-Context LLM Inference

    Python · 0 stars · 30 forks · Apache-2.0 · Updated Dec 3, 2024
  • Star-Attention Public Forked from NVIDIA/Star-Attention

    Efficient LLM Inference over Long Sequences

    Python · 0 stars · 9 forks · Apache-2.0 · Updated Nov 29, 2024
  • lmcache-vllm Public Forked from LMCache/lmcache-vllm

    The driver for LMCache core to run in vLLM

    Python · 0 stars · 12 forks · Apache-2.0 · Updated Nov 29, 2024
  • JamAIBase Public

    The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.

    Python · 504 stars · 17 forks · 1 open issue · Apache-2.0 · Updated Nov 29, 2024
  • vllm Public Forked from vllm-project/vllm

    vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 89 stars · 4,836 forks · Apache-2.0 · Updated Nov 26, 2024
  • LLM_Sizing_Guide Public Forked from qoofyk/LLM_Sizing_Guide

    A calculator to estimate the memory footprint, capacity, and latency of LLMs on NVIDIA, AMD, and Intel hardware

    Python · 0 stars · 2 forks · Updated Nov 24, 2024
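As a rough illustration of the kind of back-of-envelope estimate such a sizing calculator performs (this is a generic sketch, not the repository's actual formulas; the function name and default dimensions are hypothetical), serving memory is dominated by the model weights plus the KV cache:

```python
def estimate_memory_gb(params_b, bytes_per_param=2,
                       num_layers=32, hidden_size=4096,
                       seq_len=4096, batch_size=1, kv_bytes=2):
    """Rough LLM serving-memory estimate in GB (illustrative only):
    weights + KV cache, ignoring activations and framework overhead."""
    # Weights: parameter count (in billions) times bytes per parameter
    weights = params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, each hidden_size wide,
    # one entry per token per sequence in the batch
    kv_cache = 2 * num_layers * hidden_size * seq_len * batch_size * kv_bytes
    return (weights + kv_cache) / 1e9

# e.g. a 7B-parameter model in fp16 with a 4k-token KV cache
print(round(estimate_memory_gb(7), 1))  # → 16.1
```

Assumed dimensions here (32 layers, hidden size 4096) roughly match common 7B architectures; real calculators also account for grouped-query attention, head counts, and activation memory.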
  • lmcache-tests Public

    Python · 0 stars · 5 forks · Updated Nov 22, 2024
  • infinity-executable Public Forked from michaelfeil/infinity

    Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.

    Python · 0 stars · 119 forks · MIT · Updated Nov 22, 2024
  • SageAttention-rocm Public Forked from thu-ml/SageAttention

    ROCm port of quantized attention that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without losing accuracy on end-to-end metrics across various models.

    Cuda · 0 stars · 28 forks · Apache-2.0 · Updated Nov 21, 2024
  • torchac_rocm Public Forked from LMCache/torchac_cuda

    ROCm Implementation of torchac_cuda from LMCache

    Cuda · 0 stars · 1 fork · Updated Nov 17, 2024
