😶
Focusing


iofu728/README.md

👨‍🌾 This is Huiqiang Jiang (姜慧强)'s homepage.

Research SDE at Microsoft Research Asia (Shanghai),
a fake MLsys/NLPer (Google Scholar),
research focus: efficient methods (in LLMs)

An unpopular blogger: Blog & Zhihu
A programming enthusiast @iofu728

Phone: +86 178 xxxx xxxx
Email: hjiang[aT]microsoft[DoT.]com


Huiqiang Jiang obtained his Master's degree in Software Engineering from Peking University, where he worked with Associate Professor Xiang Jing. He was also a research intern in the KC Group at Microsoft Research Asia (June 2019 to March 2021) with Börje Karlsson and Guoxin Wang, and in the search group at Ant Group (June 2020 to August 2020).

Huiqiang's research primarily concentrates on efficient methods for accelerating inference and training, including dynamic sparse attention (MInference, RetrievalAttention), prompt compression (LLMLingua), KV-cache compression, speculative decoding, model compression, sparse inference (PIT), neural architecture search (NAS), and efficient tuning, with a particular emphasis on LLMs. He is also interested in addressing typical challenges in natural language processing.

He is looking for a research intern to work on efficient methods. If you are interested in these research topics, please contact him at hjiang[aT]microsoft[DoT.]com.


Pinned

  1. microsoft/LLMLingua

    [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.

    Python · 4.7k stars · 261 forks

  2. microsoft/MInference

    [NeurIPS'24 Spotlight] To speed up long-context LLM inference, compute attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 whil…

    Python · 800 stars · 38 forks

  3. zsh.sh

    🤖zsh deploy script by a lazy man

    Shell · 63 stars · 11 forks

  4. spider

    🕷 Spider applications for some websites, based on a proxy pool (supports HTTP & WebSocket)

    Python · 110 stars · 36 forks

  5. pkuthss

    Forked from CasperVector/pkuthss

    A modified version of the Peking University graduate thesis LaTeX template, based on CasperVector/pkuthss

    TeX · 73 stars · 13 forks

  6. PaperRead

    📒 Notes on papers I have read

    20 stars · 2 forks