Li Tan

tanliboy

https://github.com/tanliboy

New activity in rombodawg/Rombos-LLM-V2.5-Qwen-72b about 1 year ago

what is your "continuous finetuning"

#2 opened over 1 year ago by

New activity in google/gemma-2-9b-it about 1 year ago

Batch Inference causes degraded performance

#43 opened over 1 year ago by

New activity in Qwen/Qwen2.5-7B-Instruct over 1 year ago

Scorecard on popular benchmarks

#2 opened over 1 year ago by

New activity in ContextualAI/ultrafeedback_clair_32k over 1 year ago

Phi-2-Instruct-APO: aligned with Anchored Preference Optimization

#3 opened over 1 year ago by

New activity in Qwen/Qwen2.5-Math-RM-72B over 1 year ago

Preference Alignment

#6 opened over 1 year ago by

New activity in meta-llama/Llama-3.1-8B over 1 year ago

Text Classification with LLMs

#30 opened over 1 year ago by

New activity in Alibaba-NLP/gte-Qwen2-1.5B-instruct over 1 year ago

Qwen 2.5 1.5B retrain?

#12 opened over 1 year ago by

New activity in meta-llama/Llama-3.1-8B-Instruct over 1 year ago

GSM8K Evaluation Result: 84.5 vs. 76.95

#81 opened over 1 year ago by

New activity in Qwen/Qwen2-VL-7B-Instruct over 1 year ago

Finetuning script using HuggingFace (No llama-factory)

#32 opened over 1 year ago by

New activity in meta-llama/Llama-3.1-8B-Instruct over 1 year ago

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

#120 opened over 1 year ago by

New activity in Qwen/Qwen2-VL-7B-Instruct over 1 year ago

Have you deleted your GitHub page?

#10 opened over 1 year ago by

New activity in google/gemma-2-9b-it over 1 year ago

Sliding window vs. Global Attention

#41 opened over 1 year ago by

New activity in google/gemma-2-2b over 1 year ago

Gemma2-2b training uses much more memory!

#23 opened over 1 year ago by

New activity in google/gemma-2b over 1 year ago

GemmaSdpaAttention vs GemmaAttention

#71 opened over 1 year ago by

New activity in meta-llama/Llama-3.1-70B-Instruct over 1 year ago

Fix Llama 3.1 Chat Template to Properly Handle add_generation_prompt

#26 opened over 1 year ago by

New activity in Qwen/Qwen2-VL-7B-Instruct over 1 year ago

🍭 Fine-tuning support for Qwen2-VL-7B-Instruct

#1 opened over 1 year ago by

New activity in meta-llama/Llama-3.1-8B-evals over 1 year ago

How is this dataset supposed to be used to evaluate the model?

#1 opened over 1 year ago by realdanielbyrne

New activity in google/gemma-2-2b-it over 1 year ago

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

#18 opened over 1 year ago by

New activity in meta-llama/Meta-Llama-3-8B-Instruct over 1 year ago

Llama-3-Instruct with Langchain keeps talking to itself

#147 opened over 1 year ago by

New activity in meta-llama/Llama-3.1-70B-Instruct over 1 year ago

Pruning

#24 opened over 1 year ago by