Hands-on
Search

LLM Compile Process Overview
ååã®èšäºãã極ããŠãã©ã€ããŒããªèªåã ãã®LLMã䟡å€ãããã®ãïŒ[第1å - ãã¡ã€ã³ãã¥ãŒãã³ã°](https://blog.sionic.ai/Finetuning_Llama)ã§ã¯ã倧èŠæš¡ã¢ãã«æ§ç¯ã®é£ãããšç Žå£çãªå¿åŽçŸè±¡ãªã©ã®ä»£æ¿ãšããŠç»å ŽããRetriever Augmented Generation(RAG)æ¹æ³ãèŠãŠã¿ãŸããã RAGã¯LLMã®åŒ·åãªããã¹ãçæèœåãããŒã¹ã«ããŠãã¢ãã«ã«ãŠãŒã¶ãŒã®ã¯ãšãªã«åã£ãææžã®ã¹ãããããé©åã«åã蟌ãã§ããã³ãããéããŠå¿çããæ¹åŒã§ãããŠãŒã¶ãŒãããŒãœãã©ã€ãºãããLLMãæ§ç¯ããç¹å®ã®ç®çã«åãããŠèª¿æŽããæ¹æ³ã¯ãæ§ã
ãªæã§æçšã«äœ¿çšããããšãã§ããŸãã
ä»åã¯MLC-LLMããã±ãŒãžã掻çšããWebGPU Build & Runã¬ã€ããå
±æããŸããããã«ããã倧èŠæš¡èšèªã¢ãã«(LLM)ãWebGPUã掻çšããŠããã«ãïŒå®è¡ããéçšãéããŠãèªåã®ããŒã¿ã§å€§èŠæš¡èšèªã¢ãã«ãæ§ç¯ãå®è¡ããããšãã§ããããã«ãªããŸãã
åææ¡ä»¶
LLMãã«ãã®ããã®èŠä»¶
â¢
python3: æ®éçã«ãã䜿ãããCondaç°å¢ã§ãpythonèšèªã䜿çšããŠé²è¡ããŸãã
â¢
conda: Pythonããã±ãŒãžã®è¡çªãé²ãããã«ç°å¢åé¢ã®æã«å¿
èŠã§ãã
â¢
Git LFS: weight fileãªã©å€§å®¹éãã¡ã€ã«ãpullingããããã«å¿
èŠã§ãã
â¢
TVM Compiler: ãªãŒãã³ãœãŒã¹ãã£ãŒãã©ãŒãã³ã°ã³ã³ãã€ã©
WebAssembly ãã«ãã®ããã®èŠä»¶
â¢
Emscripten: LLVMã䜿ãèšèª(C/C++)ãWebAssemblyã§ã³ã³ãã€ã«ã§ããããã«ããããŒã«ãã§ãŒã³ã
ãã«ã
極ããŠãã©ã€ããŒããªç§ã ãã®LLMãäœããã®ãïŒ[第ïŒåŒŸ- WebGPU Build & Run
å人ã®äœ¿ãæ¹ã«åãããè¶
巚倧èšèªã¢ãã«ã®æŽ»çš
é執顯 ãã ããã¯ãã§ã³ / Head of Development
LLM
compilation
WebGPU
RAG(Retrieval Augmented Generation)ãšãã¡ã€ã³ãã¥ãŒãã³ã°
倧èŠæš¡èšèªã¢ãã«(Large language model, LLM)ã¯ãäžè¬çãªèª²é¡ãããŸãåŠçããå©ç¹ããããŸããç§ãã¡ãChatGPTã«ç±çããçç±ããäžè¬çãªç¥èã«é¢ãã質åãæšè«ã«å¯ŸããŠå¿
èŠãªçããããŸãçæããããã ãšæããŸããããããæ¥åžžç掻ã§ã®å€§èŠæš¡èšèªã¢ãã«ã®æå¹æŽ»çšã«ã¯ãå人ãçµç¹ã¬ãã«ã§ç¹å®ã®ããŒã¿ãåŠç¿ãããããšãäžå¯æ¬ ã§ãã
æ¬çš¿ã§ã¯ãRAGãšãã¡ã€ã³ãã¥ãŒãã³ã°ãšãã2ã€ã®æ¹æ³ãåãäžããŸãããããã¯ããããã倧èŠæš¡èšèªã¢ãã«ãããŒã¹ã«ããŠã«ã¹ã¿ãã€ãºãè¡ãææ³ã§ãããããããã«ã³ã¹ããšæ§èœã®é¢ã§ç°ãªãç¹åŸŽããããŸãã
ãŸããèšèªã¢ãã«ããã©ã€ããŒããªLLMãšããŠäœ¿çšã§ããæ¹æ³ãšããŠããã¡ã€ã³ãã¥ãŒãã³ã°ããããŸããäºååŠç¿ããã倧èŠæš¡èšèªã¢ãã«ã«å°ããªããŒã¿ã»ãããè¿œå ã§åŠç¿ãããç¹å®ã®äœæ¥ã«åãããŠåŸ®èª¿æŽããŠæ§èœãæ¹åããæ¹æ³ã§ããäŒçµ±çã«ããã¡ã€ã³ãã¥ãŒãã³ã°ã¯å·šå€§ãªåäœã®ãŠã§ãããŒã¿ãäºååŠç¿ããå°ããªåéã®èª²é¡ã«å¿ããŠãã¥ãŒãã³ã°ãè¡ãæ¹æ³ã§ããããã¢ãã«ã®ãã©ã¡ãŒã¿æ°ãã©ãã©ã倧ãããªããäŒæ¥ãç 究è
ãã¢ãã«å
šäœããã¡ã€ã³ãã¥ãŒãã³ã°ããããšãé£ãããªãããã¡ã€ã³ãã¥ãŒãã³ã°ããã¢ãã«ã®ä¿åãšã³ã¹ããéåžžã«å€§ãããªããŸããããã®ä»ã«ããæ°ããæ
å ±ãåŠç¿ããéã以åã«åŠç¿ããæ
å ±ãçªç¶æ¥æ¿ã«å¿ããçŸè±¡ãã€ãŸãç Žå£çå¿åŽ(Catastrophic forgetting)ãšåŒã°ããçŸè±¡ã解決ã«å°é£ããããŸããã
ChatGPTãªã©ã®LLMã¢ãã«ãç£æ¥çã«å°é ãå§ããŠããã¡ããã©1幎ãçµã¡ãåäŒæ¥ãèŠã€ããè²»çšå¯Ÿå¹æã®é«ã代æ¿æ段ãRAGãšèšããŸããRAGææ³ã¯ãLLMã®åŒ·åãªããã¹ãçæåãããŒã¹ã«ããŠãŒã¶ãŒã®ã¯ãšãªã«åã£ãå¿
èŠãªææžã¹ãããããé©åã«åãåºããã¢ãã«ã«ããã³ãããæäŸããŠå¿çããæ¹æ³ã§ããLlamaIndexãLangchainã®ãããªéçºè
ããŒã«ãunstructured.ioã®ãããªååŠçSDKããããŠMilvusã®ãããªè€æ°ã®åçšã®ãã¯ã¿ãŒãµãŒãDBãæè¿1幎éã«èªèŽããæè³é¡ãšããªã¥ãšãŒã·ã§ã³ãèŠããšãæ¥çã«ãããRAGã«å¯Ÿããé¢å¿åºŠã¯å®¹æã«æšæž¬ã§ãããšæããŸãã

åçåºå
ž: OpenAI - A Survey of Techniques for Maximizing LLM Performance https://youtu.be/ahnGLM-RC1Y
ããããã¢ãã«ã®ç®çãã®ãã®ãããèªç±ã«å€ãããããšããç¹ã§ããã¡ã€ã³ãã¥ãŒãã³ã°ãæã€é
åãç¡èŠã§ããŸãããã¢ãã«ãå¿çããã¹ã¿ã€ã«ãããŒã³ããããŒããã©ãŒãããã®ãããªè³ªçãªé¢ãå€ããããåžæãã圢ã®ã¢ãŠãããããåºãããšãä¿èšŒããããTextãSQLã«å€ãããªã©ã®ããã³ããã ãã§ã¯èª¬æãã«ããç¹å®ã®ã¿ã¹ã¯ã«ç¹åããå¿
èŠãããå Žåã¯ããã¡ã€ã³ãã¥ãŒãã³ã°ãæå©ãªå ŽåããããŸãã
OpenAIã¯ååã®DevDayã§ãã¡ã€ã³ãã¥ãŒãã³ã°ãšRAGãå¿
èŠãªå Žåã2ã€ã®è»žã§æŽçããŠçŽ¹ä»ããŸãããã¢ãã«ã®ç¥èçãªåŽé¢ãä¿®æ£ãããå Žåã«ã¯RAGããã¢ãã«ãã©ã®ããã«çããŠæšè«ããããä¿®æ£ãããå Žåã«ã¯ãã¡ã€ã³ãã¥ãŒãã³ã°ãé©ããŠãããšçŽ¹ä»ããŸããã
éå»ã«ã¯ç°¡åã§ã¯ãªãã£ãããŒã¹ã¢ãã«ã®ãã¡ã€ã³ãã¥ãŒãã³ã°ãã2ã€ã®åŽé¢ãããäžè¬éçºè
ã«ãšã£ãŠã¢ã¯ã»ã¹ãããã圢ã«ãªã£ãŠããŠãããšæããŸããäžã€ã¯ãåçšåãããã¢ãã«ã®ã¯ã©ãŠããµãŒãã¹ãšããŠã®ããã¡ã€ã³ãã¥ãŒãã³ã°çšã®APIã(https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples)ãæäŸããŠããããšãšãããäžã€ã¯ããã¡ã€ã³ãã¥ãŒãã³ã°ã®ããã»ã¹å
šè¬ã®é£æ床ãäžãããå°ãã®ç¥èããããã°ãå
¬éã¢ãã«ãå©çšããŠç¬èªã®ããŒã¿ã»ãããæã£ãŠãã©ã€ããŒããªç°å¢ã§ãã¡ã€ã³ãã¥ãŒãã³ã°ãã§ããããã«ãªã£ãããšã§ãã ä»åã®èšäºã·ãªãŒãºã§ã¯ããã®ããã«ç¬èªã®ããŒã¿ã»ããã§ãã¡ã€ã³ãã¥ãŒãã³ã°ããããã»ã¹ãå®ç¿ããŠã¿ãããšæããŸãã
1.
ãªãŒãã³ãœãŒã¹ããŒã¹ã®å€§èŠæš¡èšèªã¢ãã«ãåºã«ãç¬èªã®ããŒã¿ã»ããã§ãã¡ã€ã³ãã¥ãŒãã³ã°ããã
2.
åœè©²ã¢ãã«ã®ã€ã³ãã¡ã¬ã³ã¹ãWebGPUã掻çšããŠããŒã«ã«ã§è¡ãããšã§ãæ©å¯æ§ã®é«ãæ
å ±ãå€éšã«å
¬éããããšãªããããŒã«ã«ã§èªåã ãã®LLMãé§åã§ããã
ä»åã®èšäºã§ã¯ãMeta AIãå
¬éããLLaMA 7B Chatã¢ãã«ãåºã«ãQLoRAã掻çšããŠèªåã ãã®ããŒã¿ããã¡ã€ã³ãã¥ãŒãã³ã°ããHugging Faceã«é
åžããŠã¿ãããšæããŸãã
極ããŠç§çãªç§ã ãã®LLMãäœããã®ã? [第ïŒåŒŸãŒãã¡ã€ã³ãã¥ãŒãã³ã°]
éæŽåçãéå«ççãªåºåãæ€èšŒã§ããããŒã¿ã»ãã
æŽ ãžã³ãã§ã³(Sigrid Jin) / Software Engineer, Sionic AI
Finetuning
Llama
ãã¡ã€ã³ãã¥ãŒãã³ã°
倧èŠæš¡èšèªã¢ãã«
LLM