Paper Review
Introduction
• Research on various methods for improving the reasoning ability of LLMs is being actively conducted. (See the previous article: Improving LLM reasoning ability via prompting.)
• This article introduces the paper "Chain of Code", which presents a methodology for properly exploiting the excellent code-generation ability of LLMs, not only on algorithmic problems but also across a variety of natural-language tasks.

Overview
• One-line summary: having an LLM do code-driven reasoning works well.
• Not only for mathematical computation but also for problems that require semantic reasoning, the paper shows that better performance can be achieved by having the LLM generate (pseudo)code, executing it with an appropriate emulator, and using the result (e.g., detect_sarcasm(input)).
• In particular, whereas CoT was effective only for LMs above a certain size, CoC was effective even for small LMs.
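The detect_sarcasm(input) idea can be sketched as follows. Everything here (the function names, the trivial stub heuristic) is illustrative rather than taken from the paper; in a real Chain-of-Code run, calls that plain Python cannot execute would be delegated to the language model instead of being stubbed out:

```python
# Sketch (not from the paper) of the kind of program Chain of Code elicits:
# ordinary Python interleaved with "semantic" calls such as detect_sarcasm()
# that no interpreter defines. At run time an LM supplies their return values.

def detect_sarcasm(text):
    """Hypothetical semantic helper: a real CoC run would delegate this call
    to the language model. Stubbed with a toy heuristic so the sketch runs."""
    return "yeah, right" in text.lower()

def answer(review):
    # Executable counting/logic stays in Python...
    exclamations = review.count("!")
    # ...while the fuzzy, semantic judgement is delegated.
    sarcastic = detect_sarcasm(review)
    return "negative" if sarcastic and exclamations > 0 else "positive"

print(answer("Great, another delay. Yeah, right!"))  # -> negative
```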
Methodology details
• Comparison with representative existing approaches
• Scratchpad: generates the reasoning process in code form, with the LLM also playing the role of the code interpreter (i.e., the LLM executes the code itself).
• The idea is comparatively simple, consisting of a code-generation stage and a code-execution stage.
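The two stages can be sketched as below, assuming a hypothetical `lm(expression, state)` callable that stands in for the language model. The paper's actual interpreter ("LMulator") is more involved; this only shows the control flow of running what Python can execute and delegating the rest:

```python
def run_chain_of_code(program, lm):
    """Execute generated code line by line: lines that plain Python can run
    are exec'd; lines that raise (e.g. an undefined semantic helper) are
    delegated to the LM, which supplies the value of the assignment."""
    state = {}
    for line in program:
        try:
            exec(line, {}, state)  # stage 2a: real Python execution
        except Exception:
            # stage 2b: simulate the failing assignment with the LM
            var, _, expr = line.partition("=")
            state[var.strip()] = lm(expr.strip(), dict(state))
    return state

# Usage with a stub LM that always answers True for the semantic call:
final = run_chain_of_code(
    ["n_exclaims = 'Yeah, right!'.count('!')",
     "sarcastic = detect_sarcasm('Yeah, right!')"],
    lm=lambda expr, state: True,
)
print(final)  # -> {'n_exclaims': 1, 'sarcastic': True}
```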
Can code improve the reasoning ability of LLMs?
Improving the reasoning ability of large language models through code
æŽå®æ / CDO & Head of Research
Prompting
Tuning
The first half of this article surveys the Super-NaturalInstructions (SuperNI) paper; the second half introduces datasets from SuperNI that are in Korean or that cover particularly interesting topics.

Paper introduction
Overview
• SuperNI is a project in which researchers from a total of 21 institutions, including the Allen Institute for AI, the University of Washington, and Arizona State University, built and released more than 1,600 NLP instruction datasets.
• It started with the release of data for 61 tasks in https://arxiv.org/abs/2104.08773.
• A total of 88 contributors did the work, drawing on existing public NLP datasets, crowdsourcing, and other methods.
• The project also developed the Tk-Instruct (English) and mTk-Instruct (multilingual) models.
Methodology details
• Data structure
• Simple statistical analysis of the SuperNI dataset
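As a rough sketch of the data structure: each SuperNI task pairs a natural-language definition with positive/negative demonstration examples and evaluation instances. The field names below follow the public natural-instructions repository, but treat the details (and the toy task contents) as illustrative. A Tk-Instruct-style prompt is then assembled from the definition plus k demonstrations:

```python
# Illustrative SuperNI-style task record (toy contents, assumed schema).
task = {
    "Definition": ["Given a sentence, classify its sentiment as positive or negative."],
    "Positive Examples": [
        {"input": "I loved this film.", "output": "positive",
         "explanation": "The speaker expresses enjoyment."}
    ],
    "Instances": [
        {"id": "task000-0", "input": "The plot was dull.", "output": ["negative"]}
    ],
}

def build_prompt(task, instance, k_pos=1):
    """Assemble a Tk-Instruct-style prompt: definition, k positive
    demonstrations, then the evaluation instance awaiting its output."""
    parts = ["Definition: " + task["Definition"][0]]
    for ex in task["Positive Examples"][:k_pos]:
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {instance['input']}\nOutput:")
    return "\n\n".join(parts)

print(build_prompt(task, task["Instances"][0]))
```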
[Paper Review] Super-NaturalInstructions
Introduction to the Super-NaturalInstructions (SuperNI) paper and dataset
æŽå®æ / CDO & Head of Research, 宿°žæ· / ML Researcher
Instruction
LLM
dataset
Super-NaturalInstructions
Introduction
• LLMs demonstrate excellent ability on many tasks from a simple prompt alone, but they are not perfect.
• Representative problems include hallucination, where content that is not factual is generated as if it were fact, and the generation of dangerous statements that are socially problematic.
• This article introduces papers on having the LLM itself judge and suppress content that is biased or otherwise problematic.
• For readers who want to know more about this kind of LLM "self-correction" or "self-refinement" problem, see the survey paper (Pan et al. (2023)) and the related references therein.
Reviewed paper

Overview
• To 'align' LLM-generated text with what users want, much prior work builds a preference dataset, trains a reward model on it, and then tunes the LLM with RL (e.g., PPO) against that model's score.
• Nearly everyone has developed LLMs tuned this way, from OpenAI's models (InstructGPT, ChatGPT, GPT-4, etc.) to Google, Meta, and Anthropic.
• However, building the dataset needed to train a reward model takes a great deal of time and money, which makes this pipeline hard to construct.
• The paper reviewed here shows that harmlessness can be improved effectively (i.e., the generation of harmful content can be suppressed) through zero-shot/few-shot prompting, without an explicit reward model.
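A minimal sketch of this prompting-only setup, assuming a hypothetical `lm(prompt) -> str` callable (the exact instruction wording in the reviewed papers differs): the model answers, critiques its own answer for harmful or biased content, and revises only if the critique finds something. No reward model or RL step appears anywhere:

```python
# Self-correction via prompting alone (illustrative prompt wording).
CRITIQUE = ("Identify anything harmful, biased, or unsafe in the answer. "
            "Reply 'NONE' if there is nothing.")
REVISE = "Rewrite the answer to remove the problems listed in the critique."

def self_correct(question, lm):
    """Generate, self-critique, and (only if needed) revise an answer."""
    answer = lm(f"Q: {question}\nA:")
    critique = lm(f"Q: {question}\nA: {answer}\n{CRITIQUE}")
    if critique.strip() == "NONE":
        return answer  # the model found nothing to fix
    return lm(f"Q: {question}\nA: {answer}\nCritique: {critique}\n{REVISE}")
```

The same loop extends naturally to few-shot use by prepending demonstrations of good critiques and revisions to each prompt.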
Can LLMs judge the riskiness of their own answers?
Introducing papers on judging the riskiness of answers from very large language models
æŽå®æ / CDO & Head of Research
Prompting
Alignment
LLM
Introduction
• Prompting is the means by which humans control and communicate with large language models (LLMs).
• The need for general methodologies that tell users how to write good prompts to get the results they want is expected to keep growing.
• Recently, work has begun to appear reporting that prompt tuning outperforms fine-tuning not only on generation but also on natural-language-understanding tasks (sentence classification, sequence labeling, question answering) (Lifu Tu et al. (2022)), along with prompting methodologies such as CoT (Jason Wei et al. (2022)) and multimodal applications (Andy Zeng et al. (2022)).
• This article introduces two interesting papers that improve zero-shot performance through prompting.

Reviewed papers
Overview
• COSP: Consistency-based Self-adaptive Prompting
• USP: Universal Self-adaptive Prompting
• Two different methodologies that aim to improve zero-shot in-context learning (ICL) performance using unlabeled data and a black-box LLM.
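The COSP side of this can be sketched roughly as follows (illustrative, assuming a hypothetical sampling callable `lm(query) -> str`): sample several zero-shot answers per unlabeled query, score each query by the entropy of its sampled answers, and promote the most self-consistent query/majority-answer pairs to in-context demonstrations. USP generalizes this kind of selection beyond tasks with a single short answer:

```python
import math
from collections import Counter

def answer_entropy(samples):
    """Entropy of the sampled answers; 0 means the model always agrees with itself."""
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in Counter(samples).values())

def select_demos(queries, lm, n_samples=5, k=2):
    """Pick the k most self-consistent queries and format them as pseudo-demos."""
    scored = []
    for q in queries:
        samples = [lm(q) for _ in range(n_samples)]
        majority, _ = Counter(samples).most_common(1)[0]
        scored.append((answer_entropy(samples), q, majority))
    scored.sort(key=lambda t: t[0])  # lowest entropy = most consistent
    return [f"Q: {q}\nA: {a}" for _, q, a in scored[:k]]
```

The selected pseudo-demos are then prepended to the test query for a second, few-shot pass, so no labels are ever needed.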
Introduction to effective prompting methodologies
æŽå®æ / CDO & Head of Research
Prompting
Tuning