How many steps does it take to put an elephant in a fridge? NVIDIA releases ProgPrompt, letting language models plan tasks for robots

Keywords: release, robots

Source: 新智元 (Xinzhiyuan)

Published: 2022-10-10

Reported by 新智元

Editor: LRS

[新智元 digest] With a single command, a robot will put the elephant in the fridge for you!

For robots, task planning is a problem that cannot be sidestepped.

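To complete a real-world task, you first have to know how many steps it takes to put the elephant in the fridge.

Even a fairly simple task such as throwing away an apple involves several sub-steps: the robot must first locate the apple, keep searching if the apple is not in view, then approach the apple, pick it up, and find and approach the trash can.

If the trash can is closed, the robot has to open it first, then drop the apple in and close the trash can again.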
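The sub-steps just listed can be written down as a short Python sketch. Everything in it is an illustrative assumption: `ToyWorld` and its methods are a hypothetical stand-in for a robot and its surroundings, not a real robot API, and the searching and approaching are collapsed into a single `find` call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyWorld:
    """Hypothetical stand-in for a robot and its surroundings."""
    near: Optional[str] = None        # object the robot is standing next to
    holding: Optional[str] = None     # object currently in the gripper
    bin_open: bool = False
    bin_contents: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def find(self, obj: str) -> None:
        # Search for the object and walk up to it (collapsed into one step).
        self.near = obj
        self.log.append(f"find {obj}")

    def grab(self, obj: str) -> None:
        assert self.near == obj, f"must be next to {obj} before grabbing it"
        self.holding = obj
        self.log.append(f"grab {obj}")

    def open_bin(self) -> None:
        self.bin_open = True
        self.log.append("open trash can")

    def drop_in_bin(self) -> None:
        assert self.bin_open and self.holding is not None
        self.bin_contents.append(self.holding)
        self.holding = None
        self.log.append("drop into trash can")

    def close_bin(self) -> None:
        self.bin_open = False
        self.log.append("close trash can")


def throw_away_apple(world: ToyWorld) -> None:
    world.find("apple")        # locate the apple and approach it
    world.grab("apple")        # pick it up
    world.find("trash can")    # locate and approach the trash can
    if not world.bin_open:     # a closed trash can has to be opened first
        world.open_bin()
    world.drop_in_bin()        # throw the apple in
    world.close_bin()          # close the trash can again


world = ToyWorld()
throw_away_apple(world)
print(world.log)
```

Even this toy version needs six primitive actions and one state check, which is why writing such sequences by hand for every task does not scale.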

But the implementation details of every task cannot all be designed by hand, so generating an action sequence from a single command becomes the hard problem.

Generating a sequence from a command? Isn't that exactly what language models do?

Earlier work used large language models (LLMs) to score the space of candidate next actions given the input task instruction, and then generated an action sequence. The instructions are written in natural language and carry no additional domain information.

But such methods either have to enumerate every possible next action for scoring, or they generate free-form text with no constraints on its form, which may include actions that a particular robot cannot take in the current environment.

Recently, the University of Southern California and NVIDIA jointly introduced a new approach, ProgPrompt. It also uses a language model to plan tasks from input instructions, but it adds a programmatic prompt structure so that the generated plans work across different environments, robots with different capabilities, and different tasks.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/fefd397f89eadb00c81b4cf7391ca66e.png]

To keep task specifications well-formed, the researchers prompt the language model with Python-style code that tells it which actions are available, which objects are in the environment, and which programs are executable.

For example, given the command "throw away the apple", it can generate the following program.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/c940b0f4c079571c250780529e70d914.png]

ProgPrompt achieves state-of-the-art performance on VirtualHome tasks, and the researchers also deployed it on a physical robot arm for tabletop tasks.

A clever use of language models

Completing everyday household tasks requires both a commonsense understanding of the world and situated knowledge of the current environment.

To build a task plan for "make dinner", the commonsense an agent needs at a minimum includes: the functions of objects, e.g. stoves and microwaves can be used for heating; the logical order of actions, e.g. the oven must be preheated before food is put in; and the task relevance of objects and actions, e.g. heating and finding ingredients are the actions that relate to "dinner" in the first place.

But without state feedback, this kind of reasoning cannot be carried out.

The agent needs to know where food is in the current environment, for example whether there is fish in the fridge, or whether there is chicken in the fridge.

Autoregressive large language models trained on large corpora can generate text sequences conditioned on an input prompt, and show remarkable multi-task generalization.

For example, given "make dinner", a language model can generate a continuation such as: open the fridge, pick up the chicken, pick up the soda, close the fridge, turn on the light switch, and so on.

The generated text then has to be mapped into the agent's action space. For example, if the generated instruction is "reach out and pick up a jar of pickles", the corresponding executable action might be "pick up jar", and the model computes a probability score for that action.

But without environment feedback, if there is no chicken in the fridge and the "pick up the chicken" action is chosen anyway, the task fails, because "make dinner" contains no information about the state of the world.

ProgPrompt makes clever use of programming-language structure in task planning, because existing large language models have typically been pretrained on corpora that include programming tutorials and code documentation.

ProgPrompt gives the language model a Pythonic program header as the prompt, which imports the available action space, the expected parameters, and the objects available in the environment.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/15ddfff59566c8cf36f33cf7787d1bdb.png]

It then defines functions such as make_dinner and throw_away_banana, whose bodies are sequences of actions that operate on objects. Environment state feedback is incorporated by asserting the plan's preconditions, for example being close to the fridge before trying to open it, and by using recovery actions to handle assertion failures.

Most importantly, ProgPrompt programs also include comments written in natural language that explain the goal of the actions, which raises the task success rate of the generated plan programs.

ProgPrompt

With the full idea in place, ProgPrompt's overall workflow becomes clear. It has three main parts: building Pythonic functions, constructing the programming-language prompt, and generating and executing the task plan.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/0de63e672fbead48589038586a1075cd.png]

1. Expressing robot plans as Pythonic functions

A plan function consists of API calls to action primitives, comments that summarize the actions, and assertions that track execution.

Each action primitive takes an object as its argument. For example, the task "put the salmon in the microwave" includes a call to find(salmon), where find is an action primitive.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/73effe901e21f63847f93d45947bda84.png]

Comments in the code provide a natural-language summary of the action sequence that follows. They help break a high-level task down into logical sub-tasks, here "grab the salmon" and "put the salmon in the microwave".

Comments also keep the language model aware of the current goal and reduce the chance of incoherent, inconsistent, or repetitive output, much as chain-of-thought prompting generates intermediate results.

Assertions provide an environment feedback mechanism that ensures preconditions hold and enables error recovery when they do not. For example, before a grab action the plan asserts that the agent is already close to the salmon; otherwise the agent has to execute a find action first.

2. Constructing the programming-language prompt

The prompt has to give the language model information about the environment and the primitive actions, including observations, action primitives, and examples, and it is assembled into a Pythonic prompt for the language model to complete.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/0128ae9fff12aba29961f0a69c5072e9.png]

The language model then predicts the <next task> as an executable function, namely microwave_salmon().

In the microwave-salmon task, a reasonable first step that the LLM might generate is to take out the salmon, but the agent that executes the plan may have no such action primitive.

To make the language model aware of the agent's action primitives, they are imported in the prompt through import statements, which also restricts the output to functions that are available in the current environment.

To change the agent's action space, it is enough to update the list of imported functions.

The variable objects provides all the objects available in the environment as a list of strings.

The prompt also includes several fully executable program plans as examples. Each example task demonstrates how to complete a given task using the actions and objects available in the given environment, such as throw_away_lime.

3. Generating and executing the task plan

Once a task is given, the plan is inferred entirely by the language model from the ProgPrompt prompt. The generated plan can then be executed on a virtual agent or a physical robot system, which requires an interpreter that executes each action command against the environment.

During execution, assertion checks run in a closed loop and provide feedback based on the current state of the environment.

In the experiments, the researchers evaluated the method on the VirtualHome (VH) simulation platform.

The VH state consists of a set of objects and their corresponding properties, for example that the salmon is inside the microwave (in), or that the agent is close to it (agent_close_to).

The action space includes grab, putin, putback, walk, find, open, close, and others.

The final experiments used 3 VH environments, each containing 115 distinct objects. The researchers created a dataset of 70 household tasks stated at a high level of abstraction, with commands of the kind "microwave salmon", and created a ground-truth action sequence for each.

The generated programs were evaluated in VirtualHome on three metrics: success rate (SR), goal conditions recall (GCR), and executability (Exec). The results show that ProgPrompt clearly outperforms the baselines and LangPrompt, and the table also shows how each feature contributes to performance.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/0cd47bdbc3320e365e612c05063e5271.png]

The researchers also ran experiments in the real world, using a Franka Emika Panda robot with a parallel-jaw gripper and assuming access to a pick-and-place policy.

The policy takes two point clouds as input, one of the target object and one of the target container, and performs a pick-and-place operation that puts the object on or inside the container.

The system uses ViLD, an open-vocabulary object detection model, to identify and segment the objects in the scene and to build the list of available objects in the prompt.

Unlike in the virtual environment, the object list here is a local variable of each plan function, which makes it more flexible to adapt to new objects.

The plans output by the language model contain function calls such as grab and putin.

Because of real-world uncertainty, the assertion-based closed-loop option was not implemented in this experimental setup.

[Figure: https://zfz-oss.oss-cn-hangzhou.aliyuncs.com/temp/0888f4a4cfdcd32665e73f9ded51362b.png]

As the results show, in the sorting task the robot recognizes that the banana and the strawberry are fruit and generates plan steps that put them on the plate, while the bottle goes into the box.

References:
https://progprompt.github.io/
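As a rough illustration of steps 2 and 3 of the workflow, the Python sketch below assembles a ProgPrompt-style prompt (an import line for the action primitives, an `objects` list, one example plan, and the header of the next task) and then runs a one-step plan in a closed loop with an assertion-and-recovery check. The helper names `build_prompt` and `run_plan`, the `actions` module name, the body of the example plan, and the toy state dictionary are all assumptions made for illustration. The prompt layout actually used by the authors is shown only in the article's figures and may differ.

```python
def build_prompt(actions, objects, examples, next_task):
    """Assemble a Python-style prompt for a language model to complete."""
    import_line = "from actions import " + ", ".join(actions)  # available primitives
    object_line = "objects = " + repr(list(objects))           # objects in the scene
    example_block = "\n\n".join(examples)                      # executable example plans
    return f"{import_line}\n\n{object_line}\n\n{example_block}\n\ndef {next_task}():\n"


# Hypothetical example plan; comments summarise each sub-task in natural language.
EXAMPLE_PLAN = """def throw_away_lime():
    # 1: grab the lime
    find('lime')
    grab('lime')
    # 2: put the lime in the garbage can
    find('garbagecan')
    open('garbagecan')
    putin('lime', 'garbagecan')
    close('garbagecan')"""

prompt = build_prompt(
    actions=["find", "grab", "open", "close", "putin"],
    objects=["salmon", "microwave", "lime", "garbagecan"],
    examples=[EXAMPLE_PLAN],
    next_task="microwave_salmon",
)


def find_salmon(state):
    # Toy primitive: after finding the salmon the agent is close to it.
    state["agent_close_to"] = "salmon"


def grab_salmon(state):
    # Toy primitive: the agent now holds the salmon.
    state["holding"] = "salmon"


def run_plan(steps, state):
    """Run (precondition, recovery, action) steps, recovering when a check fails."""
    trace = []
    for precondition, recovery, action in steps:
        if precondition is not None and not precondition(state):
            recovery(state)                  # e.g. find the object first
            trace.append(recovery.__name__)
        action(state)
        trace.append(action.__name__)
    return trace


state = {"agent_close_to": None, "holding": None}
# The plan asserts that the agent is close to the salmon before grabbing it.
steps = [(lambda s: s["agent_close_to"] == "salmon", find_salmon, grab_salmon)]
trace = run_plan(steps, state)

print(prompt)
print(trace)
```

Changing the agent's action space in this sketch only means passing a different `actions` list to `build_prompt`, which mirrors the article's point about updating the imported function list.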