    Winning Solution Explained: The PG电竞信息 F-OCC Algorithm at the CVPR Global Autonomous Driving Challenge

    2024-07-23

    Recently, at the CVPR 2024 Autonomous Grand Challenge, an internationally recognized autonomous driving competition, the "F-OCC" algorithm model submitted by the PG电竞信息 AI team took first place in the Occupancy & Flow track with a score of 48.9%.

    This article follows the technical report submitted by PG电竞信息, "3D Occupancy and Flow Prediction based on Forward View Transformation", and explains the model architecture, the optimizations, and the experimental results.

    Figure 1: The PG电竞信息 AI team takes first place in the Occupancy & Flow track

    Background and Challenges

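    3D scene perception plays a very important role in autonomous driving systems. In today's urban traffic, road layouts are complex and traffic participants are diverse, which makes perception extremely challenging. Traditional 3D object detection describes each object with a 3D box that gives its position, size, and orientation, so it cannot describe complex object geometry in detail. Such methods also focus mostly on target objects such as vehicles, pedestrians, and bicycles, and do not detect static traffic elements such as road surfaces, crosswalks, and buildings. The occupancy grid is a newer scene representation for autonomous driving: it voxelizes the 3D space around the vehicle and attaches occupancy, semantic, and motion information to each cell. Occupancy prediction must predict the occupancy state and semantic label of every voxel in the 3D space. This gives the driving system finer and more complete scene information and improves its safety and reliability in complex scenes.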

    Figure 2: Illustration of Occupancy and Flow
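    To make the representation concrete, here is a minimal sketch of a voxelized scene in which each occupied cell stores a semantic label and a flow vector. The `Voxel` class, the `point_to_index` helper, the class id, and the grid origin of (-40, -40, -1) m are illustrative assumptions, not part of the F-OCC code. The 0.4 m voxel size is the resolution reported for the final submission.

```python
import math
from dataclasses import dataclass

VOXEL_SIZE = 0.4                    # metres per voxel edge (resolution of the final submission)
GRID_ORIGIN = (-40.0, -40.0, -1.0)  # assumed minimum corner of the grid, in metres

@dataclass
class Voxel:
    semantic: int             # class id of the occupied cell (ids here are hypothetical)
    flow: tuple = (0.0, 0.0)  # (vx, vy) motion of the cell; zero for static content

def point_to_index(point, origin=GRID_ORIGIN, voxel_size=VOXEL_SIZE):
    """Map a 3D point in the ego frame (metres) to an integer voxel index."""
    return tuple(math.floor((p - o) / voxel_size) for p, o in zip(point, origin))

# A sparse occupancy grid: only occupied voxels are stored, everything else is free.
grid = {}
grid[point_to_index((0.5, -0.3, 1.1))] = Voxel(semantic=0, flow=(1.5, 0.0))
```

    Storing only occupied cells keeps the sketch short; a dense tensor of labels plus a flow tensor would serve the same purpose.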

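    Camera-based 3D scene perception frameworks fall roughly into three categories. The first is forward projection, represented by LSS and BEVDet. These methods use the camera intrinsics and extrinsics together with estimated image depth to project image features into a vehicle-centered 3D space, and then voxelize them to obtain 3D features. The second is backward projection, represented by BEVFormer. It first builds query points in 3D space, then projects them into the 2D image feature space through the camera intrinsics and extrinsics to fetch the corresponding image features. The third is bidirectional projection, represented by FB-OCC, which combines the two approaches to build 3D features.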
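    The query step of backward projection can be sketched with a pinhole camera model. The function name, the 3x4 `ego_to_cam` extrinsic layout, and the intrinsic values below are assumptions for illustration. BEVFormer itself gathers image features around the projected location with attention, which is left out here.

```python
def project_point(point_ego, ego_to_cam, intrinsics):
    """Project a 3D point in the ego frame to pixel coordinates (pinhole model).

    ego_to_cam: 3x4 extrinsic matrix [R | t] as nested lists (assumed layout)
    intrinsics: (fx, fy, cx, cy)
    Returns (u, v, depth), or None if the point is behind the camera.
    """
    x, y, z = point_ego
    # Extrinsics: move the query point from the ego frame into the camera frame.
    xc, yc, zc = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in ego_to_cam)
    if zc <= 0:
        return None
    fx, fy, cx, cy = intrinsics
    # Intrinsics: perspective division, then shift to the principal point.
    return (fx * xc / zc + cx, fy * yc / zc + cy, zc)

identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]
pixel = project_point((1.0, 0.5, 10.0), identity, (1000.0, 1000.0, 800.0, 320.0))
```

    Forward projection runs the same geometry in the opposite direction: it starts from a pixel and an estimated depth and recovers the 3D location.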

    Method

    Overall Architecture

    F-OCCÄ£ÐͲÉÓÃÁËǰÏòͶӰ¿ò¼ÜÒÔ¼æ¹Ë׼ȷ¶ÈÓëÔËÐÐЧÂÊ¡£Ê×ÏÈ£¬¶àÉãÏñÍ·Êý¾Ýͨ¹ýͼÏñ±àÂëÍøÂ磬µÃµ½2DͼÏñÌØÕ÷¡£È»ºó£¬Éî¶ÈÔ¤²âÍøÂçÀûÓÃ2DͼÏñÌØÕ÷¹À¼ÆÃ¿¸öÌØÕ÷µãµÄÉî¶ÈÐÅÏ¢¡£ÀûÓùÀ¼ÆµÄÉî¶ÈÐÅÏ¢£¬Ä£Ðͽ«Í¼Ïñ¿Õ¼äÖеÄ2DÌØÕ÷ͶӰµ½ÒÔ³µÁ¾×ÔÉíΪÖÐÐĵÄ3D¿Õ¼ä£¬²¢½øÐÐÌåËØ»¯¡£3D±àÂëÍøÂç¶ÔµÃµ½µÄ3DÌØÕ÷½øÐÐÌØÕ÷ÔöÇ¿£¬ÒÔÌáÉýÆä±íÕ÷ÄÜÁ¦¡£×îºó£¬¼ì²âÍøÂçÊä³ö3D¿Õ¼äÖÐÿ¸öµãµÄÕ¼¾ÝÐÅÏ¢¡¢ÓïÒå±êÇ©ºÍÔ˶¯ÐÅÏ¢Ô¤²â¡£Í¼3ΪF-OCCµÄÄ£Ðͼܹ¹Í¼¡£

    Figure 3: Model architecture

    (Top right: different colors denote voxels of different classes. Bottom right: color encodes velocity direction and brightness encodes speed; background voxels are gray.)
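    The projection step of this pipeline can be illustrated with a small lift-and-splat sketch in the spirit of LSS. It is a simplified illustration, not the F-OCC implementation: it assumes a single pinhole camera whose frame coincides with the ego frame, a discrete depth distribution per pixel, and dictionary-based containers. All names and values are hypothetical.

```python
import math

def lift_splat(pixels, depth_bins, intrinsics, voxel_size=0.4):
    """Lift 2D features along predicted depth and splat them into a sparse voxel grid.

    pixels:     {(u, v): (feature_vector, depth_probabilities)}
    depth_bins: candidate depths in metres, one per depth probability
    intrinsics: (fx, fy, cx, cy) of a pinhole camera
    """
    fx, fy, cx, cy = intrinsics
    grid = {}
    for (u, v), (feature, depth_probs) in pixels.items():
        for depth, prob in zip(depth_bins, depth_probs):
            # Lift: back-project the pixel to a 3D point at this candidate depth.
            x = (u - cx) * depth / fx
            y = (v - cy) * depth / fy
            idx = (math.floor(x / voxel_size),
                   math.floor(y / voxel_size),
                   math.floor(depth / voxel_size))
            # Splat: accumulate the depth-weighted feature in the voxel it falls into.
            acc = grid.setdefault(idx, [0.0] * len(feature))
            for c, value in enumerate(feature):
                acc[c] += prob * value
    return grid

pixels = {(800, 320): ([1.0, 2.0], [0.25, 0.75])}
grid = lift_splat(pixels, depth_bins=[2.2, 6.2],
                  intrinsics=(1000.0, 1000.0, 800.0, 320.0))
```

    Because each pixel spreads its feature over several depth candidates, an error in the depth estimate places features in the wrong voxels. The mask refinement described later addresses one consequence of this.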

    Optimizations

    01> Data Preprocessing

    Figure 4: Mask generation process

    (Left: original ground truth. Middle: simulated LiDAR rays. Right: ground truth after masking.)

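    In the training data provided for this challenge, many voxels that the cameras cannot observe directly are also labeled with semantic information, for example voxels occluded by other objects and invisible voxels inside objects. During training, these voxels interfere with the optimization of a camera-based prediction network. Following SparseOcc, our method applies a mask to the ground truth. As shown in the middle panel of Figure 4, during training we simulate several LiDAR emission points along the vehicle trajectory and cast multiple rays from each one. Each ray stops at the first voxel it hits that carries semantic information. The occupied voxel hit by a ray and the unoccupied voxels between it and the emission point are marked True; all remaining voxels are marked False. This yields the mask labels for the ground-truth voxels. Training uses only the voxels marked True and ignores those marked False. The right panel of Figure 4 shows the valid voxels after masking: occluded points and points inside objects do not take part in training.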
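    The masking rule can be sketched as a simple ray march over a voxel grid. This is an illustrative approximation under stated assumptions: coordinates are in voxel units, rays advance in fixed half-voxel steps (an exact voxel traversal would be more precise), and the function and parameter names are hypothetical rather than taken from SparseOcc or F-OCC.

```python
import math

def visibility_mask(occupied, shape, origins, directions, step=0.5, max_range=200.0):
    """Return the set of voxel indices reached by at least one simulated ray.

    occupied:   set of voxel indices that carry a semantic label
    shape:      grid size (nx, ny, nz)
    origins:    simulated LiDAR emission points, in voxel units
    directions: ray directions shared by all emission points
    """
    visible = set()
    for origin in origins:
        for direction in directions:
            norm = math.sqrt(sum(d * d for d in direction))
            unit = [d / norm for d in direction]
            t = 0.0
            while t <= max_range:
                idx = tuple(math.floor(o + t * u) for o, u in zip(origin, unit))
                if any(i < 0 or i >= n for i, n in zip(idx, shape)):
                    break          # the ray left the grid
                visible.add(idx)   # free voxels along the ray count as observed
                if idx in occupied:
                    break          # the ray stops at the first labelled voxel
                t += step
    return visible

occupied = {(3, 0, 0), (5, 0, 0)}
mask = visibility_mask(occupied, shape=(6, 1, 1),
                       origins=[(0.5, 0.5, 0.5)], directions=[(1.0, 0.0, 0.0)])
```

    In this toy grid the voxel at index 5 sits behind the one at index 3, so it is excluded from the mask and would be ignored by the training loss.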

    Figure 5: Mask illustration and improvement

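    At prediction time, many false detections appear near the edges of the 3D perception region. One reason is that mask generation ignores some of the voxel information at the edges of the region. During inference, depth estimation errors then map features of objects outside the detection range into the detection region, which causes false detections. To address this, we refined the mask generation scheme by randomly adding 20% of the voxels near the edge of the detection range to the training set. The right side of Figure 5 visualizes the valid voxels before and after this change. With the refinement, the model's Occ_score rises from 0.32 to 0.34.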
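    One way to read this refinement is sketched below. It assumes that "20%" means a random 20% of the border voxels not already in the mask, that the border is one voxel wide, and that only the horizontal boundary of the grid counts as the edge. None of these details are given in the article, and the names are hypothetical.

```python
import random

def add_edge_voxels(mask, shape, margin=1, ratio=0.2, seed=0):
    """Add a random share of border voxels to a visibility mask.

    mask:   set of voxel indices already marked True
    shape:  grid size (nx, ny, nz)
    margin: assumed border width in voxels
    ratio:  share of unmasked border voxels to add (20% in the article)
    """
    rng = random.Random(seed)
    nx, ny, nz = shape
    # Collect border voxels (x/y boundary only) that the ray casting left out.
    edge = [(x, y, z)
            for x in range(nx) for y in range(ny) for z in range(nz)
            if min(x, y, nx - 1 - x, ny - 1 - y) < margin and (x, y, z) not in mask]
    extra = rng.sample(edge, int(len(edge) * ratio))
    return mask | set(extra)

mask = {(5, 5, 0)}
refined = add_edge_voxels(mask, shape=(10, 10, 1))
```

    The added voxels give the network supervision at the boundary, where features from out-of-range objects tend to land.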

    02> Image Backbone Network

    The performance of the image encoder is critical to the prediction accuracy of the whole model. Given the requirements on both computational efficiency and prediction accuracy, we chose the FlashInternImage family as the image encoder. This network optimizes the DCN operator used in InternImage, which improves both detection speed and detection accuracy. In our experiments we tested FlashInternImage-Tiny and FlashInternImage-Large. The final version uses FlashInternImage-Large, which has about 220M parameters.

    03> Deformable 3D Convolution

    Figure 6: Illustration of deformable 3D convolution

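    Compared with conventional convolution, deformable convolution has a larger receptive range and stronger encoding capacity, and it has shown strong performance on image detection tasks. Our model extends the deformable convolution operator DCNv4 to 3D features. In the 3D voxel feature encoding module, the conventional 3D convolution operator is replaced with a deformable 3D convolution operator, which improves the overall detection ability of the model. To increase speed and reduce GPU memory consumption, we implemented and optimized DCN3D in CUDA. Compared with the PyTorch version, the CUDA implementation runs faster and uses less GPU memory.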
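    The core idea of a deformable 3D convolution, sampling the feature volume at learned fractional offsets instead of fixed grid positions, can be sketched as follows. This is a single-channel, single-output-location illustration in plain Python. It is not the DCNv4 or DCN3D implementation, and the function names and toy values are assumptions.

```python
import math

def trilinear(volume, x, y, z):
    """Trilinearly interpolate volume[x][y][z] at a fractional position (zero outside)."""
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    value = 0.0
    for xi in (x0, x0 + 1):
        for yi in (y0, y0 + 1):
            for zi in (z0, z0 + 1):
                inside = (0 <= xi < len(volume) and 0 <= yi < len(volume[0])
                          and 0 <= zi < len(volume[0][0]))
                if inside:
                    w = (1 - abs(x - xi)) * (1 - abs(y - yi)) * (1 - abs(z - zi))
                    value += w * volume[xi][yi][zi]
    return value

def deformable_sample(volume, center, kernel_offsets, learned_offsets, weights):
    """Weighted sum of samples taken at kernel positions shifted by learned offsets."""
    out = 0.0
    for k, dk, w in zip(kernel_offsets, learned_offsets, weights):
        pos = [c + a + b for c, a, b in zip(center, k, dk)]
        out += w * trilinear(volume, *pos)
    return out

# Toy 2x2x2 volume whose value equals its x index.
volume = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
kernel = [(0, 0, 0), (1, 0, 0)]
weights = [1.0, 2.0]
regular = deformable_sample(volume, (0, 0, 0), kernel, [(0, 0, 0), (0, 0, 0)], weights)
deformed = deformable_sample(volume, (0, 0, 0), kernel, [(0.5, 0, 0), (-0.5, 0, 0)], weights)
```

    With all learned offsets set to zero the operation reduces to an ordinary convolution tap sum. In a real operator the offsets and weights are predicted from the input features, and the per-sample interpolation is the part that benefits most from a fused CUDA kernel.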

    Experimental Results

    Table 1: Occupancy and flow prediction performance under different settings

    To verify the effectiveness of these optimizations, we ran ablation experiments on a subset of the validation data. The results are shown in Table 1.

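    Baseline is the officially provided prediction model based on BEVFormer. In Version A, we added the visibility-mask preprocessing to the training of the baseline model. To reduce class imbalance, in both of these settings we randomly selected 20% of the unoccupied voxels for training. In Versions B through D, we used FB-OCC as the framework to test the effectiveness of the mask and DCN3D. Version B trains on the initial mask data, and Version C trains on the refined mask data. In Version D, we replaced the conventional 3D convolution operator in the 3D voxel encoder with the DCN3D operator. The results show that both the mask-based preprocessing and DCN3D improve detection accuracy. In Version E, we adopted the forward projection architecture and replaced the image backbone with FlashInternImage-Tiny. In Versions F, G, and H, we tested the effect of the backbone network, the image size, and the voxel resolution on the prediction results. The table shows that a stronger image backbone, a larger image size, and a higher voxel resolution all improve detection performance.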

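    The final submission uses the F-OCC framework with FlashInternImage-L as the backbone, an image size of 1600x640, and a voxel resolution of 0.4 m. Its final overall Occ score is 0.489. In the detection head, the Flow prediction branch has a network structure similar to that of the Occupancy prediction branch. In post-processing, before outputting the Flow results, we adjust the predicted Flow: for every voxel that the occupancy branch predicts as background (not one of the first 8 classes) or as Free, the Flow value is set to 0. The model uses neither TTA (test-time augmentation) nor model ensembling.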
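    The Flow post-processing rule can be written in a few lines. The sketch assumes that class ids 0 to 7 are the eight foreground classes and that background and Free voxels use larger ids. The actual label encoding is not stated in the article, and the function name is hypothetical.

```python
NUM_FOREGROUND_CLASSES = 8  # "the first 8 classes"; the id layout is an assumption

def zero_background_flow(labels, flows, num_foreground=NUM_FOREGROUND_CLASSES):
    """Keep flow only for voxels predicted as one of the foreground classes."""
    return [flow if label < num_foreground else (0.0, 0.0)
            for label, flow in zip(labels, flows)]

labels = [2, 11, 17, 7]  # hypothetical per-voxel class predictions
flows = [(1.0, 0.5), (0.3, 0.1), (0.2, 0.2), (-0.4, 0.0)]
cleaned = zero_background_flow(labels, flows)
```

    Zeroing the flow of static and empty voxels removes spurious motion estimates, so the occupancy branch acts as a gate on the Flow branch.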

    ×ܽá

    This article introduced the "F-OCC" algorithm model, which won first place in the Occupancy & Flow track. Through data preprocessing, image backbone selection, and operator optimization, the model improves the prediction of occupancy and flow.

    References

    [1]  Jonah Philion and Sanja Fidler. Lift, Splat, Shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D. In Proceedings of the European Conference on Computer Vision, 2020.

    [2]  Junjie Huang and Guan Huang. BEVDet4D: Exploit temporal cues in multi-camera 3D object detection. arXiv, 2022.

    [3]  Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, et al. Efficient deformable ConvNets: Rethinking dynamic and sparse operator for vision applications. arXiv, 2024.

    [4]  Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, and Jifeng Dai. BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers. ECCV, 2022.

    [5]  Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, and Jose M Alvarez. FB-OCC: 3D occupancy prediction based on forward-backward view transformation. arXiv, 2023.

    [6]  Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, and Limin Wang. Fully sparse 3D occupancy prediction. arXiv, 2023.

    [7]  Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. CVPR, 2023.
