Tech Product

Gemini

Aliases: Gemini AI, Gemini, Gemini 3, Gemini Nano, Google Gemini

Overview

"[{\"type\": \"paragraph\", \"children\": [{\"text\": \"Googleが開発した最先端の大規模言語モデル(LLM)。テキスト、画像、音声などのマルチモーダルな入力を理解し、高度な推論や生成を行うことができる。Pixel 10においては、ユーザーの曖昧な指示を理解して写真編集を実行する「Conversational Photo Editing」の基盤技術として採用されており、スマートフォンのAI体験を次の段階へ引き上げる役割を担っている。\", \"type\": \"text\"}]}]"

Research Papers

5 papers
  • Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Machel Reid, N. Savinov, Denis Teplyashin, Dmitry Lepikhin, T. Lillicrap, Jean-Baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew M. Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, James Molloy, Jilin Chen, M. Isard, P. Barham, T. Hennigan, Ross Mcilroy, Melvin Johnson, J. Schalkwyk, Eli Collins, Eliza Rutherford, Erica Moreira, Kareem W. Ayoub, Megha Goel, Clemens Meyer, Gregory Thornton, Zhen Yang, H. Michalewski, Zaheer Abbas, Nathan Schucher, Ankesh Anand, Richard Ives, James Keeling, Karel Lenc, S. Haykal, Siamak Shakeri, Pranav Shyam, A. Chowdhery, Roman Ring, Stephen Spencer, Eren Sezener, Luke Vilnis, Oscar Chang, Nobuyuki Morioka, George Tucker, Ce Zheng, Oliver Woodman, Nithya Attaluri, Tomáš Kočiský, Evgenii Eltyshev, Xi Chen, Timothy Chung, Vittorio Selo, Siddhartha Brahma, Petko Georgiev, Ambrose Slone, Zhenkai Zhu, James Lottes, Siyuan Qiao, Ben Caine, Sebastian Riedel, Alex Tomala, Martin Chadwick, J Christopher Love, Peter Choy, Sid Mittal, N. Houlsby, Yunhao Tang, Matthew Lamm, Libin Bai, Qiao Zhang, Luheng He, Yong Cheng, Peter C. Humphreys, Yujia Li, Sergey Brin, Albin Cassirer, Ying-Qi Miao, Lukáš Žilka, Taylor Tobin, Kelvin Xu, Lev Proleev, Daniel Sohn, Alberto Magni, L. Hendricks, Isabel Gao, Santiago Ontañón, Oskar Bunyan, Nathan Byrd, Abhanshu Sharma, Biao Zhang, Mario Pinto, Rishika Sinha, Harsh Mehta, Dawei Jia, Sergi Caelles, Albert Webson, A. Morris, Becca Roelofs, Yifan Ding, Robin Strudel, Xuehan Xiong, Marvin Ritter, Mostafa Dehghani, R. Chaabouni, Abhijit Karmarkar, Guangda Lai, Fabian Mentzer, Bibo Xu, Yaguang Li, Yujing Zhang, T. Paine, Alex Goldin, Behnam Neyshabur, Kate Baumli, Anselm Levskaya, Michael Laskin, Wenhao Jia, Jack W. Rae, Kefan Xiao, Antoine He, Skye Giordano, Lakshman Yagati, Jean-Baptiste Lespiau, Paul Natsev, S. Ganapathy, Fangyu Liu, Danilo Martins, Nanxin Chen, Yunhan Xu, Megan Barnes, Rhys May, Arpi Vezer, Junhyuk Oh, Ken Franko, Sophie Bridgers, Ruizhe Zhao, Boxi Wu, Basil Mustafa, Sean Sechrist, Emilio Parisotto, Thanumalayan Sankaranarayana Pillai, Chris Larkin, Chenjie Gu, C. Sorokin, M. Krikun, Alexey Guseynov, Jessica Landon, Romina Datta, A. Pritzel, Phoebe Thacker, Fan Yang, Kevin Hui, A. Hauth, C. Yeh, David Barker, J. Mao-Jones, Sophia Austin, Hannah Sheahan, P. Schuh, James Svensson, Rohan Jain, V. Ramasesh, Anton Briukhov, D. Chung, Tamara von Glehn, Christina Butterfield, Priya Jhakra, Matt Wiethoff, Justin Frye, Jordan Grimstad, Beer Changpinyo, Charline Le Lan, Anna Bortsova, Yonghui Wu, P. Voigtlaender, Tara N. Sainath, Charlotte Smith, Will Hawkins, Kris Cao, James Besley, S. Srinivasan, Mark Omernick, Colin Gaffney, G. Surita, Ryan Burnell, Bogdan Damoc, Junwhan Ahn, Andrew Brock, Mantas Pajarskas, A. Petrushkina, Seb Noury, Lorenzo Blanco, Kevin Swersky, Arun Ahuja, Thi Avrahami, Vedant Misra, Raoul de Liedekerke, Mariko Iinuma, A. Polozov, Sarah York, George van den Driessche, Paul Michel, Justin Chiu, R. Blevins, Zach Gleicher, Adrià Recasens, Alban Rrustemi, E. Gribovskaya, Aurko Roy, Wiktor Gworek, Sébastien M. R. Arnold, Lisa Lee, James Lee-Thorp, M. Maggioni, Enrique Piqueras, Kartikeya Badola, S. Vikram, Lucas Gonzalez, Anirudh Baddepudi, Evan Senter, J. Devlin, James Qin, Michael Azzam, Maja Trebacz, M. Polacek, Kashyap Krishnakumar, Shuo-Yiin Chang, Matthew Tung, Ivo Penchev, Rishabh Joshi, Kate Olszewska, Carrie Muir, Mateo Wirth, A. Hartman, Joshua Newlan, S. Kashem, Vijay Bolina, Elahe Dabir, Joost R. van Amersfoort, Zafarali Ahmed, James Cobon-Kerr, Aishwarya B Kamath, A. M. Hrafnkelsson, Le Hou, Ian Mackinnon, Alexandre Frechette, Eric Noland, Xiance Si, Emanuel Taropa, Dong Li, Phil Crone, Anmol Gulati, Sébastien Cevey, Jonas Adler, Ada Ma, David Silver, Simon Tokumine, Richard Powell, Stephan Lee, Michael B. Chang, Samer Hassan, Diana Mincu, Antoine Yang, Nir Levine, Jenny Brennan, Mingqiu Wang, Sarah Hodkinson, Jeffrey Zhao, Josh Lipschultz, Aedan Pope, Michael B. Chang, Cheng Li, Laurent El Shafey, M. Paganini, Sholto Douglas, Bernd Bohnet, Fabio Pardo, S. Odoom, Mihaela Roșca, Cicero Nogueira dos Santos, K. Soparkar, A. Guez, Tom Hudson, Steven Hansen, Chulayuth Asawaroengchai, Ravichandra Addanki, Tianhe Yu, Wojciech Stokowiec, Mina Khan, Justin Gilmer, Jaehoon Lee, Carrie Grimes Bostock, Keran Rong, Jonathan Caton, Pedram Pejman, Filip Pavetic, Geoff Brown, V. Sharma, Mario Lučić, Rajkumar Samuel, J. Djolonga, Amol Mandhane, Lars Lowe Sjosund, Elena Buchatskaya, Elspeth White, Natalie Clay, Jiepu Jiang, Hyeontaek Lim, Ross Hemsley, Jane Labanowski, Nicola De Cao, David Steiner, Sayed Hadi Hashemi, Jacob Austin, Anita Gergely, Tim Blyth, Joe Stanton, K. Shivakumar, Aditya Siddhant, Anders Andreassen, Carlos L. Araya, Nikhil Sethi, Rakesh Shivanna, Steven Hand, Ankur Bapna, A. Khodaei, Antoine Miech, Garrett Tanzer, Andy Swing, S. Thakoor, Zhufeng Pan, Zachary Nado, Stephanie Winkler, Dian Yu, Mohammad Saleh, Lorenzo Maggiore, Iain Barr, Minh Giang, Thais Kagohara, Ivo Danihelka, Amit Marathe, Vladimir Feinberg, Mohamed Elhawaty, Nimesh Ghelani, Dan Horgan, Helen Miller, Lexi Walker, Richard Tanburn, Mukarram Tariq, Disha Shrivastava, Fei Xia, Chung-Cheng Chiu, Zoe Ashwood, Khuslen Baatarsukh, Sina Samangooei, Fred Alcober, Axel Stjerngren, P. Komarek, Katerina Tsihlas, Anudhyan Boral, R. Comanescu, Jeremy Chen, Ruibo Liu, Dawn Bloxwich, Charlie Chen, Yanhua Sun, Fangxiaoyu Feng, M. Mauger, Xerxes Dotiwalla, V. Hellendoorn, Michael Sharman, Ivy Zheng, Krishna Haridasan, Gabriel Barth-Maron, Craig Swanson, Dominika Rogozińska, Alek Andreev, P. Rubenstein, Ruoxin Sang, Dan Hurt, Gamaleldin F. Elsayed, Renshen Wang, Dave Lacey, Anastasija Ilić, Yao Zhao, Woohyun Han, L. Aroyo, Chimezie Iwuanyanwu, Vitaly Nikolaev, Balaji Lakshminarayanan, S. Jazayeri, Raphael Lopez Kaufman, Mani Varadarajan, Chetan Tekur, Doug Fritz, Misha Khalman, D. Reitter, Kingshuk Dasgupta, S. Sarcar, T. Ornduff, Javier Snaider, Fantine Huot, J. Jia, Rupert Kemp, Nejc Trdin, Anitha Vijayakumar, Lucy Kim, Christof Angermueller, Li Lao, Tianqi Liu, Haibin Zhang, David Engel, Somer Greene, Anais White, Jessica Austin, Lilly Taylor, Shereen Ashraf, Dangyi Liu, Maria Georgaki, Irene Cai, Yana Kulizhskaya, Sonam Goenka, Brennan Saeta, Kiran Vodrahalli, Christian Frank, D. Cesare, Brona Robenek, H. Richardson, Mahmoud Alnahlawi, Christopher Yew, Priya Ponnapalli, M. Tagliasacchi, Alex Korchemniy, Yelin Kim, Dinghua Li, B. Rosgen, Kyle Levin, Jeremy Wiesner, Praseem Banzal, Praveen Srinivasan, Hongkun Yu, Çağlar Ünlü, David Reid, Zora Tung, D. Finchelstein, Ravin Kumar, A. Elisseeff, Jin Huang, Ming Zhang, Rui Zhu, Ricardo Aguilar, Mai Giménez, Jiawei Xia, Olivier Dousse, W. Gierke, S. Yeganeh, Damion Yates, Komal Jalan, Lu Li, Eri Latorre-Chimoto, D. D. Nguyen, Ken Durden, Praveen Kallakuri, Yaxin Liu, Matthew Johnson, Tomy Tsai, Alice Talbert, Jasmine Liu, Alexander Neitz, C. Elkind, M. Selvi, Mimi Jasarevic, Livio Baldini Soares, Pidong Wang, A. Wang, Xinyu Ye, Krystal Kallarackal, Lucia Loher, Hoi Lam, Josef Broder, D. Holtmann-Rice, Nina Martin, Bramandia Ramadhana, Daniel Toyama, M. Shukla, Sujoy Basu, Abhi Mohan

    2024 · 3,552 citations · Semantic Scholar

    In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

  • Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit S. Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, A. Aharoni, Nathan Lintz, T. C. Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, Toby Boyd, Brad Hekman, Aaron Parisi, Chaoyi Zhang, Kornraphop Kawintiranon, Tania Bedrax-Weiss, O. Wang, Ya Xu, Ollie Purkiss, Uri Mendlovic, Ilai Deutel, Nam Nguyen, A. Langley, Flip Korn, Lucia Rossazza, Alexandre Ramé, Sagar M. Waghmare, Helen Miller, Vaishakh Keshava, Ying Jian, Xiaofan Zhang, R. A. Popa, Kedar Dhamdhere, Blaž Bratanič, Kyuyeun Kim, Terry Koo, Ferran Alet, Yi-ting Chen, Arsha Nagrani, Hannah Muckenhirn, Zhiyuan Zhang, Corbin Quick, Filip Pavetić, D. D. Nguyen, João Carreira, Michael Elabd, Haroon Qureshi, Fabian Mentzer, Yao Yang, Danielle Eisenbud, Anmol Gulati, Ellie Talius, Eric Ni, Sahra Ghalebikesabi, Edouard Yvinec, Alaa Saade, Thatcher Ulrich, Lorenzo Blanco, D. A. Calian, Muhua Huang, Aäron van den Oord, Naman Goyal, Terry Chen, Praynaa Rawlani, C. Schallhart, S. Lokhande, Xianghong Luo, Jyn Shan, Ceslee Montgomery, Victoria Krakovna, Federico Piccinini, Omer Barak, Jingyu Cui, Yiling Jia, Mikhail Dektiarev, A. Kolganov, Shiyu Huang, Zhe Chen, Xingyu Wang, Jessica Austin, Peter de Boursac, Evgeny Sluzhaev, F. Ding, Huijian Li, Surya Bhupatiraju, M. Agarwal, Slawek Kwasiborski, Paramjit Sandhu, Patrick Siegler, Ahmet Iscen, Eyal Ben-David, Shiraz Butt, Miltos Allamanis, Seth Benjamin, R. Busa-Fekete, Félix Hernández-Campos, S. Goldshtein, Matt Dibb, Weiyan Zhang, Annie Marsden, Carey Radebaugh, Stephen Roller, Abhishek Nayyar, Jacob Austin, Tayfun Terzi, Bhargav Kanagal Shamanna, Peter Shaw, Aayush Singh, Florian Luisier, Artur Mendonça, V. Aggarwal, L. Markeeva, Claudio Fantacci, Sergey Brin, HyunJeong Choe, Guanyu Wang, Hartwig Adam, Avigail Dabush, T. Kiyono, E. Marcus, Jeremy R. Cole, T. Weber, Hongrae Lee, Ronny Huang, Alex Muzio, Leandro Kieliger, Maigo Le, Courtney Biles, Long Le, Archit Sharma, Chengrun Yang, Avery Lamp, Dave Dopson, N. Hurley, Katrina Xu, Zhihao Shan, Shuang Song, Jiewen Tan, Alexandre Senges, G. Zhang, Chong You, Yennie Jun, David Raposo, Susanna Ricco, Xuan Yang, Weijie Chen, Prakhar Gupta, Arthur Szlam, Kevin Villela, Chun-Sung Ferng, Daniel Kasenberg, Chen Liang, Rui Zhu, Arunachalam Narayanaswamy, Florence Perot, Paul Pucciarelli, A. Shekhawat, A. Stern, Rishikesh Ingale, Stefani Karp, Sanaz Bahargam, Adrian Goedeckemeyer, Jie Han, Sicheng Li, A. Tacchetti, Dian Yu, A. Chakladar, Zhiying Zhang, Mona Mahdy, Xu Gao, Dale S. Johnson, Samrat Phatale, A. Piergiovanni, Hyeontaek Lim, Clément Farabet, C. Lebsack, Theo Guidroz, John Blitzer, Nico Duduta, David Madras, Steve Li, D. V. Dincklage, X. Li, Mahdis Mahdieh, George Tucker, Ganesh Jawahar, Yunxuan Owen Xiao, Daniel Tarlow, Robert Geirhos, Noam Velan, Daniel Vlasic, Kalesha Bullard, SK Park, Nishesh Gupta, Kellie Webster, Ayal Hitron, Jieming Mao, J. Eisenschlos, Laurel Prince, Nina D'souza, K. Zheng, Sara Nasso, Gabriela Botea, Carl Doersch, Caglar Unlu, Chris Alberti, A. Svyatkovskiy, Ankit Goel, Krzysztof Choromanski, Pan-Pan Jiang, R. Nguyen, Four Flynn, Daria Ćurko, Peter Chen, Nicholas Roth, Kieran Milan, Caleb Habtegebriel, Shashi Narayan, M. Moffitt, Jake Marcus, T. Anthony, Brendan McMahan, Gowoon Cheon, Ruibo Liu, Megan Barnes, Lukasz Lew, Rebeca Santamaria-Fernandez, Mayank Upadhyay, Arjun Akula, A. M. Hrafnkelsson, A. Caceres, Andrew Bunner, Michal Sokolik, Subha Puttagunta, L. Moore, Berivan Isik, Jay Hartford, Lawrence Chan, P. Shenoy, D. Holtmann-Rice, Jane Park, Fabio Viola, Alexandru Salcianu, Sujeevan Rajayogam, Ian Stewart-Binks, Zelin Wu, Richard Everett, Xi Xiong, Pierre-Antoine Manzagol, Gary Leung, Carl Saroufim, Bo Pang, Dawid Wegner, G. Papamakarios, J. Palomaki, Helena Pankov, Guangda Lai, G. Tubone, Shubin Zhao, T. Strinopoulos, Seth Neel, Mingqiu Wang, Joe Kelley, Li Li, Ping-mei Xu, Anitha Vijayakumar, Andrea D'olimpio, Omer Levy, Massimo Nicosia, Grigory Rozhdestvenskiy, Ni Lao, Sirui Xie, Yash Katariya, Jon Simon, Sanjiv Kumar, Florian Hartmann, M. Kilgore, Jinhyuk Lee, Aroma Mahendru, Roman Ring, T. Hennigan, Fiona Lang, Colin Cherry, David Steiner, Dawsen Hwang, Ray Smith, Pidong Wang, Jeremy Chen, Ming Yang, S. Kwei, Philippe Schlattner, Donnie Kim, Ganesh Poomal Girirajan, Nikola Momchev, Ayushi Agarwal, Xingyi Zhou, Ilkin Safarli, Zachary Garrett, AJ Pierigiovanni, Sarthak Jauhari, Alif Raditya Rochman, Shikhar Vashishth, Quan Yuan, Christof Angermueller, Jon Blanton, Xinying Song, N. B. Gundavarapu, Thi Avrahami, Maxine Deines, Subhrajit Roy, Manish Gupta, Christopher Semturs, Shobha Vasudevan, Aditya Srikanth Veerubhotla, Shriya Sharma, Joshy Jacob, Zhen Yang, Andreas Terzis, Dan Karliner, Auriel Wright, Tania Rojas-Esponda, Ashley Brown, A. Roy, Pawan Dogra, A. Kapishnikov, Peter Young, W. Kan, Vinodh K. Rajendran, M. Ivanova, S. Deshmukh, Chia-Hua Ho, Michael Kwong, S. Ginzburg, Annie Louis, KP Sawhney, Slav Petrov, Jing Xie, Yunfei Bai, G. Stoyanov, Alex Fabrikant, Rajesh Jayaram, Yuqi Li, Joseph Heyward, Justin Gilmer, Yaqing Wang, Radu Soricut, Lu Liu, Qing-ping Duan, Jamie Hayes, Maura O'Brien, Gaurav Singh Tomar, Sivan Eiger, Bahare Fatemi, Jeffrey Hui, Catarina Barros, A. Chukwuka, Alena Butryna, Saksham Thakur, Austin Huang, Zhufeng Pan, Haotian Tang, Serkan Cabi, Tulsee Doshi, Michiel A. Bakker, S. Bagri, Ruy Ley-Wild, Á. Lelkes, J. Lees, P. Kane, David Greene, Shimu Wu, J. Bornschein, G. Surita, Sarah Hodkinson, Fangtao Li, Chris Hidey, Sébastien Pereira, Sean Ammirati, Phillip Lippe, Adam Kraft, Pu Han, Sebastian Gerlach, Zifeng Wang, Liviu Panait, Feng Han, B. Farris, Y. Bi, Hannah DeBalsi, Miaosen Wang, Gladys Tyen, James Cohan, Susan Zhang, Jarred Barber, D. Chung, Jaeyoung Kim, M. Kunesch, S. Pecht, Nami Akazawa, Abe Friesen, James Lyon, Ali Eslami, Junru Wu, Jiewen Tan, Yue Song, Ravin Kumar, Christopher A. Welty, Ilia Akolzin, Gena Gibson, Sean Augenstein, Arjun Pillai, N. Yuen, Du Phan, Xin Wang, Iain Barr, H. Zen, Nan Hua, Casper Liu, Jilei Wang, T. Bhatia, Hao Xu, Oded Elyada, Pushmeet Kohli, Mirek Olšák, Kelly Chen, Azalia Mirhoseini, Noam Shazeer, Shoshana Jakobovits, Maggie Tran, Nolan Ramsden, T. Bharti, Fred Alcober, Yunjie Li, S. Shetty, Jing Chen, Dmitry Kalashnikov, Megha Nawhal, Sercan Ö. Arik, Hanwen Chen, M. Blokzijl, Shubham Gupta, J. Rubin, Rigel Swavely, Sophie Bridgers, I. Gemp, Chenlin Su, A. Suggala, Juliette Pluto, Mary Cassin, Alain C. Vaucher, Kaiyang Ji, J. Cai, Andrew Audibert, Animesh Sinha, David Tian, E. Farkash, Amy Hua, Jilin Chen, Duc Tran, E. Loper, Nicole Brichtova, Lara McConnaughey, Ballie Sandhu, Robert Leland, Douglas DeCarlo, A. Over, J. Huang, Xing Wu, C. Fan, Eric Li, Yun-Peng Lei, Deepak Sharma, Cosmin Paduraru, Luo Yu, Matko Bošnjak, Phuong Dao, Min Choi, Sneha Kudugunta, Jakub Adamek, Carlos Guía, Ali Khodaei, Jie Feng, Wenjun Zeng, David Welling, Sandeep Tata, Christina Butterfield

    2025 · 2,908 citations · Semantic Scholar

    In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

  • Gemma: Open Models Based on Gemini Research and Technology

    Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, L. Sifre, Morgane Rivière, Mihir Kale, J Christopher Love, P. Tafti, Léonard Hussenot, A. Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, H. Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, J. Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, J. Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Pier Giuseppe Sessa, R. Chaabouni, R. Comanescu, Reena Jana, Rohan Anil, Ross Mcilroy, Ruibo Liu, Ryan Mullins, Samuel L. Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, T. Hennigan, Vladimir Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, O. Vinyals, Jeffrey Dean, K. Kavukcuoglu, D. Hassabis, Z. Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy

    2024 · 1,049 citations · Semantic Scholar

    This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

  • Mini-Gemini: Mining the Potential of Multi-Modality Vision Language Models

    Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

    2024 · 377 citations · Semantic Scholar

    In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We propose a novel approach to narrow the gap by mining the potential of VLMs for better performance across various cross-modal tasks. It tackles the following questions: (1) How can high-resolution visual tokens improve image understanding without lengthening the token sequence? (2) How can the reasoning and generation abilities of VLMs be improved with high-quality data? (3) How can the gap between open-source VLMs and proprietary models on reasoning-driven generation be closed? In particular, to enhance visual tokens, we propose to utilize an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. In general, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. The proposed model supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B, which achieve leading performance in several zero-shot benchmarks and even surpass the developed private models. It is demonstrated to attain 80.6% accuracy on the MMB benchmark (+5.4 vs Gemini Pro) and 74.1% on TextVQA (+4.6 vs LLaVA-NeXT). Furthermore, Mini-Gemini is shown in experiments to improve consistently with a stronger LLM, visual encoder, and data.

  • Capabilities of Gemini Models in Medicine

    Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, E. Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya B Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Lu Liu, Jean-Baptiste Alayrac, N. Houlsby, Nenad Tomašev, Jan Freyberg, Charles Lau, Jonas Kemp, J. Lai, Shekoofeh Azizi, K. Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He, Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, P. Mansfield, Jian Lu, E. Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, J. Shlens, Ewa Dominowska, S. Eslami, Claire Cui, O. Vinyals, K. Kavukcuoglu, J. Manyika, Jeff Dean, D. Hassabis, Yossi Matias, D. Webster, Joelle K. Barral, Gregory S. Corrado, Christopher Semturs, S. Mahdavi, Juraj Gottweis, A. Karthikesalingam, Vivek Natarajan

    2024 · 362 citations · Semantic Scholar

    Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health&medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
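The Gemini 1.5 abstract reports near-perfect (>99%) retrieval over very long contexts. A common way to probe this, also named in the Med-Gemini abstract, is a needle-in-a-haystack test: hide one fact at a chosen depth in a long filler context, ask for it, and check the answer. The sketch below builds such a context and scores a response. The filler sentences, the passphrase, the helper names, and the fake model response are placeholders; no model is called.

```python
import random


def build_haystack(filler: list[str], needle: str, depth: float,
                   n_sentences: int, seed: int = 0) -> str:
    """Join n_sentences random filler sentences and insert `needle` at a fractional depth.

    depth = 0.0 puts the needle first, 1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    rng = random.Random(seed)  # fixed seed so each trial is reproducible
    sentences = [rng.choice(filler) for _ in range(n_sentences)]
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieved(response: str, expected: str) -> bool:
    """Score one trial: True if the model's response contains the hidden value."""
    return expected.lower() in response.lower()


if __name__ == "__main__":
    filler = ["The grass is green.", "The sky is blue.", "Rivers flow downhill."]
    needle = "The secret passphrase is violet-42."
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        context = build_haystack(filler, needle, depth, n_sentences=2000)
        prompt = context + "\n\nWhat is the secret passphrase?"
        # Placeholder: a real test would send `prompt` to the model under evaluation.
        response = "The passphrase is violet-42."
        print(f"depth={depth:.2f} chars={len(prompt)} retrieved={retrieved(response, 'violet-42')}")
```

Sweeping both the depth and `n_sentences` gives the depth-by-length grid that retrieval results of this kind are usually reported on.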
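The Gemini 2.5 report describes the 2.X family as spanning the Pareto frontier of model capability versus cost. A model is on that frontier when no other model is at least as cheap and at least as capable while being strictly better on one of the two. The sketch below filters (cost, capability) points down to the frontier; the model names and numbers are hypothetical and are not figures from the report.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost: float        # e.g. price per million tokens (lower is better)
    capability: float  # e.g. a benchmark score (higher is better)


def pareto_frontier(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the non-dominated points, sorted by ascending cost.

    Exact duplicates (same cost and capability) are kept only once.
    """
    frontier = []
    best_capability = float("-inf")
    # Sort by cost ascending; among equal costs, the most capable comes first.
    for p in sorted(points, key=lambda p: (p.cost, -p.capability)):
        # Keep a point only if it beats every cheaper-or-equal point on capability.
        if p.capability > best_capability:
            frontier.append(p)
            best_capability = p.capability
    return frontier


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not published figures.
    models = [
        ModelPoint("model-a", cost=0.1, capability=60.0),
        ModelPoint("model-b", cost=0.3, capability=72.0),
        ModelPoint("model-c", cost=0.5, capability=70.0),  # dominated by model-b
        ModelPoint("model-d", cost=2.0, capability=85.0),
    ]
    print([p.name for p in pareto_frontier(models)])  # ['model-a', 'model-b', 'model-d']
```

Sorting once by cost makes the scan a single pass, since any point that fails to raise the running best capability is dominated by something cheaper.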
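The Gemma abstract gives two nominal sizes, 2 billion and 7 billion parameters. A quick estimate of the memory needed just to hold the weights is parameter count times bytes per parameter. The sketch applies that arithmetic to the nominal sizes; it ignores activations, KV cache, and runtime overhead, and the released checkpoints' exact parameter counts may differ from the rounded labels.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Approximate storage for the weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Nominal sizes from the abstract; real checkpoints may have somewhat different counts.
    for label, n in (("2B", 2e9), ("7B", 7e9)):
        print(label, f"{weight_memory_gib(n, 'bfloat16'):.2f} GiB in bfloat16")
    # prints "2B 3.73 GiB in bfloat16" and "7B 13.04 GiB in bfloat16"
```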
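The Mini-Gemini abstract says an additional visual encoder refines visual tokens with high-resolution detail without increasing the visual token count. A minimal sketch of that idea, assuming the cross-attention form described in the Mini-Gemini paper (each low-resolution token is a query over only its own sub-region of high-resolution features, followed by a residual add), is below. Learned projections, multiple heads, and the final MLP are omitted, plain dot-product scores are used, and the function and argument names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_info_mining(low_res: list[list[float]],
                      high_res_groups: list[list[list[float]]]) -> list[list[float]]:
    """Refine each low-resolution token with attention over its own high-resolution sub-region.

    low_res:         N query vectors of dimension C (one per low-res visual token)
    high_res_groups: N groups; group i holds the M*M high-res vectors (dimension C)
                     covering the same image region as low_res[i], used as keys and values
    Returns N vectors, so the number of visual tokens passed to the LLM is unchanged.
    """
    if len(low_res) != len(high_res_groups):
        raise ValueError("need exactly one high-res group per low-res token")
    refined = []
    for query, group in zip(low_res, high_res_groups):
        scores = [sum(q * k for q, k in zip(query, key)) for key in group]  # dot products
        weights = softmax(scores)
        mined = [sum(w * value[c] for w, value in zip(weights, group))
                 for c in range(len(query))]
        # Residual connection: keep the original token and add the mined detail.
        refined.append([q + m for q, m in zip(query, mined)])
    return refined


if __name__ == "__main__":
    # N = 2 low-res tokens, C = 2 channels, M = 2 (4 high-res features per token).
    low = [[1.0, 0.0], [0.0, 1.0]]
    high = [
        [[1.0, 0.0], [0.9, 0.1], [0.8, 0.0], [1.0, 0.2]],
        [[0.0, 1.0], [0.1, 0.9], [0.0, 0.8], [0.2, 1.0]],
    ]
    out = patch_info_mining(low, high)
    print(len(low), "->", len(out))  # 2 -> 2: token count preserved
```

Restricting each query to its own M*M group keeps the cost linear in the number of high-resolution features, while the output length stays tied to the low-resolution grid.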
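The Med-Gemini abstract credits its MedQA result to an uncertainty-guided search strategy but does not detail it. The sketch below follows the general idea described in the full paper in simplified form: sample several answers, measure how much they disagree, and add web-search results to the context only when disagreement is high. Uncertainty is taken here as the Shannon entropy of the sampled final answers; `sample_fn` and `search_fn` are hypothetical stand-ins for a model call and a search tool, and the sample count, threshold, round limit, and majority-vote final answer are arbitrary assumptions.

```python
import math
from collections import Counter


def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical distribution over sampled answers."""
    n = len(answers)
    counts = Counter(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def uncertainty_guided_answer(sample_fn, search_fn, question: str, n_samples: int = 5,
                              threshold_bits: float = 1.0, max_rounds: int = 3) -> str:
    """Sample answers; while they disagree too much, add search results and resample.

    sample_fn(question, context) -> str and search_fn(question) -> str are hypothetical.
    """
    if n_samples < 1 or max_rounds < 1:
        raise ValueError("n_samples and max_rounds must be at least 1")
    context = ""
    for round_index in range(max_rounds):
        answers = [sample_fn(question, context) for _ in range(n_samples)]
        # Stop when the samples agree closely enough, or when no rounds remain.
        if answer_entropy(answers) <= threshold_bits or round_index == max_rounds - 1:
            break
        context += "\n" + search_fn(question)  # uncertain: retrieve more evidence
    # Final answer: majority vote over the last round of samples.
    return Counter(answers).most_common(1)[0][0]
```

With five samples, five distinct answers give an entropy of log2(5), about 2.32 bits, which would trigger a search under the 1-bit threshold, while unanimous samples give 0 bits and stop immediately.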

Mentioned Articles

20 articles

External Mentions

10 mentions