AI Anecdotes — Introducing Gemini
Google has announced its groundbreaking large language model Gemini, marking a significant leap forward in the field of AI and opening up a wide range of new possibilities. The model is designed to run on everything from cloud services to mobile devices, with state-of-the-art capabilities that help developers and enterprise customers build and scale their own AI applications. Until now, multimodal models have typically been created by stitching together separate text-only, vision-only, and audio-only components at a later stage, a suboptimal approach. Gemini, described by Google as its most capable AI model, breaks through these limitations, seamlessly processing information across text, code, audio, images, and video to provide the best possible response.
Researchers at Google DeepMind evaluated Gemini across 50 different subject areas and found that it performs on par with human experts in each of these domains. Gemini Ultra scored 90.0% on MMLU (Massive Multitask Language Understanding), a benchmark spanning 57 subjects, including math, physics, history, law, medicine, and ethics, that rigorously tests both world knowledge and problem-solving ability. For instance, Gemini can draw connections between objects in an image or a video, or even identify the errors in a student’s solution to a complex calculus problem.
Gemini 1.0, the initial version of the revolutionary AI model, comes in three sizes:
- Gemini Ultra — The largest and most capable model, designed for highly complex tasks (available early next year)
- Gemini Pro — A versatile model that scales across a broad range of tasks (integrated with Google Cloud Vertex AI; available to developers December 13, 2023; see the sketch after this list)
- Gemini Nano — The most efficient model, built for on-device tasks (available to developers December 6, 2023)
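To give a sense of how developers might call Gemini Pro once access opens, here is a minimal sketch using the Google AI Python SDK (the google-generativeai package). The API key value and the prompt text are placeholders for illustration, not details from Google's announcement.

```python
# Minimal sketch: text-only prompting with Gemini Pro via the Google AI
# Python SDK (pip install google-generativeai). The API key is a placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # obtain a key from Google AI Studio

model = genai.GenerativeModel("gemini-pro")  # the model tier announced for developers
response = model.generate_content(
    "Walk through the reasoning needed to solve a related-rates calculus problem."
)
print(response.text)  # the model's text response
```

Since Gemini Pro is also integrated with Google Cloud Vertex AI, enterprise customers could send the same kind of prompt through that platform instead.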
Demis Hassabis, CEO of Google DeepMind, envisions Gemini as a stepping stone toward a truly universal AI model. He says, “Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.”
Alexander Chan, Creative Director, walks through several multimodal prompting experiments with Gemini and mentions that Gemini will be available for people to try in Google AI Studio, where they can explore what’s possible with the model.
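As an illustration of the kind of multimodal prompt those experiments describe, the sketch below pairs an image with a text question using the Google AI Python SDK. The vision model name, the file chart.png, and the question are assumptions made for this example, not details from the announcement.

```python
# Minimal sketch: multimodal (image + text) prompting, assuming the Google AI
# Python SDK and a hypothetical local image file named "chart.png".
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key from Google AI Studio

model = genai.GenerativeModel("gemini-pro-vision")  # vision-capable Gemini Pro variant
image = Image.open("chart.png")

# A single request can mix text and an image.
response = model.generate_content(
    ["Describe the connections between the objects shown in this image.", image]
)
print(response.text)
```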
View highlights from Google’s hands-on experience with Gemini: https://youtu.be/UIZAiXYceBI?si=IL9BG6CY1yoKSwPH