Gemini 1.5 Pro Now Listens to Audio and Is Available to All

COMBOFRE April 10, 2024

At the Google Cloud Next 2024 event in Las Vegas, Google announced that it is making Gemini 1.5 Pro generally available to all users. The highly anticipated model is finally in public preview with a 1 million token context window, and you no longer have to sign up for the waitlist to access the Gemini 1.5 Pro model.

I tried to access the Gemini 1.5 Pro model from a new Google account, and the model was readily available without any wait. And all of this is available for free.

google ai studio with the gemini 1.5 pro model

That said, it doesn’t mean you can start using the Gemini 1.5 Pro model on the Gemini portal. You’ll have to head to aistudio.google.com to access the model for now. After a few months of public preview, the model will be made available on the Gemini portal. You’ll likely need a Gemini Advanced subscription to use the model there.

Keep in mind that the Gemini 1.5 Pro model is a mid-tier model built on the Mixture-of-Experts (MoE) architecture. Nevertheless, it easily beats the largest Gemini 1.0 Ultra model. And in our comparison with the GPT-4 model, Gemini 1.5 Pro showed remarkable capabilities in several tests. When Gemini 1.5 Pro debuts on the Gemini portal, expect it to perform better than GPT-4 and Claude 3’s Opus model.

Apart from that, Gemini 1.5 Pro can now process audio files too. You can upload audio files of meetings or videos, and the model can listen to the uploaded files without the need to manually generate a transcript. It can be of immense help to people who want to find quick and structured information from audio meetings or discussions.

Gemini 1.5 Pro could already process videos and images, and now audio files are supported too, which makes it a powerful multimodal model with a context length of 1 million tokens. We tested the audio processing capability of the Gemini 1.5 Pro model. Here is how it went.

How to Process Audio Files on Gemini 1.5 Pro

Head over to aistudio.google.com in a browser.
Next, make sure that the “Gemini 1.5 Pro” model is selected in the drop-down menu.

After that, click on the “Audio” menu in the top row and upload your audio file. It supports these audio file formats: FLAC, MIDI, MP3, M4A, OPUS, OGG, OGA, WAV, and MID.

It will process the audio file and consume tokens from the context window.
Now, start asking your questions, and Gemini 1.5 Pro will find the information in the audio and respond accordingly.
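
If you want to check a file before uploading it, the steps above can be sketched in a few lines of Python. This is only a rough sketch, not anything AI Studio provides: the function names are hypothetical, the extension list is copied from the formats named above, and the rate of 32 tokens per second of audio is an assumption that this article does not confirm. Treat the token counter in AI Studio as the real number.

```python
import math
from pathlib import Path

# Extensions copied from the format list in the steps above.
SUPPORTED_EXTENSIONS = {
    ".flac", ".midi", ".mp3", ".m4a", ".opus", ".ogg", ".oga", ".wav", ".mid",
}

# Context length mentioned in the article.
CONTEXT_WINDOW_TOKENS = 1_000_000

# ASSUMPTION: audio costs roughly 32 tokens per second. This figure is not
# from the article; check the token counter in AI Studio for the real value.
ASSUMED_TOKENS_PER_SECOND = 32


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def estimate_audio_tokens(duration_seconds: float,
                          tokens_per_second: int = ASSUMED_TOKENS_PER_SECOND) -> int:
    """Rough token estimate for an audio clip at the assumed rate."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return math.ceil(duration_seconds * tokens_per_second)


def fits_in_context(duration_seconds: float, reserved_for_prompt: int = 0) -> bool:
    """Check whether the clip plus any reserved prompt tokens fits the window."""
    needed = estimate_audio_tokens(duration_seconds) + reserved_for_prompt
    return needed <= CONTEXT_WINDOW_TOKENS


# Example: a 45-minute meeting recording (2,700 seconds).
print(is_supported_audio("meeting.MP3"))   # True
print(estimate_audio_tokens(45 * 60))      # 86400 at the assumed rate
print(fits_in_context(45 * 60))            # True
```

At the assumed rate, a 45-minute recording would take up about 86,400 tokens, which leaves most of the 1 million token window free for follow-up questions.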

The best part is that it generates the transcript in a structured format, with labels for the different speakers. And in our test, it didn’t hallucinate at all.

gemini 1.5 pro generating audio transcript

So this is how you can upload and process audio files on Gemini 1.5 Pro. It’s really a powerful model from the Google DeepMind team, and I’m excited that it’s now available to the public at large without any cost. Go ahead and try it, and let us know your thoughts in the comment section below.
