My blog posts are usually about small, niche tools by indie builders or small startups.
Today I wanna talk about a race of giants that’s been happening for a while now. Unified Multimodal LLMs are turning heads in the AI scene.
Well, in tech in general.
These next-gen AI apps and tools will be able to do some wild things, like turning a doodle into a fully functioning website or instantly interpreting complex charts and analyses.
With all of the buzz around Google’s Gemini, a new ChatGPT version or Meta’s AI, I think it’s fair to say:
“Brace yourself, the next chapter of AI is coming!”
How Did It All Start?
Less than a year ago, OpenAI introduced the world to ChatGPT, marking a pivotal moment in the evolution of large language models (LLMs).
This spark of innovation caught fire, with major companies seeing the potential and investing billions to advance this technology.
They’re not just staking money, but their future, on developing the next big tool. The goal is a new kind of AI that combines chatbots and search features in one powerful tool.
What are Unified Multimodal LLMs?
Unified Multimodal LLMs represent the next generation of AI models, able to handle both text and visual data seamlessly.
Instead of jumping between platforms for different tasks – like coding a website, generating an image, recording voice-to-text, or analyzing a chart – these models combine it all, letting users perform diverse functions from one central hub.
OpenAI’s President, Greg Brockman, gave a taste of this when he showed how a simple photo of handwritten notes could be transformed into a working, joke-telling website using GPT-4.
How Does This Affect You?
If you are already using multiple AI tools, this can be very interesting. It brings convenience and accessibility.
I can already see digital marketing departments or teachers getting their bulk tasks done even faster.
No longer will you need to seek out a tech-savvy friend or colleague to translate a complex chart in a simple way. The model will analyze and explain it for you.
Building a website? Just sketch your idea, and let the AI turn it into reality.
Companies like Google, OpenAI/Microsoft, and Meta are racing to bring these functionalities to our fingertips.
With tech giants pouring resources into developing these models further, we can expect more integrated, user-friendly, and advanced AI tools in our daily digital experiences very soon.
The road ahead for AI is exciting.
Even overwhelming at times, but definitely interesting and crazy.
With Unified Multimodal LLMs around the corner, the way we interact with technology is set to be more seamless and intuitive than ever.
A quick look at what this might look like, from a hacker’s perspective? 🙂
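Here is a rough sketch of the idea, not any vendor's real API. The point is only that a "unified" request carries text and an image together in one payload instead of going to two separate tools. The function name build_multimodal_request, the "inputs" layout, and the field names are all made up for illustration.

```python
import base64
import json


def build_multimodal_request(prompt: str, image_bytes: bytes,
                             mime_type: str = "image/png") -> str:
    """Bundle a text prompt and an image into one JSON request body.

    Hypothetical payload shape: real services define their own schema.
    """
    # Binary image data is base64-encoded so it can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "inputs": [
            {"type": "text", "text": prompt},
            {"type": "image", "mime_type": mime_type, "data": encoded},
        ]
    }
    return json.dumps(payload)


# Stand-in bytes; in practice you would read a photo of your sketch from disk.
sketch = b"not-a-real-image"
body = build_multimodal_request("Turn this sketch into an HTML page.", sketch)
print(len(body), "bytes ready to send")
```

What you would do with the body (which endpoint, which auth, what comes back) depends entirely on the service you pick, so I left that part out.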