Category: AI
-
What I learned from ChatGPT-4o’s launch event

ChatGPT-4o is here. The “o” stands for omni, hinting at the combination of text, audio, video and image outputs that the latest version of ChatGPT offers. OpenAI CEO Sam Altman posted on X that GPT-4o is “natively multimodal”, combining voice, text and vision. At the launch livestream of GPT-4o earlier this…
-
ElevenLabs (Product Review)

My summary of ElevenLabs before using it – ElevenLabs turns text into voice. How does ElevenLabs explain itself in the first minute? On its website ElevenLabs explains that its users can create natural AI voices instantly in any language. How does ElevenLabs work? I select a language – English – and sample text is…
-
Suno AI (Product Review)

My summary of Suno AI before using it – Creating your own music through generative AI. How does Suno AI explain itself in the first minute? “Make a song with Suno”, followed by a CTA to make a song. Suno’s website also mentions that its V3 is out now, enabling users to “create full, two-minute…
-
What is Retrieval-Augmented Generation (RAG)?

As we’re getting more used to Large Language Models (LLMs) and their applications, we’re also starting to see the gaps in the accuracy and reliability of the responses that we get from these models. LLMs can start hallucinating, which means that they provide a response that might seem accurate at first glance, but isn’t. One…
-
Learning about video and generative AI (2)

Last week I wrote about the arrival of Sora and this week I’ll cover two new contributions to the field of video and AI: Meta’s “V-JEPA” method and Alibaba’s “EMO” model. EMO With EMO (Emote Portrait Alive – why wasn’t it called “EPO”?!), users can create singing or talking videos, based on a static image…
