Amazon's Alexa-powered smart glasses 🤓

PLUS: DALL-E 3 gets an upgrade

Above the Fold 🔺

Has humanity arrived at the dawn of a new age? A developer asked AI to name and launch a memecoin, and it raked in $12 million in one day. ‘We’ve hit peak efficiency’ with legacy tech, says Coinbase exec. It's 2030, and digital wallets have replaced every card in our purses and pockets. Real-world assets: Shaping the future of digital asset management with tokenization. Cruise CEO says winter version of Origin AV is two years away. OpenAI’s Ilya Sutskever calls deep learning ‘alchemy’.

Must-reads 📰

  1. OpenAI unveils DALL-E 3, allows artists to opt out of training (TechCrunch) 5 minute read

  2. Nvidia Partners with Mercedes-Benz to Create Digital Twins for Efficient Manufacturing (VentureBeat) 15 minute read

  3. Amazon breathes new life into Alexa with an AI upgrade, aiming for more human-like interactions on devices like the Echo Show 8 (Barron’s) 4 minute read

  4. Carrera 'smart' glasses set for US launch by Safilo and Amazon (Reuters) 2 minute read

  5. The Best Crypto Parties Are Happening Overseas Now (Bloomberg) 3 minute read

Startups & Fundraising 🦄

  1. Former Meta AI VP Launches Sizzle, an AI-Powered Learning App (TechCrunch) 5 minute read

  2. Mesh, which helps people manage their digital assets, raises $22M (TechCrunch) 5 minute read

  3. KYP.ai raises $18.7M from Europe’s leading deeptech VCs (VentureBeat) 7 minute read

Product Launches 🚀

  1. Amazon introduces an all-new lineup of smart glasses powered by Alexa (Amazon) 10 minute read

  2. GitHub’s AI-powered coding chatbot is now available for individuals (The Verge) 3 minute read

  3. OpenAI unveils DALL-E 3, its latest image generator, which can render coherent text inside images (The Verge) 4 minute read

  4. High-end VR headset Varjo Aero is now available for half the price (Mixed News) 4 minute read

Research 🔬

The paper introduces DreamLLM, a learning framework for building Multimodal Large Language Models (MLLMs) that can both understand and generate language and images, with an emphasis on the often-neglected synergy between multimodal comprehension and creation. Unlike approaches that route inputs through external feature extractors such as CLIP, DreamLLM samples language and image posteriors directly in the raw multimodal space, avoiding the constraints and information loss those extractors introduce.

Why it matters: A single MLLM that both understands and generates content lets each ability reinforce the other, which can yield richer, more integrated output. Potential applications include smoother, more intelligent user interactions and more nuanced content creation across text and images.

How it works: DreamLLM rests on two core principles. First, it performs generative modeling in the raw multimodal space, sampling language and image posteriors directly instead of relying on external feature extractors. Second, it trains on raw, interleaved documents, which lets it learn all conditional, marginal, and joint multimodal distributions and generate free-form interleaved content.
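To make "interleaved documents" concrete, here is a toy Python sketch under loudly stated assumptions: text is split on whitespace, each image is reduced to a placeholder ID, and a hypothetical `<image>` marker flags where an image begins. It is not DreamLLM's implementation; it only illustrates why training one model left to right over a mixed text-and-image sequence covers both image-given-text and text-given-image prediction, since every item is predicted from everything before it.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical marker meaning "an image starts here". An assumption for
# illustration only; not claimed to be DreamLLM's actual special token.
IMAGE_MARKER = "<image>"


@dataclass
class TextSegment:
    text: str


@dataclass
class ImageSegment:
    image_id: str  # placeholder standing in for raw pixels


Segment = Union[TextSegment, ImageSegment]


def serialize(doc: List[Segment]) -> List[str]:
    """Flatten an interleaved document into one left-to-right sequence, keeping order."""
    seq: List[str] = []
    for seg in doc:
        if isinstance(seg, TextSegment):
            seq.extend(seg.text.split())  # naive whitespace "tokenizer"
        else:
            seq.append(IMAGE_MARKER)  # the model must predict *where* an image appears
            seq.append(f"[{seg.image_id}]")  # ...and then the image content itself
    return seq


def next_item_targets(seq: List[str]) -> List[Tuple[List[str], str]]:
    """Pair every position (after the first) with its full prefix, as in autoregressive training."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


doc = [TextSegment("A red apple"), ImageSegment("img0"), TextSegment("It looks tasty")]
seq = serialize(doc)
for context, target in next_item_targets(seq):
    print(f"{' '.join(context):<32} -> {target}")
```

The image placeholder is predicted from the preceding text, and the following text is predicted with the image in its context. Because the joint probability of the whole sequence factorizes into these per-position conditionals, one objective covers both directions; a real model would replace the placeholder ID with actual image modeling.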

Between the lines: The paper reports that DreamLLM works as a zero-shot multimodal generalist, performing strongly at both understanding and generating content in various forms. That points toward more coherent, interconnected multimodal models that handle text and images together in a way closer to how people comprehend and produce them.

The bottom line: DreamLLM treats multimodal understanding and creation as complementary rather than separate tasks. Its ability to operate in the raw multimodal space and produce interleaved content could push forward fields that need a cohesive approach to text and images.

Quick Bytes ⚡️

TOGETHER WITH METALYST

Hit the inbox of readers from Apple, Meta, Unity and more

Advertise with MetaLyst to get your brand or startup in front of the Who's Who of metaverse tech. Our readers love tech: they build things and invest in cool ideas. Your product could be their next favorite thing! Get in touch today.

Comments, questions, tips?

Send a letter to the editor: email or tweet us

Was this newsletter forwarded to you, and you’d like to see more?
