Amazon's Alexa-powered smart glasses 🤓
PLUS: DALL-E 3 gets an upgrade
Above the Fold 🔺
- Has humanity arrived at the dawn of a new age? A developer asked AI to name and launch a memecoin, and it raked in $12 million in one day.
- 'We've hit peak efficiency' with legacy tech, says Coinbase exec.
- It's 2030, and digital wallets have replaced every card in our purses and pockets.
- Real-world assets: shaping the future of digital asset management with tokenization.
- Cruise CEO says winter version of Origin AV is two years away.
- OpenAI's Ilya Sutskever calls deep learning 'alchemy'.
Must-reads 📰
OpenAI unveils DALL-E 3, allows artists to opt out of training (TechCrunch) 5 minute read
Nvidia partners with Mercedes-Benz to create digital twins for efficient manufacturing (VentureBeat) 15 minute read
Amazon breathes new life into Alexa with an AI upgrade, aiming for more human-like interactions on devices like the Echo Show 8 (Barron's) 4 minute read
Carrera 'smart' glasses set for US launch by Safilo and Amazon (Reuters) 2 minute read
The Best Crypto Parties Are Happening Overseas Now (Bloomberg) 3 minute read
Startups & Fund Raising 🦄
Former Meta AI VP launches Sizzle, a revolutionary AI-powered learning app (TechCrunch) 5 minute read
Mesh, which helps people manage their digital assets, raises $22M (TechCrunch) 5 minute read
KYP.ai raises $18.7M from Europe’s leading deeptech VCs (VentureBeat) 7 minute read
Product Launches 🚀
Amazon introduces an all-new lineup of smart glasses powered by Alexa (Amazon) 10 minute read
GitHub’s AI-powered coding chatbot is now available for individuals (The Verge) 3 minute read
OpenAI unveils DALL-E 3, the latest image generator capable of adding coherent text to images, stepping up the game in synthetic media creation (The Verge) 4 minute read
High-end VR headset Varjo Aero is now available for half the price (Mixed News) 4 minute read
Research 🔬
The paper introduces DreamLLM, a learning framework for Multimodal Large Language Models (MLLMs) that can both understand and generate language and images, emphasizing the often-neglected synergy between multimodal comprehension and creation. DreamLLM's key innovation is that it operates in the raw multimodal space: by directly sampling language and image posteriors, it sidesteps the restrictions and information loss of external feature extractors like CLIP.
Why it matters: The development of a versatile MLLM like DreamLLM is crucial as it enhances the synergy between different modes of comprehension and creation, allowing for a richer, more integrated understanding and generation of content. This advancement holds substantial promise for numerous applications, including more seamless and intelligent user interactions and more nuanced content creation in both textual and visual domains.
How it works: DreamLLM operates on two core principles. First, it employs generative modeling in the raw multimodal space, directly sampling both language and image posteriors; this yields a deeper, more integrated grasp of multimodal content than pipelines built on external feature extractors. Second, DreamLLM is trained to generate raw, interleaved documents, learning all conditional, marginal, and joint multimodal distributions, which lets it produce free-form interleaved content. A rough sketch of what interleaved generation looks like in practice follows.
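To make the raw-space idea concrete, here is a minimal Python sketch of interleaved generation in the spirit of DreamLLM. It is an illustration only, not the paper's actual API: `model.next_token`, `model.hidden`, and the `<dream>` token name are hypothetical placeholders standing in for the model's real sampling and conditioning interfaces.

```python
# Minimal sketch of interleaved text-and-image generation in the spirit of
# DreamLLM. All class and method names are hypothetical placeholders, not
# the paper's actual API.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class ImagePlaceholder:
    """Marks where the model chose to emit an image instead of text."""
    embedding: list  # latent conditioning vector handed to an image decoder

def generate_interleaved(model, prompt: str, max_steps: int = 64) -> List[Union[str, ImagePlaceholder]]:
    """Autoregressively sample tokens; when the model emits a special
    <dream> token, condition an image decoder on the model's own hidden
    state and splice the result into the output stream, so text and
    images are produced jointly rather than by separate pipelines."""
    output: List[Union[str, ImagePlaceholder]] = []
    state = model.encode(prompt)                 # hypothetical: encode the prompt
    for _ in range(max_steps):
        token, state = model.next_token(state)   # hypothetical sampling step
        if token == "<dream>":
            # Rather than routing through an external feature extractor
            # (e.g. CLIP), the image decoder is conditioned directly on
            # the model's hidden state -- the "raw multimodal space" idea.
            output.append(ImagePlaceholder(embedding=model.hidden(state)))
        elif token == "<eos>":
            break
        else:
            output.append(token)
    return output
```

The design point the sketch tries to capture is that image generation is driven by the language model's own internal state, end to end, which is what allows joint and interleaved multimodal distributions to be learned directly.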
Between the lines: DreamLLM's ability to function as a zero-shot multimodal generalist, with superior performance in both understanding and generating content across modalities, underscores the potentially transformational impact of such models in diverse domains. It points toward more coherent, interconnected multimodal models that better mimic human-like comprehension and generation of textual and visual information.
The bottom line: DreamLLM stands out as a pioneering framework in the development of multimodal large language models, emphasizing the synergy between different modes of understanding and creation. Its ability to operate in raw multimodal spaces and produce interleaved content could lead to substantial advancements in AI, offering more sophisticated and integrated solutions in fields requiring a cohesive approach to textual and visual information.
Quick Bytes ⚡️
Must-read: Quantum supremacy explained
Design: Recreating the VisionOS UI on Quest 2 with Unity's XR Toolkit
Opinion: Water & Music’s Cherie Hu says Web3 and AI will revolutionize creativity
Watch: Leaked video gives glimpse of Meta Quest 3's passthrough quality
TOGETHER WITH METALYST
Hit the inbox of readers from Apple, Meta, Unity and more
Advertise with MetaLyst to get your brand or startup in front of the who's who of metaverse tech. Our readers build, design, and invest in spatial computing, and your product could be their next favorite tool. Get in touch today.
Comments, questions, tips? Was this newsletter forwarded to you, and you'd like to see more? Reply to this email.