Learn how multimodal AI merges text, image, and audio for smarter models
Updated Jan 21, 2025 - Jupyter Notebook
Neocortex Unity SDK for Smart NPCs and Virtual Assistants
Enterprise-ready solution leveraging multimodal Generative AI (Gen AI) to enhance existing or new applications beyond text—implementing RAG, image classification, video analysis, and advanced image embeddings.
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.
VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.
Gallery showcasing AI-generated images and videos created using the Nova model
Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.
Lab website
Al-Asma'i: The Digital Poet is an AI model that, given the name of an Arabic poem, generates the poem's text and converts it into visual and audio presentations. By combining AI with traditional Arabic poetry, it lets users see and hear their favorite poems in a new and expressive way.
ge generation) with support for voice message processing, user authorization, and admin controls.
Teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.
🤖🤖 Gemini-Powered AI Chatbot 🤖🤖 This is a Streamlit-based AI chatbot powered by Google Gemini models (1.5 Pro & 1.5 Flash). The chatbot supports both text and image input, making it capable of handling multimodal queries. It's perfect for experimenting with Google's generative AI capabilities through a clean, interactive web interface.
A basic application that uses Google's Gemini AI to automatically capture, analyze, and answer quiz questions from screenshots in real time. The current setup targets macOS and needs further testing.
Scalable multimodal AI system combining FSDP, RLHF, and Inferentia optimization for customer insights generation.