
multimodal-ai

Here are 15 public repositories matching this topic...

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

  • Updated May 13, 2025
  • Python

Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.

  • Updated Apr 23, 2025
  • Jupyter Notebook

Al-Asma'i: The Digital Poet is an innovative AI model. When you give it the name of an Arabic poem, it creates the poem's text and converts it into visual and audio presentations. Combining AI with traditional Arabic poetry, it offers a unique experience where users can see and hear their favorite poems in a new and expressive way.

  • Updated Jun 16, 2024
  • Python

Teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.

  • Updated Sep 2, 2024

🤖🤖 Gemini-Powered AI Chatbot 🤖🤖 This is a Streamlit-based AI chatbot powered by Google Gemini models (1.5 Pro & 1.5 Flash). The chatbot supports both text and image input, making it capable of handling multimodal queries. It's perfect for experimenting with Google's generative AI capabilities through a clean, interactive web interface.

  • Updated Apr 18, 2025
  • Jupyter Notebook
