multimodal-ai

Here are 15 public repositories matching this topic...

sinanuozdemir / oreilly-multimodal-ai

Learn how multimodal AI merges text, image, and audio for smarter models

openai diffusion multimodal deepgram livekit stable-diffusion dreambooth generative-ai llava dalle-3 llama3 multimodal-ai

Updated Jan 21, 2025
Jupyter Notebook

neocortex-link / neocortex-unity-sdk

Neocortex Unity SDK for Smart NPCs and Virtual Assistants

ai game-development npc npcs game-ai ai-agents conversational-ai smart-agent ai-tools ai-agent aiagent smart-agents aiagents multimodal-ai smart-npc smart-npcs

Updated Apr 2, 2025
C#

microsoft / multimodal-ai

Enterprise-ready solution leveraging multimodal Generative AI (Gen AI) to enhance existing or new applications beyond text—implementing RAG, image classification, video analysis, and advanced image embeddings.

python ai azure video-analysis azure-ai enterprise-ai multimodal-ai

Updated Apr 23, 2025
HCL

alperensumeroglu / ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Updated Apr 2, 2025
Python

VectorInstitute / VLDBench

VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.

nlp benchmarking machine-learning computer-vision deep-learning datasets benchmark-framework ai-safety llm vlms vision-language-models multimodal-ai disinformation-detection

Updated Apr 21, 2025
Python

Livyatan-melvillei / ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

automatic-speech-recognition media-processing temporal-segmentation ml-pipeline ffmpeg-python deep-learning-pipelines video-scene-detection video-transcription huggingface-pipelines multimodal-ai video-resizing ai-video-summarization video-clip-generation intelligent-video-editing

Updated May 13, 2025
Python

hi-space / amazon-bedrock-nova-gallery

Gallery showcasing AI-generated images and videos created using the Nova model

bedrock text-to-image image-to-video text-to-video generative-ai multimodal-ai

Updated Feb 1, 2025
Python

Md-Emon-Hasan / Gen-AI-on-going

Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.

Updated Apr 23, 2025
Jupyter Notebook

mims-harvard / mims-harvard.github.io

Lab website

therapeutics generative-ai biomedical-ai agentic-ai multimodal-ai

Updated May 6, 2025
HTML

alwalid54321 / Al-Asmaai

Al-Asma'i: The Digital Poet is an innovative AI model. When you give it the name of an Arabic poem, it creates the poem's text and converts it into visual and audio presentations. Combining AI with traditional Arabic poetry, it offers a unique experience where users can see and hear their favorite poems in a new and expressive way.

machine-learning natural-language-processing text-to-speech ai deep-learning artificial-intelligence generative-model generative-art visualizations language-model text-to-image audio-visual machine-learning-art ai-art text-to-video multimodal-ai ai-visualization

Updated Jun 16, 2024
Python

alwalid54321 / gemini-telegram-bot

ge generation) with support for voice message processing, user authorization, and admin controls.

telegram-bot voice-recognition image-generation python-bot ai-chatbot authorization-system generative-ai google-generative-ai gemini-ai telegram-ai-bot multimodal-ai

Updated Apr 15, 2025

ksm26 / Large-Multimodal-Model-Prompting-with-Gemini

Teaches you to integrate text, images, and videos into applications using Gemini's state-of-the-art multimodal models. Learn advanced prompting techniques, cross-modal reasoning, and how to extend Gemini's capabilities with real-time data and API integration.

semantic-search video-qa api-integration prompt-engineering function-calling gemini-models multimodal-ai text-image-video-integration cross-modal-reasoning content-summarization virtual-interior-design

Updated Sep 2, 2024

shubhamt43 / PythonLearnings

🤖🤖 Gemini-Powered AI Chatbot 🤖🤖 This is a Streamlit-based AI chatbot powered by Google Gemini models (1.5 Pro & 1.5 Flash). The chatbot supports both text and image input, making it capable of handling multimodal queries. It's perfect for experimenting with Google's generative AI capabilities through a clean, interactive web interface.

python chatbot image-to-text mini-project gemini-api streamlit ai-chatbot streamlit-cloud google-generative-ai google-gemini multimodal-ai

Updated Apr 18, 2025
Jupyter Notebook

dolphinium / ss2llm

A basic application that uses Google's Gemini AI to automatically capture, analyze, and answer quiz questions from screenshots in real time. The current setup targets macOS and needs further testing.

python macos automation ai-assistant screenshot-detection google-gemini-ai multimodal-ai

Updated Mar 26, 2025
Python

fereydoonboroojerdi / multimodal-customer-insights-generator

Scalable multimodal AI system combining FSDP, RLHF, and Inferentia optimization for customer insights generation.

aws deep-learning pytorch customer-insights sagemaker inferentia rlhf fsdp multimodal-ai

Updated May 3, 2025
Python
