
Commit f14bcee

docs: update multi-modal docs (#30880)
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>
1 parent 98c357b commit f14bcee

4 files changed: +701 -166 lines changed

docs/docs/concepts/multimodality.mdx (+65 -15)
@@ -15,7 +15,10 @@
 * [Messages](/docs/concepts/messages)
 :::
 
-Multimodal support is still relatively new and less common, model providers have not yet standardized on the "best" way to define the API. As such, LangChain's multimodal abstractions are lightweight and flexible, designed to accommodate different model providers' APIs and interaction patterns, but are **not** standardized across models.
+LangChain supports multimodal data as input to chat models:
+
+1. Following provider-specific formats
+2. Adhering to a cross-provider standard (see [how-to guides](/docs/how_to/#multimodal) for detail)
 
 ### How to use multimodal models
 

@@ -26,38 +29,85 @@ Multimodal support is still relatively new and less common, model providers have
 
 #### Inputs
 
-Some models can accept multimodal inputs, such as images, audio, video, or files. The types of multimodal inputs supported depend on the model provider. For instance, [Google's Gemini](/docs/integrations/chat/google_generative_ai/) supports documents like PDFs as inputs.
+Some models can accept multimodal inputs, such as images, audio, video, or files.
+The types of multimodal inputs supported depend on the model provider. For instance,
+[OpenAI](/docs/integrations/chat/openai/),
+[Anthropic](/docs/integrations/chat/anthropic/), and
+[Google Gemini](/docs/integrations/chat/google_generative_ai/)
+support documents like PDFs as inputs.
+
+The gist of passing multimodal inputs to a chat model is to use content blocks that
+specify a type and corresponding data. For example, to pass an image to a chat model
+as URL:
 
-Most chat models that support **multimodal inputs** also accept those values in OpenAI's content blocks format. So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
+```python
+from langchain_core.messages import HumanMessage
+
+message = HumanMessage(
+    content=[
+        {"type": "text", "text": "Describe the weather in this image:"},
+        {
+            "type": "image",
+            "source_type": "url",
+            "url": "https://...",
+        },
+    ],
+)
+response = model.invoke([message])
+```
 
-The gist of passing multimodal inputs to a chat model is to use content blocks that specify a type and corresponding data. For example, to pass an image to a chat model:
+We can also pass the image as in-line data:
 
 ```python
 from langchain_core.messages import HumanMessage
 
 message = HumanMessage(
     content=[
-        {"type": "text", "text": "describe the weather in this image"},
-        {"type": "image_url", "image_url": {"url": image_url}},
+        {"type": "text", "text": "Describe the weather in this image:"},
+        {
+            "type": "image",
+            "source_type": "base64",
+            "data": "<base64 string>",
+            "mime_type": "image/jpeg",
+        },
     ],
 )
 response = model.invoke([message])
 ```
 
-:::caution
-The exact format of the content blocks may vary depending on the model provider. Please refer to the chat model's
-integration documentation for the correct format. Find the integration in the [chat model integration table](/docs/integrations/chat/).
-:::
+To pass a PDF file as in-line data (or URL, as supported by providers such as
+Anthropic), just change `"type"` to `"file"` and `"mime_type"` to `"application/pdf"`.
 
-#### Outputs
+See the [how-to guides](/docs/how_to/#multimodal) for more detail.
 
-Virtually no popular chat models support multimodal outputs at the time of writing (October 2024).
+Most chat models that support multimodal **image** inputs also accept those values in
+OpenAI's [Chat Completions format](https://platform.openai.com/docs/guides/images?api-mode=chat):
 
-The only exception is OpenAI's chat model ([gpt-4o-audio-preview](/docs/integrations/chat/openai/)), which can generate audio outputs.
+```python
+from langchain_core.messages import HumanMessage
+
+message = HumanMessage(
+    content=[
+        {"type": "text", "text": "Describe the weather in this image:"},
+        {"type": "image_url", "image_url": {"url": image_url}},
+    ],
+)
+response = model.invoke([message])
+```
+
+Otherwise, chat models will typically accept the native, provider-specific content
+block format. See [chat model integrations](/docs/integrations/chat/) for detail
+on specific providers.
+
+
+#### Outputs
 
-Multimodal outputs will appear as part of the [AIMessage](/docs/concepts/messages/#aimessage) response object.
+Some chat models support multimodal outputs, such as images and audio. Multimodal
+outputs will appear as part of the [AIMessage](/docs/concepts/messages/#aimessage)
+response object. See for example:
 
-Please see the [ChatOpenAI](/docs/integrations/chat/openai/) for more information on how to use multimodal outputs.
+- Generating [audio outputs](/docs/integrations/chat/openai/#audio-generation-preview) with OpenAI;
+- Generating [image outputs](/docs/integrations/chat/google_generative_ai/#image-generation) with Google Gemini.
 
 #### Tools
 
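For illustration, a minimal sketch of the PDF pattern the updated docs describe above, using the same cross-provider content-block format; the file path, prompt, and `model` object are placeholders rather than part of the diff:

```python
import base64

from langchain_core.messages import HumanMessage

# Read a local PDF and base64-encode it (path is a placeholder).
with open("example.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Summarize this document:"},
        {
            "type": "file",  # "file" instead of "image"
            "source_type": "base64",
            "data": pdf_data,
            "mime_type": "application/pdf",
        },
    ],
)
response = model.invoke([message])  # any chat model that accepts PDF inputs
```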
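For comparison, a sketch of what a native, provider-specific content block can look like, here assuming Anthropic's Messages API image format (the base64 payload and `model` are placeholders):

```python
from langchain_core.messages import HumanMessage

# Anthropic's native format nests the payload under a "source" key
# instead of the flat "source_type"/"data" fields of the standard format.
message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the weather in this image:"},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": "<base64 string>",
            },
        },
    ],
)
response = model.invoke([message])  # e.g. a ChatAnthropic instance
```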
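On the outputs side, a hedged sketch of generating and saving audio with OpenAI's `gpt-4o-audio-preview`, following the audio-generation page linked above; the parameter names come from OpenAI's preview API and may change:

```python
import base64

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-audio-preview",
    model_kwargs={
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": "wav"},
    },
)

output_message = llm.invoke("Please tell me a joke.")

# The audio comes back on the AIMessage's additional_kwargs as base64 data.
audio_bytes = base64.b64decode(output_message.additional_kwargs["audio"]["data"])
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```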

docs/docs/how_to/index.mdx (+2)
@@ -50,6 +50,7 @@ See [supported integrations](/docs/integrations/chat/) for details on getting st
 - [How to: force a specific tool call](/docs/how_to/tool_choice)
 - [How to: work with local models](/docs/how_to/local_llms)
 - [How to: init any model in one line](/docs/how_to/chat_models_universal_init/)
+- [How to: pass multimodal data directly to models](/docs/how_to/multimodal_inputs/)
 
 ### Messages
 
@@ -67,6 +68,7 @@ See [supported integrations](/docs/integrations/chat/) for details on getting st
 - [How to: use few shot examples in chat models](/docs/how_to/few_shot_examples_chat/)
 - [How to: partially format prompt templates](/docs/how_to/prompts_partial)
 - [How to: compose prompts together](/docs/how_to/prompts_composition)
+- [How to: use multimodal prompts](/docs/how_to/multimodal_prompts/)
 
 ### Example selectors
 