template-version | append_modified_update | modified |
---|---|---|
0.1 |
2025-01-07 10:59:50 +0100 |
course https://learn.deeplearning.ai/courses/ai-agents-in-langgraph (via farago@gmx.de)
https://learn.deeplearning.ai/courses/ai-agents-in-langgraph/lesson/1/introduction
- function calling LLMs have made tool use much more stable/predictable
- specific tools like search have been adapted for agentic use
  - answers (backed by links), not links that need to be followed
  - fixed format in answers
- Ng is a fan of LangChain, LangSmith and Tavily
- Agentic
  - iterative
  - multiple hats within the workflow
  - like a person, who also does not write well when forced to write a paper in one go, without using backspace
- Key design patterns of agentic workflows
  - planning: thinking through the steps, e.g. write an outline, write a draft, revise it, do a search
  - tool use: which tools are available, and how to use them.
  - reflection: iteratively improving results, with (possibly multiple) LLMs critiquing and making suggestions.
  - multi-agent communication: each agent is a uniquely prompted LLM, playing a unique role.
  - memory: tracking progress and results.
- Many of these capabilities are implemented outside the LLM, by the framework they operate in, e.g. LangChain
  - Memory in multiple forms
  - Function calling
  - Tool execution
- LangGraph: supports cyclic graphs, which are necessary for many successful patterns
https://learn.deeplearning.ai/courses/ai-agents-in-langgraph/lesson/2/build-an-agent-from-scratch
- useful to notice what jobs fall to the LLM and what jobs fall to the code around the LLM, the runtime
- code example Lesson_1_Student.py, based on https://til.simonwillison.net/llms/python-react-pattern
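
A minimal sketch of such a ReAct loop, to make the split between LLM jobs and runtime jobs concrete. This is not the course code: `fake_llm` is a scripted stand-in for a real model call, and `calculate`, `KNOWN_ACTIONS` and the exact `Action: tool: input` line format are assumptions made for illustration.

```python
import re

# Matches lines like "Action: calculate: 4 * 7" (line format assumed for this sketch).
ACTION_RE = re.compile(r"^Action: (\w+): (.*)$")


def calculate(expression: str) -> str:
    """Toy tool (hypothetical): evaluates '<number> <op> <number>'."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return str(results[op])


KNOWN_ACTIONS = {"calculate": calculate}


def fake_llm(messages: list[str]) -> str:
    """Scripted stand-in for an LLM call, for illustration only."""
    if not any(m.startswith("Observation:") for m in messages):
        return "Thought: I should multiply.\nAction: calculate: 4 * 7\nPAUSE"
    observation = messages[-1].removeprefix("Observation: ")
    return f"Answer: The result is {observation}"


def run_agent(question: str, max_turns: int = 5) -> str:
    messages = [question]
    for _ in range(max_turns):
        reply = fake_llm(messages)  # LLM job: decide what to do next
        messages.append(reply)
        matches = [m for line in reply.splitlines() if (m := ACTION_RE.match(line))]
        if not matches:  # no action requested, so this is the final answer
            return reply
        tool, tool_input = matches[0].groups()
        if tool not in KNOWN_ACTIONS:
            raise ValueError(f"Unknown action: {tool}")
        observation = KNOWN_ACTIONS[tool](tool_input)  # runtime job: run the tool
        messages.append(f"Observation: {observation}")
    return messages[-1]


answer = run_agent("What is 4 * 7?")
```

The loop, the parsing and the tool execution all live in plain code; only the choice of the next step comes from the (here faked) model.
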
Break this down into LangChain components:
where `agent_scratchpad` contains all the actions and observations. More prompts, contributed by users, are in the prompt hub.

#### 3.2 LangChain Tools

The LangChain community package contains hundreds of other tools.

#### 3.3 LangGraph

LangGraph
- lets you describe and orchestrate control flow, such as loops (cyclic graphs). See [[LangGraph_exemplaryWorkflows.png]] above: those workflows are all cyclic graphs.
- is an extension of LangChain that supports graphs
- single and multi-agent flows are described and represented as graphs
- allows for extremely controlled "flows"
LangGraph core concepts:
- nodes: agents or functions
- edges: connect nodes
- conditional edges: decisions
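
To make the three concepts concrete, here is a small stand-alone sketch of a cyclic graph runner. `TinyGraph` and its method names are hypothetical and are not the LangGraph API; the sketch only shows how nodes, edges and a conditional edge can form a loop.

```python
from typing import Callable

END = "__end__"  # sentinel name chosen for this sketch


class TinyGraph:
    """Hypothetical mini graph runner, for illustration only."""

    def __init__(self) -> None:
        self.nodes: dict[str, Callable[[dict], dict]] = {}
        self.edges: dict[str, str] = {}
        self.routers: dict[str, Callable[[dict], str]] = {}

    def add_node(self, name: str, fn: Callable[[dict], dict]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src] = dst

    def add_conditional_edges(self, src: str, router: Callable[[dict], str]) -> None:
        self.routers[src] = router

    def run(self, entry: str, state: dict, max_steps: int = 20) -> dict:
        current = entry
        for _ in range(max_steps):
            if current == END:
                return state
            # A node returns a partial update that is merged into the state.
            state = {**state, **self.nodes[current](state)}
            if current in self.routers:  # conditional edge: decide at runtime
                current = self.routers[current](state)
            else:  # plain edge: fixed successor, or stop if none is defined
                current = self.edges.get(current, END)
        raise RuntimeError("step limit reached; possible endless cycle")


graph = TinyGraph()
graph.add_node("draft", lambda s: {"revisions": s["revisions"] + 1})
graph.add_node("review", lambda s: {"approved": s["revisions"] >= 3})
graph.add_edge("draft", "review")
graph.add_conditional_edges("review", lambda s: END if s["approved"] else "draft")

result = graph.run("draft", {"revisions": 0, "approved": False})
```

The conditional edge from `review` back to `draft` is what makes the graph cyclic.
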
Comes with built-in persistence:
- for having multiple conversations at the same time
- remembering previous iterations, actions, ...
- enables cool human-in-the-loop features
Agent state:
Annotation with `operator.add`: when the state is updated with new messages, the new messages do not override the old ones, but are appended to them. (Without the annotation, pushing a new value to that variable overrides the old value.)
? Why can `agent_outcome` be `None`?
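
A sketch of how a runtime could interpret such an annotation. `apply_update` and the `status` field are hypothetical and only mimic the behaviour described above; this is not LangGraph's actual implementation.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class AgentState(TypedDict):
    messages: Annotated[list, operator.add]  # reducer: old + new
    status: str  # hypothetical field without reducer: gets overwritten


def apply_update(state: dict, update: dict) -> dict:
    """Merge an update into the state, honouring reducer annotations."""
    hints = get_type_hints(AgentState, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in state:
            new_state[key] = reducers[0](state[key], value)  # e.g. list concatenation
        else:
            new_state[key] = value  # plain override
    return new_state


state = {"messages": ["hi"], "status": "started"}
state = apply_update(state, {"messages": ["hello"], "status": "running"})
```

After the update, `messages` holds both entries while `status` only holds the latest value.
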
Our ReAct example (from Lesson_1_Student.py):
Node "llm" calls the OpenAI LLM, node "action" executes the action the LLM chose, if it chose one.
Code example: Lesson_2_Student.py
Output of `Image(abot.graph.get_graph().draw_png())`:
Using an agentic search tool reduces hallucination and HCI friction.
Example of a basic search tool implementation:
- understand the question and divide it into sub-queries. This is important for handling complex queries.
- for each sub-query, find the best source, choosing from multiple integrations.
  - e.g. for the query "How is the weather in SF", the weather API should be used
- extract the information relevant to the sub-query. Basic implementation:
  - chunk the source
  - do a vector search
  - retrieve the top-k chunks
- search tool scores and filters the results
- return top-k docs

? Are multiple sources used, and are scoring and filtering performed for each of the sources? Are the top-k docs the top-k sources?
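
A rough sketch of the chunk / score / filter / top-k part of these steps. It swaps the vector search for a plain bag-of-words cosine similarity so that it runs without an embedding model, and it skips the sub-query and source-selection steps. All function names, the chunk size and the `min_score` threshold are assumptions for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split a source into chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text: str) -> Counter:
    """Bag-of-words vector (stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, sources: dict[str, str], k: int = 2, min_score: float = 0.1):
    """Score every chunk of every source, filter weak hits, return the top k."""
    q = vectorize(query)
    scored = []
    for name, text in sources.items():
        for piece in chunk(text):
            score = cosine(q, vectorize(piece))
            if score >= min_score:  # filter step
                scored.append((score, name, piece))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


sources = {
    "weather": "The weather in San Francisco is partly cloudy today with light wind",
    "sports": "The local team won the game last night",
}
results = search("weather in San Francisco", sources)
```

In this sketch the top-k entries are chunks, each tagged with the source it came from.
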
Code example: Lesson_3_Student.py
Tavily yields:
{
"location": {
"name": "San Francisco",
"region": "California",
"country": "United States of America",
"lat": 37.775,
"lon": -122.4183,
"tz_id": "America/Los_Angeles",
"localtime_epoch": 1736026501,
"localtime": "2025-01-04 13:35"
},
"current": {
"last_updated_epoch": 1736026200,
"last_updated": "2025-01-04 13:30",
"temp_c": 12.8,
"temp_f": 55.0,
"is_day": 1,
"condition": {
"text": "Partly cloudy",
"icon": "//cdn.weatherapi.com/weather/64x64/day/116.png",
"code": 1003
},
"wind_mph": 4.5,
"wind_kph": 7.2,
"wind_degree": 358,
"wind_dir": "N",
"pressure_mb": 1026.0,
"pressure_in": 30.29,
"precip_mm": 0.0,
"precip_in": 0.0,
"humidity": 77,
"cloud": 75,
"feelslike_c": 12.4,
"feelslike_f": 54.4,
"windchill_c": 10.2,
"windchill_f": 50.4,
"heatindex_c": 11.0,
"heatindex_f": 51.8,
"dewpoint_c": 7.6,
"dewpoint_f": 45.7,
"vis_km": 16.0,
"vis_miles": 9.0,
"uv": 1.8,
"gust_mph": 5.9,
"gust_kph": 9.5
}
}
this is exactly what an agent wants to see.
In contrast, this is what a human wants to see:
Agents often work on longer running tasks. Two important concepts for this:
- Persistence: keep the state of an agent at a particular point in time, so that you can resume from that state in future interactions.
- Streaming: emit a list of signals of what's going on at that exact moment, for interpretability.
These are really powerful for building production applications:
- so that your agents are able to have multiple conversations with multiple users at the same time
- have a concept of memory so that your agents can resume those conversations.
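
A minimal sketch of the idea behind persistence with several parallel conversations: checkpoints are stored per thread ID, so each conversation resumes from its own latest state. `InMemoryCheckpointer` and `chat` are hypothetical and are not LangGraph classes; only the shape of the config dict follows these notes.

```python
from collections import defaultdict
from copy import deepcopy


class InMemoryCheckpointer:
    """Hypothetical checkpointer: one list of snapshots per thread."""

    def __init__(self) -> None:
        self._checkpoints = defaultdict(list)

    def put(self, config: dict, state: dict) -> None:
        thread_id = config["configurable"]["thread_id"]
        self._checkpoints[thread_id].append(deepcopy(state))

    def get(self, config: dict):
        history = self._checkpoints.get(config["configurable"]["thread_id"])
        return deepcopy(history[-1]) if history else None


def chat(checkpointer: InMemoryCheckpointer, config: dict, user_message: str) -> dict:
    """Resume the thread's state, add one exchange, and checkpoint again."""
    state = checkpointer.get(config) or {"messages": []}
    state["messages"].append(user_message)
    state["messages"].append(f"echo: {user_message}")  # stand-in for the agent's reply
    checkpointer.put(config, state)
    return state


memory = InMemoryCheckpointer()
alice = {"configurable": {"thread_id": "alice"}}
bob = {"configurable": {"thread_id": "bob"}}

chat(memory, alice, "hi")
chat(memory, bob, "hello")
alice_state = chat(memory, alice, "are you still there?")
bob_state = memory.get(bob)
```

The two threads never see each other's messages, and the second call for `alice` continues where the first one stopped.
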
Code example: Lesson_4_Student.py
- based upon Lesson_2_Student.py
- adds a checkpointer
  - checkpoints the LangGraph state (we only have `messages: Annotated[list[AnyMessage], operator.add]`) after and between every node
  - we will use `SqliteSaver`
    - a really simple checkpointer using SQLite, which is built in under the hood
    - use it with an in-memory database
    - other checkpointers in LangGraph: Redis, Postgres
  - pass in checkpointer to `graph.compile()`
- adds streaming
  - of individual messages (i.e. events)
    - `AIMessage`s
    - observation messages showing the result of taking that action (contained in corresponding `ToolMessage`s)
    - using the concept of a thread config
      - to keep track of different threads inside the persistent checkpointer, so we can have multiple conversations at the same time. This is needed in production with many users.
      - use dict `{"configurable": {"thread_id": "uniqueString"}}`
    - call graph not with `invoke` but `stream`
      - `stream` returns a stream of events, representing updates of the LangGraph state (we only have `messages: Annotated[list[AnyMessage], operator.add]`) over time
      - `HumanMessage` is already in the LangGraph state when calling `stream`, so those messages will not show up in the stream of events
  - of tokens themselves (i.e. for all LLM calls)
    - all output tokens
    - use `astream_events` method, available on all LangChain and LangGraph objects
      - is an asynchronous method, so we need an async checkpointer, so use `AsyncSqliteSaver` instead of `SqliteSaver`
      - `astream_events` also returns a stream of other kinds of events, which represent updates from the underlying stream
      - find `on_chat_model_stream` events to get events that correspond to new tokens
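
A small sketch of message-level streaming, written as a plain Python generator. It is not the LangGraph `stream` implementation; the scripted `llm` and `action` nodes and the dict-based messages are assumptions for illustration. Each event is one state update, keyed by the node that produced it.

```python
def llm(state: dict) -> dict:
    """Scripted stand-in for the LLM node: request a tool once, then answer."""
    if any(m["type"] == "tool" for m in state["messages"]):
        return {"messages": [{"type": "ai", "content": "It is sunny."}]}
    return {"messages": [{"type": "ai", "content": "", "tool_call": "search"}]}


def action(state: dict) -> dict:
    """Stand-in for the tool-executing node."""
    return {"messages": [{"type": "tool", "content": "sunny"}]}


NODES = {"llm": llm, "action": action}


def stream(state: dict):
    """Yield one event per executed node instead of only the final state."""
    node = "llm"
    while node:
        update = NODES[node](state)
        state["messages"] += update["messages"]
        yield {node: update}
        last = state["messages"][-1]
        if node == "llm":
            node = "action" if last.get("tool_call") else None
        else:
            node = "llm"


events = list(stream({"messages": [{"type": "human", "content": "Weather?"}]}))
order = [next(iter(event)) for event in events]
```

The human message is part of the input state, so it does not appear among the streamed events.
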
Persistence is also important for human in the loop interactions.
You often want to keep the human in the loop, which is easy to do in LangGraph.
New LangGraph version has a couple of key additions:
- Additional state information is stored to memory and displayed when using `get_state()` or `get_state_history()`.
- State is additionally stored at every state transition, while previously it was stored at an interrupt or at the end.

These change the command output slightly, but are a useful addition to the information available.
Code example: Lesson_5_Student.py
- based upon Lesson_4_Student.py
- for human in the loop interactions, we want to replace existing messages
  - do this with a custom reduce messages function `reduce_messages` that looks for messages with the same ID: if you insert a message with the same ID as one that already exists, it replaces that one. Otherwise, it appends as before.
  - If a message (this is the case for `HumanMessage`s) does not yet contain an ID, add one
- add interrupt before we call node "action", for manual approval before we run any tool, to make sure tools are executed correctly. Alternatively, you could only interrupt when a certain tool is called: TODO
- to continue after interrupt, we can simply call `abot.graph.stream(None, thread)`; "messages" is annotated in such a way that `None` does not change the messages attribute of `AgentState`, so we are simply continuing
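
A sketch of such a reducer, following the description above. The `Message` dataclass is a stand-in for the LangChain message classes, so the details are assumptions and not the course code.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import uuid4


@dataclass
class Message:
    """Stand-in for a chat message (hypothetical)."""
    content: str
    id: Optional[str] = None


def reduce_messages(left: list[Message], right: list[Message]) -> list[Message]:
    # Give every incoming message an ID if it does not have one yet.
    for message in right:
        if message.id is None:
            message.id = str(uuid4())
    merged = list(left)
    for message in right:
        for i, existing in enumerate(merged):
            if existing.id == message.id:
                merged[i] = message  # same ID: replace the existing message
                break
        else:
            merged.append(message)  # new ID: append as before
    return merged


history = reduce_messages([], [Message("weather in SF?")])
edited = Message("weather in LA?", id=history[0].id)
history = reduce_messages(history, [edited])  # replaces the first message
history = reduce_messages(history, [Message("thanks")])  # appends a new one
```
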
As the graph is being executed, a snapshot of each state is stored in memory:
- the `AgentState`, as defined in our code
- the `thread` config (see code), consisting of
  - `thread_id` to identify the thread
  - `thread_ts` to identify the snapshots. Use it to access the snapshots:
Commands to access the memory:
- `graph.get_state({"configurable": {"thread_id": "42"}})` (see code): returns the current state
- `graph.get_state_history({"configurable": {"thread_id": "42"}})`: returns an iterator over all `StateSnapshot`s
- `graph.invoke(None, {"configurable": {"thread_id": "42", "thread_ts": ...}})` or `graph.stream(None, {"configurable": {"thread_id": "42", "thread_ts": ...}})` runs with that snapshot as starting point. This is time travel:
- you can modify that state snapshot `state1` to `state1m` and then call `graph.update_state({"thread_id": "42"}, state1m)` to store it back on top, as new current state. When you then run `stream` or `invoke`, `state1m` will be used as its starting point:
See State Memory images above:
- We are keeping a running list of all these states. When we modified a state, we created a new state. When we update the graph, the new state is ADDED.
- This allows us to go back and visit previous states. This is called time travel.
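
The append-only idea can be sketched as follows. `Snapshot` and `History` are hypothetical helper classes, and the integer `ts` stands in for `thread_ts`; this is an illustration of the concept, not how LangGraph stores its checkpoints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    ts: int  # stand-in for thread_ts
    parent_ts: Optional[int]
    values: dict


class History:
    """Hypothetical append-only list of state snapshots."""

    def __init__(self) -> None:
        self.snapshots: list[Snapshot] = []

    def add(self, values: dict, parent_ts: Optional[int] = None) -> Snapshot:
        snapshot = Snapshot(ts=len(self.snapshots), parent_ts=parent_ts, values=dict(values))
        self.snapshots.append(snapshot)
        return snapshot

    def update_state(self, base_ts: int, changes: dict) -> Snapshot:
        """Modify an old snapshot by ADDING a new one on top; the old one stays."""
        base = self.snapshots[base_ts]
        return self.add({**base.values, **changes}, parent_ts=base.ts)


history = History()
first = history.add({"query": "weather in SF"})
second = history.add({"query": "weather in SF", "result": "sunny"}, parent_ts=first.ts)

# Time travel: go back to the first snapshot and branch off with an edited query.
branch = history.update_state(first.ts, {"query": "weather in LA"})
```

Because nothing is overwritten, both the original run and the edited branch remain available, linked through `parent_ts`.
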
Code example Lesson_5_Student.py at `# Time Travel` shows how to time travel:
- store the states in a list
- pick the state `to_replay` before action, i.e. before tool calling
- call `graph.stream(None, to_replay.config)`
Mix of time travel and modify state: code example Lesson_5_Student.py at `# Go back in time and edit` shows how to travel back to a previous, but edited state:
- pick `to_replay` again
- modify it as in [[Deeplearning course ai-agents-in-langgraph#5.2 Modify state]]
- branch off into `branch_state` with `branch_state = abot.graph.update_state(to_replay.config, to_replay.values)`
- execute from `branch_state` via `graph.stream(None, branch_state)`
Instead of calling Tavily, we want to mock a response by appending a new message into the state: code example Lesson_5_Student.py at `# Add message to a state at a given time` shows how to:
- get the `tool_call_id` that is expected from the next tool call, from `to_replay`
- create a new message `state_update`
- update the state in the graph via `graph.update_state(to_replay.config, state_update, as_node="action")`.
  - Since `state_update` is a new message, it is not going to replace an existing message; it is appended to the list of messages.
  - by adding the `as_node="action"` parameter, we are doing the modification as if we were the action node
- calling `graph.stream(None, branch_and_add)` executes without taking an action anymore, but causes an `AIMessage` event.
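
A sketch of the mocking step with plain dicts instead of LangChain message objects. The message layout (`tool_calls`, `tool_call_id`) and both helper functions are assumptions for illustration; the point is only that the mocked result must carry the ID of the pending tool call.

```python
def pending_tool_call_id(messages: list[dict]) -> str:
    """Return the ID of the tool call the last AI message is waiting for."""
    return messages[-1]["tool_calls"][0]["id"]


def mock_tool_result(messages: list[dict], content: str) -> list[dict]:
    """Append a hand-written tool result instead of really calling the tool."""
    tool_message = {
        "type": "tool",
        "tool_call_id": pending_tool_call_id(messages),
        "content": content,
    }
    return messages + [tool_message]  # new message, so it is appended, not replaced


to_replay = [
    {"type": "human", "content": "Weather in LA?"},
    {"type": "ai", "content": "", "tool_calls": [{"id": "call_1", "name": "search"}]},
]
updated = mock_tool_result(to_replay, "It is 54 degrees celsius")
```
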
Overall, we have seen a lot of advanced and complicated human-in-the-loop interaction patterns that give you more control:
- add a break, so humans can approve/deny
- modify current state
- go back in time
- modify past state
- update state manually to give the result of a tool manually, rather than really calling the tool.
A tiny graph is very helpful for understanding and debugging.
Call `graph.invoke(None, states[-3])` to go back in time with `states[-3]` as current_state.
Afterwards, the memory contains branches, as you can see via the `thread_ts` values of `config` and `parent_config`.
`graph.update_state(thread2, save_state.values)` creates a new entry at the top (latest entry) in memory. The new entry on top has
- the previous top as parent
- `count` 1 because update -3 was added to the previous count value 4, rather than replacing it.

The `as_node` argument to `graph.update_state` specifies the writer, so that graph traversal knows which starting point (aka "writer") to assume.
Essay Writer is a compact version of an AI researcher:
- `plan`: generates a plan, once upfront.
- `research_plan` does some research based on that plan, which involves calling Tavily and getting back some documents.
- `generate` follows the plan using the documents from `research_plan` to write the essay.
  - in case we continue, we go to `reflect`, where we generate a critique of the current essay
- `research_critique` is another research step calling Tavily, to get another set of documents we append to the existing documents.

`AgentState` has:
- `task`, the human input
- `plan`, generated by the `plan` node.
- `draft` of the essay
- `critique`, generated by the last iteration of the `reflect` node
- `content` is the list of documents that Tavily retrieved
- `revision_number` keeps track of the number of revisions we have made
- `max_revisions` is used for the exit criterion.
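
A sketch of this state and of the exit criterion. The field names come from the list above; the field types, the stubbed `generate` and `reflect` nodes, the `END` sentinel and the exact comparison in `should_continue` are assumptions, and the research nodes are left out.

```python
from typing import TypedDict

END = "__end__"  # sentinel name chosen for this sketch


class AgentState(TypedDict):
    task: str
    plan: str
    draft: str
    critique: str
    content: list[str]
    revision_number: int
    max_revisions: int


def generate(state: AgentState) -> dict:
    """Stub: a real node would call an LLM with the plan and documents."""
    number = state["revision_number"]
    return {"draft": f"draft v{number}", "revision_number": number + 1}


def reflect(state: AgentState) -> dict:
    """Stub: a real node would ask an LLM to critique the draft."""
    return {"critique": f"critique of {state['draft']}"}


def should_continue(state: AgentState) -> str:
    # Exit criterion (assumed form): stop once the revision budget is used up.
    if state["revision_number"] > state["max_revisions"]:
        return END
    return "reflect"


def run(state: AgentState) -> AgentState:
    while True:
        state = {**state, **generate(state)}
        if should_continue(state) == END:
            return state
        state = {**state, **reflect(state)}


final = run({
    "task": "essay on agents", "plan": "outline", "draft": "", "critique": "",
    "content": [], "revision_number": 1, "max_revisions": 2,
})
```

With `max_revisions` set to 2, the loop writes two drafts and one critique before it stops.
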
See code example Lesson_6_Student.py:
Example memory contents early on (from the MemoryWriter GUI):
Memory contents after UpdateState:
- Loop back to `planner`, because at the start, without any documents, `planner` might have needed lots of guesses.
python.langchain.com/docs/get_started/introduction:
- Tavily is part of LangChain-Community
- LangChain OpenAI is a separate partner package
- LangChain itself is more high-level, good entry points
- Templates can easily be deployed with LangServe
- LangServe easily turns your LangChain application into a web server
github.com/langchain-ai/langchain
- cookbooks
- templates
github.com/langchain-ai/langgraph
- in-depth documentation (reference docs, tutorials, how-tos)
deeplearning.ai/short-courses:
- Functions, Tools, and Agents with LangChain
- ...
smith.langchain.com/hub
- get inspiration
- see what other expert prompters are doing
Agent flows we did not cover, but you should know about:
- **Multi-Agent Architecture**: when multiple agents work on the same shared state, i.e. passing the state around from one agent to the next
  - each agent can have its own tools
  - each agent can have its own loop inside.
- Supervisor Agent is calling sub-agents
  - Supervisor determines input to the sub-agents
  - Sub-agents can have different states inside them; each of them is a LangGraph
  - vaguely similar to multi-agent architecture, but emphasizes that there is one supervisor in charge of routing, coordinating other agents
  - good when you have a powerful LLM for the supervisor (planning requires a lot of intelligence)
- Flow engineering
  - comes from an AlphaCodium paper where they achieve SotA coding performance
  - graph-like solution
  - basically a pipeline
  - few key nodes that actually loop
  - thus directed flow up to a certain point, then you have iterations
  - bespoke architecture for a particular coding problem, but the concept of flow engineering extends more broadly: think about the right information flow for your agents to take actions and think.
- plan & execute style flow is a common paradigm/pattern
  1. plan upfront: steps that a sub-agent should do
  2. execute that plan: do one plan step (i.e. sub-task) with a sub-agent, which can do iterations to achieve the sub-task. Maybe you update the plan during this iteration
  3. update plan: update state with task results and replan.
  4. respond to user or go back to step 2.
- based on https://arxiv.org/abs/2310.04406: Language Agent Tree Search Unifies Reasoning, Acting, And Planning in Language Models
  - tree search over the state of possible actions
  - select node
  - generate new candidates
  - act, reflect, and score
  - backprop (update parents)
  - here you can see why persistence is really important: you need to be able to update earlier states and to time travel
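
The plan & execute flow from the list above can be sketched as a simple loop. `plan`, `execute` and `replan` are hypothetical stubs; in a real agent each of them would be an LLM or a sub-agent call.

```python
def plan(task: str) -> list[str]:
    """Stub planner: a real one would ask an LLM for the steps."""
    return [f"research {task}", f"summarize {task}"]


def execute(step: str) -> str:
    """Stub sub-agent: a real one could iterate internally on the sub-task."""
    return f"done: {step}"


def replan(remaining: list[str], results: list[str]) -> list[str]:
    """Stub replanner: keeps the remaining steps unchanged."""
    return remaining


def plan_and_execute(task: str) -> list[str]:
    steps = plan(task)  # 1. plan upfront
    results: list[str] = []
    while steps:
        step, *steps = steps
        results.append(execute(step))  # 2. execute one plan step
        steps = replan(steps, results)  # 3. update plan with the results so far
    return results  # 4. respond once no steps remain


outcome = plan_and_execute("LangGraph")
```
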
LangGraph is really flexible in covering all these architectures and patterns. This is different from other frameworks and crucial to creating agents that actually work.
Course Achievement: https://learn.deeplearning.ai/accomplishments/16b331d0-b58e-4715-9dbe-e03ccf55fb54?usp=sharing