Code and datasets of TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents, (AAAI 2025).
The code for the multi-modal encoder is available upon request: geonlee0325@kaist.ac.kr.
We release seven time series datasets from three different domains, which can be found here.
- π€οΈ Weather: weather_ny (New York), weather_sf (San Francisco), and weather_hs (Houston)
- π° Finance: finance_sp500 (S&P 500) and finance_nikkei (Nikkei 225)
- π₯ Healthcare: healthcare_mortality (Mortality Rate) and healthcare_positive (Test-Positive Rate)
You can load the dataset as follows (e.g., weather datasets):
import pickle as pkl
with open('indices.pkl', 'rb') as f:
indices = pkl.load(f)
with open(f'time_series_{city}.pkl', 'rb') as f:
data = pkl.load(f)
You can load the labels of time series events as follows (e.g., weather datasets):
with open(f'rain_{city}.pkl', 'rb') as f:
labels = pkl.load(f)
# weather
0: not rained / 1: rained
# finance
0: decreased / 1: neutral / 2: increased
# healthcare
0: did not exceed the average / 1: exceeded the average
The total number of events is given by len(indices)
. The correspondence between the time series and labels is as follows:
for _i in range(len(indices)):
i = indices[_i]
# Time series
time_series = data[i:i+window_size]
# Label
label = labels[_i]
Each dataset directory contains:
gpt_summary
: Textual summaries of time series generated by GPT-4.gpt_predict_time
: Predictions generated by GPT-4 based on time series.gpt_predict_text
: Predictions generated by GPT-4 based on textual summaries (TimeCP).gpt_predict_in-context
: Predictions generated by TimeCAP.
We contextualize time series using LLM. The prompt of each dataset is as follows:
- π€οΈ Weather
# System Prompt
Your job is to act as a professional weather analyst. You will write a high-quality report that is informative and helps in understanding the current weather situation.
# User Prompt
Your task is to analyze key weather indicators in {city_name} over the last {window_size} hours. Review the time-series data provided for the last {window_size} hours. Each time-series consists of hourly values separated by a \'|\' token for the following indicators:
- Temperature (Kelvin): {temperature}
- Humidity (%): {humidity}
- Air Pressure (hPa): {pressure}
- Wind Speed (m/s): {wind_speed}
- Wind Direction (degrees): {wind_direction}
Based on this time-series data, write a concise report that provides insights crucial for understanding the current weather situation. Your report should be limited to five sentences, yet comprehensive, highlighting key trends and considering their potential impact on the weather in {city_name}. Do not write numerical values while writing the report.
- π° Finance
# System Prompt
Your job is to act as a professional finance analyst. You will write a high-quality report that is informative and helps in understanding the current financial situation.
# User Prompt
Your task is to analyze key financial indicators over the last {window_size} market days. Review the time-series data provided for the last {window_size} market days. Each time-series consists of daily values separated by a \'|\' token for the following indicators:
- S&P 500: {s_p_500}
- VIX (Volatility Index): {vix}
- Nikkei 225: {nikkei_225}
- FTSE 100: {ftse_100}
- Gold Futures: {gold_futures}
- Crude Oil Futures: {crude_oil_futures}
- Exchange rate for EUR/USD: {eur_usd}
- Exchange rate for USD/JYP: {usd_jpy}
- Exchange rate for USD/CNY: {usd_cny}
Based on this time-series data, write a concise report that provides insights crucial for understanding the current financial situation. Your report should be limited to five sentences, yet comprehensive, highlighting key trends and considering their potential impact on the market. Do not write numerical values while writing the report.
- π₯ Healthcare
# System Prompt
Your job is to act as a professional healthcare analyst. You will write a high-quality report that is informative and helps understand the current healthcare situation.
# User Prompt
Your task is to analyze the respiratory specimens testing positive for influenza over the last {window_size} weeks. The average ratio of positive speciemens is 6.26%. Review the time-series data provided for the last {window_size} weeks. Each time-series consists of weekly values separated by a \'|\' token for the following indicators:
- Number of specimens tested: {total_specimens}
- Number of positive specimens for Influenza A: {total_a}
- Number of positive specimens for Influenza B: {total_b}
- Ratio of positive specimens (%): {pos_rate}
- Ratio of positive specimens for Influenza A (%): {a_rate}
- Ratio of positive specimens for Influenza B (%): {b_rate}
Based on this time-series data, write a concise report that provides insights crucial for understanding the current healthcare situation. Your report should be limited to five sentences, yet comprehensive, highlighting key trends and considering their potential impact on the healthcare system. Do not write redundant information.
We predict time series events using time series as inputs. The prompt of each dataset is as follows:
- π€οΈ Weather
# System Prompt
Your job is to act as a professional weather forecaster. You will be given a time-series data of the weather from the past 24 hours. Based on this information, your task is to predict whether it will rain in the next 24 hours.
# User Prompt
Your task is to predict whether it will rain or not in {city_name} in the next {window_size} hours. Review the time-series data provided for the last {window_size} hours. Each time-series consists of hourly values separated by a \'|\' token for the following indicators:
- Temperature (Kelvin): {temperature}
- Humidity (%): {humidity}
- Air Pressure (hPa): {pressure}
- Wind Speed (m/s): {wind_speed}
- Wind Direction (degrees): {wind_direction}
Based on this information, respond with either \'rain\' or \'not rain\'. Do not provide any other details.
- π° Finance
# System Prompt
Your job is to act as a professional financial forecaster. You will be given a time-series data from the past 20 market days. Based on this information, your task is to predict whether the {indicator_name} price will decrease by more than 1%, increase by more than 1%, or change minimally in the next market day.
# User Prompt
Your task is to predict whether the {indicator_name} price will: (1) Decrease: decrease by more than 1% (2) Increase: increase by more than 1% (3) Neutral: change minimally, between -1% to 1%\nin the next market day. Review the time-series data provided for the last {window_size} market days. Each time-series consists of daily values separated by a \'|\' token for the following indicators:
- S&P 500: {s_p_500}
- VIX (Volatility Index): {vix}
- Nikkei 225: {nikkei_225}
- FTSE 100: {ftse_100}
- Gold Futures: {gold_futures}
- Crude Oil Futures: {crude_oil_futures}
- Exchange rate for EUR/USD: {eur_usd}
- Exchange rate for USD/JYP: {usd_jpy}
- Exchange rate for USD/CNY: {usd_cny}
Based on this information, predict whether the {indicator2name[indicator]} price will decrease by more than 1%, increase by more than 1%, or otherwise, in the next market day. Respond with either \'decrease\', \'increase\', or \'neutral\'. Do not provide any other details.
- π₯ Healthcare
# System Prompt
Your job is to act as a professional healthcare forecaster. You will be given a time-series data from the past 20 weeks. Based on this information, your task is to predict whether the ratio of mortality from Influenza or Pneumonia to the total number of death will exceed its average in the comming week.
# User Prompt
Your task is to predict whether the percentage of respiratory specimens testing positive for influenza will: (1) Exceed its average of 6.26% (2) Not exceed its average of 6.26% in the coming week. Review the time-series data provided for the last {window_size} weeks. Each time-series consists of weekly values separated by a \'|\' token for the following indicators:"
- Number of specimens tested: {total_specimens}
- Number of positive specimens for Influenza A: {total_a}
- Number of positive specimens for Influenza B: {total_b}
- Ratio of positive specimens (%): {pos_rate}
- Ratio of positive specimens for Influenza A (%): {a_rate}
- Ratio of positive specimens for Influenza B (%): {b_rate}
Based on this time-series data, predict whether the percentage of respiratory specimens testing positive for influenza will exceed its average of 6.26% or not in the comming week. Respond with either \'exceed\' or \'not exceed\'. Do not provide any other details.
We predict time series events using text (generated by LLMs above) as inputs. The prompt of each dataset is as follows:
- π€οΈ Weather
# System Prompt
Your job is to act as a professional weather forecaster. You will be given a summary of the weather from the past 24 hours. Based on this information, your task is to predict whether it will rain in the next 24 hours.
# User Prompt
Your task is to predict whether it will rain or not in {city_name} in the next {window_size} hours. The weather of the past 24 hours is summarized as follows:
{TEXT}
Based on this information, respond with either \'rain\' or \'not rain\'. Do not provide any other details.
- π° Finance
# System Prompt
Your job is to act as a professional financial forecaster. You will be given a financial summary of the past 20 market days. Based on this information, your task is to predict whether the {indicator_name} price will decrease by more than 1%, increase by more than 1%, or change minimally in the next market day.
# User Prompt
Your task is to predict whether the {indicator_name} price will: (1) Decrease: decrease by more than 1% (2) Increase: increase by more than 1% (3) Neutral: change minimally, between -1% to 1%\nin the next market day. The financial situation of the last {window_size} market days is summarized as follows:
{TEXT}
Based on this information, predict whether the {indicator_name} price will decrease by more than 1%, increase by more than 1%, or otherwise (neutral), in the next market day. Respond with either \'decrease\', \'increase\', or \'neutral\'. Do not provide any other details.
- π₯ Healthcare
# System Prompt
Your job is to act as a professional healthcare forecaster. You will be given a healthcare summary of the past 20 weeks. Based on this information, your task is to predict whether the percentage of respiratory specimens testing positive for influenza will exceed the average threshold in the comming week.
# User Prompt
Your task is to predict whether the percentage of respiratory specimens testing positive for influenza will: (1) Exceed its average of 6.26% (2) Not exceed its average of 6.26% in the coming week. The healthcare situation of the last {window_size} weeks is summarized as follows:
{TEXT}
Analyze this summary and predict whether the percentage of respiratory specimens testing positive for influenza will exceed the average of 6.26% or not. Respond with either \'exceed\' or \'not exceed\'. Do not provide any other details.
We predict time series events using text (generated by LLMs above) with in-context examples as inputs. The prompt of each dataset is as follows:
- π€οΈ Weather
# System Prompt
Your job is to act as a professional weather forecaster. You will be given a summary of the weather from the past 24 hours. Based on this information, your task is to predict whether it will rain in the next 24 hours.
# User Prompt
Your task is to predict whether it will rain or not in {city_full_name[city]} in the next {window_size} hours.
First, review the following {k} examples of weather summaries and outcomes so that you can refer to when making predictions.
{In-context example 1: Text & Output}
...
{In-context example k: Text & Output}
The weather of the last 24 hours is summarized as follows:
{TEXT}
Based on the understanding of the provided examples, predict the outcome of the current weather summary. Respond your prediction with either 'rain' or 'not rain'. Response should not include other terms.
- π° Finance
# System Prompt
Your job is to act as a professional financial forecaster. You will be given a summary of the financial situation of the past 20 market days. Based on this information, your task is to predict whether the {indicator_name} price will decrease by more than 1%, increase by more than 1%, or change minimally in the next market day.
# User Prompt
Your task is to predict whether the {indicator_name} price will: (1) Decrease: decrease by more than 1% (2) Increase: increase by more than 1% (3) Neutral: change minimally, between -1% to 1%\nin the next market day.
First, review the following {k} examples of financial summaries and {indicator2name[indicator]} outcomes so that you can refer to when making predictions.
{In-context example 1: Text & Output}
...
{In-context example k: Text & Output}
The financial situation of the last {window_size} market days is summarized as follows:
{TEXT}
Refer to the provided examples and predict the outcome of the current financial summary. Respond your prediction with either 'decrease', 'increase' or 'neutral'. Response should not include other terms.
- π₯ Healthcare
# System Prompt
Your job is to act as a professional healthcare forecaster. You will be given a healthcare summary of the past 20 weeks. Based on this information, your task is to predict whether the percentage of respiratory specimens testing positive for influenza will exceed the average threshold in the comming week.
# User Prompt
Your task is to predict whether the percentage of respiratory specimens testing positive for influenza will: (1) Exceed its average of 6.26% (2) Not exceed its average of 6.26% in the coming week.
First, review the following {k} examples of healthcare summaries and their outcomes so that you can refer to when making predictions.
{In-context example 1: Text & Output}
...
{In-context example k: Text & Output}
The healthcare situation of the last {window_size} weeks is summarized as follows:
{TEXT}
Refer to the provided examples and predict the outcome of the current healthcare summary. Respond with either \'exceed\' or \'not exceed\'. Response should not include other terms.