With Mikupad as the front end, evaluation is significantly slower than with llama-server alone: per-token eval time roughly doubles (43.66 ms vs. 22.77 ms). Performance metrics are included below for comparison.
llama-server with Mikupad:
prompt eval time = 24.59 ms / 1 tokens ( 24.59 ms per token, 40.67 tokens per second)
eval time = 1309.86 ms / 30 tokens ( 43.66 ms per token, 22.90 tokens per second)
total time = 1334.45 ms / 31 tokens
llama-server:
prompt eval time = 88.61 ms / 19 tokens ( 4.66 ms per token, 214.42 tokens per second)
eval time = 523.65 ms / 23 tokens ( 22.77 ms per token, 43.92 tokens per second)
total time = 612.27 ms / 42 tokens
llama-cli:
llama_perf_context_print: prompt eval time = 103.19 ms / 12 tokens ( 8.60 ms per token, 116.29 tokens per second)
llama_perf_context_print: eval time = 1685.34 ms / 74 runs ( 22.77 ms per token, 43.91 tokens per second)
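To put a number on the gap, here is a small sketch that recomputes ms per token from the two `eval time` lines above and takes their ratio. The regex and the helper name `ms_per_token` are my own and only assume the log format shown in this issue.

```python
import re

# Matches "eval time = <ms> ms / <n> tokens" (or "runs", as llama-cli prints).
# Note: it would also match "prompt eval time" lines, so pass only the
# generation ("eval time") lines to ms_per_token.
EVAL_RE = re.compile(r"eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*(?:tokens|runs)")


def ms_per_token(line: str) -> float:
    """Return total eval milliseconds divided by the number of tokens."""
    match = EVAL_RE.search(line)
    if match is None:
        raise ValueError(f"not an eval time line: {line!r}")
    total_ms = float(match.group(1))
    n_tokens = int(match.group(2))
    return total_ms / n_tokens


# The two generation-speed lines copied from the logs in this issue.
with_mikupad = ms_per_token(
    "eval time = 1309.86 ms / 30 tokens ( 43.66 ms per token, 22.90 tokens per second)"
)
server_only = ms_per_token(
    "eval time = 523.65 ms / 23 tokens ( 22.77 ms per token, 43.92 tokens per second)"
)

slowdown = with_mikupad / server_only
print(f"{with_mikupad:.2f} ms/token vs {server_only:.2f} ms/token "
      f"({slowdown:.2f}x slower with Mikupad)")
```

With the numbers above this works out to a bit over 1.9x slower per generated token. The prompt eval figures are not directly comparable, since the Mikupad run processed only 1 prompt token.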
It seems Mikupad is causing a considerable slowdown. I'm wondering if this is due to how it handles token probabilities. Is it possible to disable this feature to potentially improve performance?
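One way to test the token-probability theory without Mikupad in the loop is to send the same prompt to llama-server twice, once with probabilities off and once with them on, and compare generation speed. The sketch below assumes the server's `/completion` endpoint accepts an `n_probs` field and returns a `timings` object containing `predicted_per_second`; I have not checked what value Mikupad actually sends, so the `10` in the usage note is a guess, and `build_payload` and `tokens_per_second` are hypothetical helper names.

```python
import json
import urllib.request


def build_payload(prompt: str, n_probs: int, n_predict: int = 64) -> dict:
    """Build a /completion request body; n_probs=0 asks for no token probabilities."""
    return {"prompt": prompt, "n_predict": n_predict, "n_probs": n_probs}


def tokens_per_second(base_url: str, payload: dict) -> float:
    """POST the payload and return the generation speed reported by the server.

    Assumes the response JSON has timings.predicted_per_second.
    """
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["timings"]["predicted_per_second"]
```

Usage, against a server on the default local port (adjust the URL to your setup): call `tokens_per_second("http://127.0.0.1:8080", build_payload("Hello", n_probs=0))` and then the same with `n_probs=10`. If the second number drops to about half, the probabilities are the likely cause rather than Mikupad itself.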