Fetch Available Models Using API for Agents that Support It #98
-
Thanks for your kind words, really appreciated. And the plugin continues to improve because the community comes up with awesome ideas.

I've given some thought to adding this functionality in recent weeks. I'm mostly against it, because I think it adds bloat and extra steps to the process of adding a new adapter to the plugin. I've just recently updated my ADAPTERS.md guide and would love to reduce the amount of work it takes to get to that stage. Also... Anthropic, Gemini, OpenAI... their models change so infrequently that it's easier just to make a PR to their respective adapters and add them to the choices table. Some users may also like to show the adapter's settings in the chat buffer, and calling an API every time you open a chat buffer is expensive, so we'd need to implement some level of caching. It would end up being a lot of new lines of code for minimal reward (crikey, I sound like a PM...).

Ollama, however, is very different and I think this feature should be implemented. I had no idea they had a local endpoint for grabbing all the models, so great spot and suggestion. I've added this support in e25932e. The default Ollama model will now be the most recent one.
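For reference, the idea is simply to list the locally available models and default to the most recently modified one. A minimal sketch of that (not the actual code from e25932e; it assumes Ollama's `/api/tags` response, where each model has a `name` and an ISO-8601 `modified_at`):

```lua
-- Sketch only: given Ollama's decoded /api/tags response,
-- pick the most recently modified model as the default.
local function newest_model(tags)
  local newest
  for _, m in ipairs(tags.models or {}) do
    -- Comparing the ISO-8601 strings is a good-enough ordering
    -- for choosing a sensible default model.
    if not newest or m.modified_at > newest.modified_at then
      newest = m
    end
  end
  return newest and newest.name
end
```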
-
For each adapter in the source (except ollama, but more on that later), we have hard-coded model lists in `schema.model.choices`. OpenAI, for example, can instead be queried for its available models:

```sh
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```
I think it would be really cool to use this endpoint for populating the choices table, ideally with a simple local cache to prevent over-fetching. To my knowledge Anthropic doesn't have an equivalent endpoint, unfortunately.
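To make it concrete, here's roughly what I have in mind (just a sketch: the function name, cache path, and TTL are illustrative, and it assumes `plenary.curl` and Neovim's built-in `vim.json`):

```lua
-- Sketch: build the OpenAI choices table from /v1/models, with a small on-disk cache.
-- Names and paths here are illustrative, not the plugin's actual API.
local curl = require("plenary.curl")

local cache_file = vim.fn.stdpath("cache") .. "/openai_models.json"
local cache_ttl = 24 * 60 * 60 -- seconds; re-fetch at most once a day

local function get_openai_choices()
  -- Serve from the cache while it is still fresh
  local stat = vim.loop.fs_stat(cache_file)
  if stat and (os.time() - stat.mtime.sec) < cache_ttl then
    return vim.json.decode(table.concat(vim.fn.readfile(cache_file), "\n"))
  end

  local res = curl.get("https://api.openai.com/v1/models", {
    headers = { Authorization = "Bearer " .. (os.getenv("OPENAI_API_KEY") or "") },
  })
  if res.status ~= 200 then
    return { "gpt-4" } -- fall back to a hard-coded default if the request fails
  end

  -- The response is shaped like { "object": "list", "data": [ { "id": "..." }, ... ] }
  local choices = {}
  for _, model in ipairs(vim.json.decode(res.body).data) do
    table.insert(choices, model.id)
  end

  vim.fn.writefile({ vim.json.encode(choices) }, cache_file)
  return choices
end
```

The same pattern would work for any adapter whose API exposes a list-models endpoint.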
Coming back to ollama, it looks like something similar to my suggestion is already being done in `get_ollama_choices()`. The issue here is that if you are not running ollama locally (and even if you are, you might be running it in a container), running ollama commands directly on the machine running Neovim will not work. Luckily, ollama has an endpoint for this: `/v1/models`.
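For example (11434 is ollama's default port; the exact output depends on which models you have pulled locally):

```sh
curl http://localhost:11434/v1/models
```

This yields something along the lines of:

```json
{
  "object": "list",
  "data": [
    { "id": "llama3:latest", "object": "model", "created": 1718000000, "owned_by": "library" },
    { "id": "codellama:7b", "object": "model", "created": 1717000000, "owned_by": "library" }
  ]
}
```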
Of course the response comes back as JSON, so it would need to be parsed differently from the code above.
UPDATE: here is a quick crack at reimplementing `get_ollama_choices()` using `curl` via `plenary`:
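Something along these lines (a sketch of the approach rather than the exact snippet; it assumes `plenary.curl`, the default port, and the `/v1/models` response shape shown above):

```lua
-- Sketch: fetch the locally available Ollama models over HTTP instead of
-- shelling out to the `ollama` CLI, so it also works for remote or
-- containerised servers.
local curl = require("plenary.curl")

local function get_ollama_choices()
  local ok, res = pcall(curl.get, "http://localhost:11434/v1/models", {})
  if not ok or res.status ~= 200 then
    return {} -- no reachable server; leave the choices empty
  end

  local choices = {}
  for _, model in ipairs(vim.json.decode(res.body).data or {}) do
    table.insert(choices, model.id)
  end
  return choices
end
```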
If acceptable, I'd love to make a contribution to this wonderful project.
UPDATE: Implemented for OpenAI as well -- check my branch!