litellm and llamafile - APIError: OpenAIException - File Not Found
I wanted to get two of my favourite tools in the LLM world - litellm and llamafile - to play nicely together, and ran into an issue that I’ll explain in this blog post. This should be helpful if you’re trying to wire up other LLM servers to litellm; it’s not specific to llamafile.
Setting up llamafile
In case you want to follow along, I downloaded llamafile and the Mistral 7B Instruct weights from TheBloke/Mistral-7B-Instruct-v0.1-GGUF. I then started the llamafile server like this:
./llamafile-server-0.3 -m models/mistral-7b-instruct-v0.1.Q4_K_M.gguf --nobrowser
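Before wiring up litellm, we can sanity check that the server is up. This is a minimal sketch, assuming the default port of 8080 - a GET to the root path returns llamafile's built-in web UI:
import urllib.request
# the root path serves the built-in chat UI, so a 200 here means the server is running
with urllib.request.urlopen("http://localhost:8080/") as response:
    print(response.status)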
Installing litellm
Let’s install litellm, a library that lets you call many different LLMs as if they all exposed the OpenAI API.
poetry add litellm
or
pip install litellm
Calling llamafile from litellm
Right, now we need to figure out how to call llamafile from litellm. There isn’t currently a documentation page for llamafile, but we can follow the instructions for creating a Custom API Server, since llamafile provides an OpenAI-compatible endpoint.
import os
from litellm import completion
# llamafile doesn't check the API key, but litellm's OpenAI client requires one to be set
os.environ["OPENAI_API_KEY"] = "i-am-not-used-but-must-be-here"
messages = [{"content": "Write a limerick about ClickHouse", "role": "user"}]
response = completion(
    model="command-nightly",
    messages=messages,
    api_base="http://localhost:8080/",
    custom_llm_provider="openai"
)
I ran this code and got the following error:
File ~/Library/Caches/pypoetry/virtualenvs/llamafile-playground-PMlWj0HV-py3.11/lib/python3.11/site-packages/litellm/utils.py:4192, in exception_type(model, original_exception, custom_llm_provider, completion_kwargs)
4190 else:
4191 exception_mapping_worked = True
-> 4192 raise APIError(
4193 status_code=original_exception.status_code,
4194 message=f"OpenAIException - {original_exception.message}",
4195 llm_provider="openai",
4196 model=model,
4197 request=original_exception.request
4198 )
4199 else:
4200 # if no status code then it is an APIConnectionError: https://github.com/openai/openai-python#handling-errors
4201 raise APIConnectionError(
4202 __cause__=original_exception.__cause__,
4203 llm_provider=custom_llm_provider,
4204 model=model,
4205 request=original_exception.request
4206 )
APIError: OpenAIException - File Not Found
I couldn’t make much sense of the stack trace - it wasn’t clear why a file couldn’t be found in the first place - so I turned on debug mode so that I could see the HTTP requests being made.
import litellm
litellm.set_verbose = True
If we re-run the completion function above, we’ll see something like the following output:
POST Request Sent from LiteLLM:
curl -X POST \
http://localhost:8080 \
-d '{'model': 'command-nightly', 'messages': [{'content': 'Write a limerick about ClickHouse', 'role': 'user'}]}'
RAW RESPONSE:
File Not Found
The debug output doesn’t seem quite right to me, as litellm is actually appending /chat/completions to the base URI, which would mean the request was made to http://localhost:8080/chat/completions.
I had a look in the llamafile logs to see if it had registered a request:
{"timestamp":1702535457,"level":"INFO","function":"log_server_request","line":2593,"message":"request","remote_addr":"127.0.0.1","remote_port":50193,"status":404,"method":"POST","path":"/chat/completions","params":{}}
The 'File Not Found' message makes more sense now: llamafile was returning a 404 when litellm called it.
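We can reproduce the 404 outside of litellm by sending the same payload to both paths. This is a minimal sketch using only the standard library, assuming the server is still running on port 8080:
import json
import urllib.error
import urllib.request
payload = json.dumps({
    "model": "command-nightly",
    "messages": [{"content": "Say hello", "role": "user"}]
}).encode("utf-8")
for path in ["/chat/completions", "/v1/chat/completions"]:
    request = urllib.request.Request(
        f"http://localhost:8080{path}",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            print(path, response.status)  # /v1/chat/completions returns 200
    except urllib.error.HTTPError as error:
        print(path, error.code)  # /chat/completions returns 404 - the "File Not Found"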
The mistake I’d made was not including the v1 suffix in the api_base property.
Let’s fix that:
response = completion(
    model="command-nightly",
    messages=messages,
    api_base="http://localhost:8080/v1",
    custom_llm_provider="openai"
)
If we run it this time, we’ll see the following debug output:
POST Request Sent from LiteLLM:
curl -X POST \
http://localhost:8080/v1 \
-d '{'model': 'command-nightly', 'messages': [{'content': 'Write a limerick about ClickHouse', 'role': 'user'}]}'
RAW RESPONSE:
{"id":"chatcmpl-b5Q6ctuTXM9xgHJPyH7q8oqw5OL1FScH","choices":[{"finish_reason":"stop","index":0,"message":{"content":"There once was a database named ClickHouse\nIt could handle all sorts of data, no doubt\nWith its speed and its might\nIt could handle all queries in sight\nAnd its users were never left in a drought\n","role":"assistant","function_call":null,"tool_calls":null}}],"created":1702536234,"model":"gpt-3.5-turbo-0613","object":"chat.completion","system_fingerprint":null,"usage":{"completion_tokens":53,"prompt_tokens":37,"total_tokens":90}}
And let’s print out the limerick:
print(response.choices[0].message.content)
There once was a database named ClickHouse
It could handle all sorts of data, no doubt
With its speed and its might
It could handle all queries in sight
And its users were never left in a drought
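Since litellm returns an OpenAI-style response object, the other fields from the raw response are available too. As a small sketch - the exact attribute access may vary between litellm versions - we can print the token usage:
# the usage block from the raw response, e.g. total_tokens=90
print(response.usage)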
About the author
I'm currently working on short-form content at ClickHouse. I publish short 5-minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms book with Amy Hodler.