- `LlamaEdgeChatService` provides developers with an OpenAI-API-compatible service to chat with LLMs via HTTP requests.
- `LlamaEdgeChatLocal` enables developers to chat with LLMs locally (coming soon).
Both `LlamaEdgeChatService` and `LlamaEdgeChatLocal` run on infrastructure driven by the WasmEdge Runtime, which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.
## Chat via API Service
`LlamaEdgeChatService` works on top of the `llama-api-server`. By following the steps in the llama-api-server quick-start, you can host your own API service and chat with any model you like, on any device, from anywhere an internet connection is available.
### Chat with LLMs in the non-streaming mode
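As a minimal sketch, a non-streaming chat sends the whole conversation and waits for the complete reply. The `service_url` below is a placeholder; replace it with the endpoint of your own llama-api-server instance.

```python
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage, SystemMessage

# URL of your own llama-api-server instance (placeholder; replace with yours)
service_url = "https://your-llama-api-server.example.com"

# Create the chat service instance in the default non-streaming mode
chat = LlamaEdgeChatService(service_url=service_url)

# Build the message sequence: a system prompt followed by a user question
messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What is the capital of France?"),
]

# Send the conversation and receive the full reply in one response
response = chat.invoke(messages)
print(f"[Bot] {response.content}")
```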
### Chat with LLMs in the streaming mode
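A similar sketch for streaming, again with a placeholder `service_url`: setting `streaming=True` and iterating over `chat.stream(...)` prints tokens as they arrive instead of waiting for the full reply.

```python
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage

# URL of your own llama-api-server instance (placeholder; replace with yours)
service_url = "https://your-llama-api-server.example.com"

# Enable streaming so the service yields tokens as they are generated
chat = LlamaEdgeChatService(service_url=service_url, streaming=True)

messages = [HumanMessage(content="What is the capital of Norway?")]

# Print each chunk as it arrives
for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)
print()
```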