This notebook shows how to augment Llama-2 LLMs with the Llama2Chat wrapper to support the Llama-2 chat prompt format. Several LLM implementations in LangChain can be used as an interface to Llama-2 chat models. These include ChatHuggingFace, LlamaCpp, GPT4All, …, to mention a few examples.
Llama2Chat is a generic wrapper that implements BaseChatModel and can therefore be used in applications as a chat model. Llama2Chat converts a list of Messages into the required chat prompt format and forwards the formatted prompt as a str to the wrapped LLM.
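Llama2Chat lives in the langchain_experimental package, so the import looks like this:

```python
from langchain_experimental.chat_models import Llama2Chat
```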
For the chat application examples below, we'll use the following chat prompt_template:
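A template along these lines works; the system prompt text and the chat_history variable name are illustrative choices (the memory configured later must use the same key):

```python
from langchain_core.messages import SystemMessage
from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)

# System prompt, a placeholder for the conversation history, and the user's input.
template_messages = [
    SystemMessage(content="You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    HumanMessagePromptTemplate.from_template("{text}"),
]
prompt_template = ChatPromptTemplate.from_messages(template_messages)
```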
Chat with Llama-2 via HuggingFaceTextGenInference LLM
A HuggingFaceTextGenInference LLM encapsulates access to a text-generation-inference server. In the following example, the inference server serves a meta-llama/Llama-2-13b-chat-hf model. It can be started locally with:
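A Docker invocation along the following lines starts such a server; the image tag, port mapping, cache volume, and quantization settings are assumptions to adapt to your setup:

```bash
docker run --rm --gpus all --ipc=host -p 8080:80 \
    -v ~/.cache/huggingface/hub:/data \
    -e HF_API_TOKEN=${HF_API_TOKEN} \
    ghcr.io/huggingface/text-generation-inference \
    --hostname 0.0.0.0 \
    --model-id meta-llama/Llama-2-13b-chat-hf \
    --quantize bitsandbytes \
    --num-shard 4
```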
Adjust the --num-shard value to the number of GPUs available. The HF_API_TOKEN environment variable holds the Hugging Face API token.
Create a HuggingFaceTextGenInference instance that connects to the local inference server and wrap it into Llama2Chat.
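A sketch of this step; the sampling parameters shown are illustrative defaults, not requirements:

```python
from langchain_community.llms import HuggingFaceTextGenInference

# The client requires the text-generation package: pip install text-generation
llm = HuggingFaceTextGenInference(
    inference_server_url="http://127.0.0.1:8080/",
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
)
model = Llama2Chat(llm=llm)
```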
You can then use the chat model together with prompt_template and conversation memory in an LLMChain.
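For instance, with a ConversationBufferMemory whose memory_key matches the chat_history placeholder in the template above (the example question is illustrative):

```python
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

# memory_key must match the MessagesPlaceholder variable name in prompt_template.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = LLMChain(llm=model, prompt=prompt_template, memory=memory)

print(
    chain.run(
        text="What can I see in Vienna? Propose a few locations. Names only, no details."
    )
)
```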
Chat with Llama-2 via LlamaCPP LLM
To use a Llama-2 chat model with a LlamaCpp LLM, install the llama-cpp-python library using these installation instructions. The following example uses a quantized llama-2-7b-chat.Q4_0.gguf model stored locally at ~/Models/llama-2-7b-chat.Q4_0.gguf.
After creating a LlamaCpp instance, the llm is again wrapped into Llama2Chat and can be used in the same way as in the previous example.
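A minimal sketch, assuming the model file from above; additional LlamaCpp parameters (e.g. context size or GPU offloading) may be needed depending on your hardware:

```python
from os.path import expanduser

from langchain_community.llms import LlamaCpp

# Expand ~ to the user's home directory for the locally stored GGUF model.
model_path = expanduser("~/Models/llama-2-7b-chat.Q4_0.gguf")

llm = LlamaCpp(
    model_path=model_path,
    streaming=False,
)
model = Llama2Chat(llm=llm)
```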