Llama-2 LLM · Python · Streamlit · Replicate

Large Language Model (LLM) Chatbot

By Lim Rong Yi
Duration: 1 Month
Role: Developer
Streamlit Chatbot
Pandas Auto EDA

PharmaWave GPT (a fictional company) is a web application that lets users chat with a Large Language Model (LLM) chatbot. It was built with Meta AI's Llama-2 LLM and deployed with Streamlit and Python.

The goal was to simulate an in-house chatbot that could answer questions about the company and its products, and to let data analysts and scientists upload the company's private data for exploratory data analysis (EDA), saving the hassle of re-coding the same EDA steps over and over again.
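As an illustration of the EDA half of that idea, a minimal sketch of an upload-and-profile page is shown below. The page layout, widget labels, and the use of ydata-profiling are assumptions for illustration, not the project's actual code.

import pandas as pd
import streamlit as st
import streamlit.components.v1 as components
from ydata_profiling import ProfileReport

st.title("PharmaWave GPT - Auto EDA")

# Analysts upload the company's private data as a CSV file.
uploaded = st.file_uploader("Upload a CSV file", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.dataframe(df.head())

    # One call generates the usual EDA report instead of re-coding it by hand.
    report = ProfileReport(df, title="Auto EDA Report")
    components.html(report.to_html(), height=800, scrolling=True)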

Link to the web application: PharmaWave GPT


The Challenge

Other than the usual challenges of building a web application, the main challenge was deploying the LLM so that it could be used by the imaginary "staff".

Training a model from scratch would be too time-consuming and costly, so the team decided to start from an open-source model and fine-tune it. However, storage was also an issue, as most open-source models were too large to be hosted on Streamlit or GitHub. For context, the Llama-2 model was about 800GB in size.

Proposed Solution

Thus the team turned to Replicate, a platform that lets users deploy their models in a few clicks. By hosting the model on Replicate and calling it through an API key, the team kept the code clean and storage costs low.
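As a rough sketch of that API key approach, the Streamlit app can call the hosted model through the Replicate Python client. The model identifier and parameters below are placeholders, not the project's actual configuration.

import os
import replicate
import streamlit as st

# The API key lives in Streamlit's secrets, so it never appears in the repository.
os.environ["REPLICATE_API_TOKEN"] = st.secrets["REPLICATE_API_TOKEN"]

prompt = st.text_input("Ask PharmaWave GPT a question")
if prompt:
    # "meta/llama-2-7b-chat" is a placeholder; the app would point at the
    # team's own model hosted on Replicate.
    output = replicate.run(
        "meta/llama-2-7b-chat",
        input={"prompt": prompt, "max_new_tokens": 256},
    )
    # The client returns the answer as chunks of text, which are joined here.
    st.write("".join(output))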

Proposed Framework

Because the organization is billed per query, the team suggests giving role-based access to the chatbot so that each department is billed accordingly.
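One possible way to wire this up (purely an illustration; the department names and secret keys are made up) is to keep a separate Replicate API token per department and build a client from the logged-in user's role:

import replicate
import streamlit as st

# Hypothetical mapping from department to its own Replicate API token,
# so each department's usage is billed separately.
DEPARTMENT_TOKENS = {
    "sales": st.secrets["REPLICATE_TOKEN_SALES"],
    "research": st.secrets["REPLICATE_TOKEN_RESEARCH"],
}

def client_for(department: str) -> replicate.Client:
    # Each department gets a client authenticated with its own token.
    return replicate.Client(api_token=DEPARTMENT_TOKENS[department])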


Model Deployment

To prepare the model for deployment, the team took the following steps:

Training LLMs can be very resource intensive. It is recommended to provision a cloud VM instance with GPU capabilities, such as one from Lambda Labs or Paperspace.

  1. Clone the model from GitHub.
  2. Provision a GPU cloud instance on Paperspace.
Tunneling into the Instance
  3. Install Cog & Docker.
# Download the Cog binary for this OS/architecture and make it executable
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog

# Initialise a Cog project (creates cog.yaml and predict.py) inside the model directory
cd path/to/your/model
cog init

Cog is a command-line tool for deploying machine learning models to Replicate. It is designed to be used with any machine learning framework, including PyTorch, TensorFlow, and scikit-learn.

Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.
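For reference, cog init scaffolds a cog.yaml that describes the build environment and the predictor entry point. A minimal sketch is shown below; the Python version and packages are illustrative, not the project's actual configuration.

build:
  gpu: true
  python_version: "3.10"
  python_packages:
    - "torch"
    - "transformers"
predict: "predict.py:Predictor"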

  4. Using the ./download.sh script, download the model weights.
  5. Build a Docker image and tensorize the weights:
cog run /script/download-weights
  6. Once the Docker image is built, push the model to Replicate, for example:
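The exact commands depend on the Replicate account and model page; the username and model name below are placeholders.

cog login
# Build the image and push it to the model's page on Replicate
cog push r8.im/your-username/pharmawave-llama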
Pushing to Replicate
  7. Next, test a prediction locally, as shown below.
Testing the model
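A local test prediction can be run straight through Cog; the prompt below is just an illustration.

cog predict -i prompt="What products does PharmaWave sell?"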
  8. Assess whether the prompts are working as intended; if not, a training dataset will be required to fine-tune the model.
Model Fine-tuning
  1. Use this dataset as a stand-in for the company's "sensitive information".

  2. We will train the model using Replicate's API and library:

import replicate

# Kick off a fine-tuning job on Replicate. The values below are placeholders:
# - version:     the base model version to fine-tune
# - train_data:  a URL to the training file (typically JSONL prompt/completion pairs)
# - destination: the Replicate model (username/model-name) that receives the tuned weights
training = replicate.trainings.create(
  version="base-model",
  input={
    "train_data": "training-data",
  },
  destination="the repo"
)

Training time!
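While the job runs, its progress can be polled through the same client. This is a small sketch; training is the object returned by replicate.trainings.create above.

import time
import replicate

# Poll the fine-tuning job until it reaches a terminal state.
while True:
    training = replicate.trainings.get(training.id)
    if training.status in ("succeeded", "failed", "canceled"):
        break
    time.sleep(60)

print(training.status)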
  3. Phew! The model is now trained and ready to be deployed.

Video Demo

Click on the image below to watch the video demo of the web application.


Closing Thoughts

This was the first time I had worked with an LLM, and it was a great experience. I learnt more about the model and how it can be used in a real-world setting, as well as how to deploy a model on Replicate and how to use Streamlit to build a web application.