Setting Up CASSIA
First, ensure you have the reticulate package and the devtools package installed.
install.packages("reticulate")
install.packages("devtools")
Next, install the CASSIA package from GitHub:
# Install the CASSIA package
library(devtools)
devtools::install_github("ElliotXie/CASSIA/CASSIA_R")
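To confirm that the installation steps above succeeded, a quick check along the following lines may help. This is a minimal sketch: the helper name `pkg_installed` is hypothetical and not part of CASSIA. It uses base R's `system.file()` so that no package is actually loaded during the check.

```r
# Hypothetical helper (not part of CASSIA): report whether each package
# is installed, without loading it.
pkg_installed <- function(pkgs) {
  vapply(
    pkgs,
    function(pkg) nzchar(system.file(package = pkg)),
    logical(1)
  )
}

# The three packages named in this section.
status <- pkg_installed(c("reticulate", "devtools", "CASSIA"))
print(status)

not_installed <- names(status)[!status]
if (length(not_installed) > 0) {
  message("Not installed yet: ", paste(not_installed, collapse = ", "))
}
```

Any package reported as not installed can be installed again with the commands shown above.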
Setting Up the Python Environment
CASSIA relies on Python for some of its backend processing. When you load the CASSIA package, it attempts to set up the required Python environment automatically. However, if you encounter issues, you can call the setup_cassia_env() function to create and configure the necessary Python environment.
library(CASSIA)
# Automatically set up the Python environment if needed
setup_cassia_env()
This function will:
- Create a new Conda environment named cassia_env if it doesn't already exist.
- Install the required Python packages: openai, pandas, numpy, scikit-learn, requests, and anthropic.
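If you want to verify the resulting environment, one possible approach is to ask reticulate which of the listed Python packages can be imported. This is a sketch under assumptions: the helper `missing_modules` is hypothetical and not part of CASSIA, and it assumes the environment is named cassia_env as described above. Note that scikit-learn is imported in Python under the module name `sklearn`.

```r
# The six Python packages listed above, by their Python import names.
required_modules <- c("openai", "pandas", "numpy", "sklearn", "requests", "anthropic")

# Hypothetical helper (not part of CASSIA): return the modules that cannot
# be imported. `is_available` defaults to reticulate's own checker and is a
# parameter only so the logic can be exercised without Python.
missing_modules <- function(modules,
                            is_available = reticulate::py_module_available) {
  ok <- vapply(modules, is_available, logical(1))
  modules[!ok]
}

# Example usage (requires the Conda environment to exist):
# reticulate::use_condaenv("cassia_env", required = TRUE)
# missing_modules(required_modules)
```

An empty result suggests that every listed package is importable from the active Python environment.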
Setting API Keys
To use LLMs such as OpenAI's GPT-4, Anthropic's Claude, or models available through OpenRouter, first obtain an API key from the provider, then set it with the setLLMApiKey() function. Getting a key from a provider normally takes about 3 minutes.
Note: You must set at least one API key to use CASSIA.
# For OpenAI
setLLMApiKey("your_openai_api_key", provider = "openai", persist = TRUE)
# For Anthropic
setLLMApiKey("your_anthropic_api_key", provider = "anthropic", persist = TRUE)
# For OpenRouter
setLLMApiKey("your_openrouter_api_key", provider = "openrouter", persist = TRUE)
- Replace the placeholder string (for example, "your_openai_api_key") with your actual API key.
- Set provider to "openai", "anthropic", or "openrouter" depending on your provider. (OpenRouter is recommended for accessing free open-source models.)
- Setting persist = TRUE saves the key in your .Renviron file for future sessions.
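Because persisted keys live in .Renviron, they are visible to R as environment variables in later sessions. The sketch below shows one way to check which providers have a key available. The variable names are an assumption made for illustration, since this section does not state which names CASSIA writes; check your .Renviron file for the actual names. The helper `providers_with_key` is likewise hypothetical.

```r
# ASSUMPTION: these environment variable names are illustrative only.
# Check your .Renviron file for the names CASSIA actually uses.
assumed_key_vars <- c(
  openai     = "OPENAI_API_KEY",
  anthropic  = "ANTHROPIC_API_KEY",
  openrouter = "OPENROUTER_API_KEY"
)

# Hypothetical helper (not part of CASSIA): return the providers whose
# environment variable is set to a non-empty value.
providers_with_key <- function(key_vars = assumed_key_vars) {
  is_set <- nzchar(Sys.getenv(key_vars))
  names(key_vars)[is_set]
}

available <- providers_with_key()
if (length(available) == 0) {
  message("No API key found; set at least one with setLLMApiKey().")
} else {
  message("Keys found for: ", paste(available, collapse = ", "))
}
```

The check reads only whether a value is present and never prints the key itself.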
How to Select Models and Providers
There are three providers to choose from: openai, anthropic, and openrouter. Each provider has its own models and pricing.
Note that the model name must be set exactly as shown below or the model will not be found.
OpenAI
- gpt-4o: the most balanced model. We use it as the default while benchmarking CASSIA's performance with GPTcelltype.
Anthropic
- claude-3-5-sonnet-20241022: the most powerful model. In the benchmark, we use it for the scoring and annotation boost agents.
OpenRouter
OpenRouter is a platform that offers access to almost all of the models supported by the major providers. We also recommend using OpenRouter to access claude-3-5-sonnet, since OpenRouter has the highest rate limit. You can also use it to access a number of open-source models such as llama-3.2 and DeepseekV3. These open-source models are much cheaper, with a slight decrease in performance.
- anthropic/claude-3.5-sonnet
- openai/gpt-4o-2024-11-20
- meta-llama/llama-3.2-90b-vision-instruct
- deepseek/deepseek-chat-v3-0324 (most recommended model; performance is almost the same as gpt-4o)
- deepseek/deepseek-chat-v3-0324:free (free version of DeepseekV3; slightly slower and less stable)
DeepseekV3 is the most recommended model. It is an open-source model with a free version, and it is almost as good as gpt-4o.
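Since model names must match exactly, it can help to check a name against the ones listed in this section before starting a long run. The following is a minimal sketch: the lookup list and the helper `is_listed_model` are hypothetical and not part of CASSIA, and the list covers only the names shown above, so a provider may accept other models that are absent here.

```r
# Model names exactly as listed in this section, grouped by provider.
models_by_provider <- list(
  openai = c("gpt-4o"),
  anthropic = c("claude-3-5-sonnet-20241022"),
  openrouter = c(
    "anthropic/claude-3.5-sonnet",
    "openai/gpt-4o-2024-11-20",
    "meta-llama/llama-3.2-90b-vision-instruct",
    "deepseek/deepseek-chat-v3-0324",
    "deepseek/deepseek-chat-v3-0324:free"
  )
)

# Hypothetical helper (not part of CASSIA): TRUE only when the provider is
# known and the model name matches one of its listed names exactly
# (the comparison is case-sensitive).
is_listed_model <- function(provider, model) {
  provider %in% names(models_by_provider) &&
    model %in% models_by_provider[[provider]]
}

is_listed_model("openrouter", "deepseek/deepseek-chat-v3-0324")
is_listed_model("openai", "GPT-4o")
```

The second call returns FALSE because of the capitalization, which illustrates why the exact spelling matters.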