Quality Scoring

Quality scoring helps evaluate the reliability of cell type annotations. CASSIA provides automated scoring functionality through the runCASSIA_score_batch function, which analyzes the reasoning and evidence behind each annotation.

Running Quality Scoring

Basic Usage

runCASSIA_score_batch(
    input_file = "my_annotation_full.csv",
    output_file = "my_annotation_scored.csv",
    max_workers = 4,
    model = "anthropic/claude-3.5-sonnet",
    provider = "openrouter"
)

Parameter Details

Input/Output Files:
- input_file: Path to the full annotation results (from runCASSIA_batch)
- output_file: Where to save the scored results
Processing Parameters:
- max_workers: Number of parallel scoring threads
- Recommended: use fewer workers than in the annotation step to avoid API rate limits when provider is set to anthropic
Model Configuration:
- Recommended model: anthropic/claude-3.5-sonnet
- Recommended provider: openrouter

API Provider Considerations

OpenRouter

Advantages:
- Higher rate limits
- Easy to switch models

Setup:

provider <- "openrouter"
model <- "anthropic/claude-3.5-sonnet"

Anthropic Direct

Considerations:
- New users have usage limits
- May need to reduce max_workers
- Better for smaller datasets

Setup:

provider <- "anthropic"
model <- "claude-3-5-sonnet-20241022"
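
The two setups differ only in a few arguments, so one way to keep them switchable is a small lookup. This is a minimal sketch: scoring_config is a hypothetical helper, not part of CASSIA, and the max_workers values (4 and 2) are illustrative assumptions rather than documented limits.

```r
# Hypothetical helper (not part of CASSIA): bundle provider-specific arguments.
# The max_workers values are illustrative assumptions, not documented limits.
scoring_config <- function(provider = c("openrouter", "anthropic")) {
  provider <- match.arg(provider)
  switch(provider,
    openrouter = list(
      provider = "openrouter",
      model = "anthropic/claude-3.5-sonnet",
      max_workers = 4
    ),
    anthropic = list(
      provider = "anthropic",
      model = "claude-3-5-sonnet-20241022",
      max_workers = 2  # fewer workers to stay under stricter usage limits
    )
  )
}

cfg <- scoring_config("anthropic")

# Combine the file paths with the provider settings into one argument list.
score_args <- c(
  list(
    input_file = "my_annotation_full.csv",
    output_file = "my_annotation_scored.csv"
  ),
  cfg
)

# With CASSIA loaded, the call would then be:
# do.call(runCASSIA_score_batch, score_args)
```

Keeping the settings in one list means that switching providers changes a single argument to the helper instead of three separate variables.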

Output Format

The scored output file contains:

- Original annotation data
- Quality scores (0-100)
- Confidence metrics
- Detailed reasoning for scores

Interpreting Scores

- 90-100: High confidence, strong evidence
- 76-89: Good confidence, adequate evidence
- Below 76: Low confidence; run these clusters through the Annotation Boost Agent and the Compare Agent
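
These bands can be applied to a scored file with base R. The following is a minimal sketch: the cluster and score column names and the toy values are assumptions for illustration, so check names() on your own scored CSV for the actual columns. Scores below 76 are treated as low confidence here.

```r
# Toy stand-in for the scored output; column names and values are assumptions.
scored <- data.frame(
  cluster = c("T cell", "B cell", "Unknown 1", "Monocyte"),
  score = c(95, 82, 60, 75),
  stringsAsFactors = FALSE
)
# For real data: scored <- read.csv("my_annotation_scored.csv")

# Bin scores into three bands; right = FALSE makes each interval [low, high).
scored$band <- cut(
  scored$score,
  breaks = c(0, 76, 90, Inf),
  labels = c("low", "good", "high"),
  right = FALSE
)

# Clusters that should go on to the Annotation Boost and Compare agents.
needs_followup <- scored$cluster[scored$band == "low"]
print(needs_followup)
```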

Report Generation

Generate detailed reports from your analysis. This step typically follows quality scoring.

The score report includes all outputs from CASSIA, including structured outputs, conversation histories, and quality scores.

Batch Reports from Scored Results

runCASSIA_generate_score_report(
  csv_path = "my_annotation_scored.csv",
  output_name = "CASSIA_reports_summary"
)

Generates individual reports and an index page from the scored results CSV (my_annotation_scored.csv in this example).