Model
The choice of model determines representational capacity, adapter size, memory footprint, and convergence behavior, and therefore directly influences achievable accuracy, training stability, and hardware requirements. Selecting a model for fine-tuning means balancing performance, computational cost, and hardware constraints: efficiency is highest when the architecture matches the requirements of the target task.
Objective Definition
Clarity about the purpose and intended output determines the necessary architecture. The table below summarizes the most common use cases and their corresponding architectures.
| Purpose | Architecture | Use Case | Example | Notes |
|---|---|---|---|---|
| Chat | Decoder | Conversational Agent, Tool Calling, Thinking | Llama-3.1-8B-Instruct | Use instruction-tuned (Instruct) models that are optimized to follow prompts, maintain dialogue context, and produce well-structured, aligned responses suitable for interactive and agentic workflows. |
| Text Generation | Decoder | Code Completion, Sentence Completion (Used in email clients or writing tools) | Llama-3.1-8B | Use base models for raw next-token completion, where no chat formatting or instruction-following behavior is required. |
| Similarity | Bi-Encoder | Similarity Search, RAG embeddings | bge-m3 | Produces fixed-size vector embeddings optimized for cosine or dot-product similarity. Not suitable for text generation or conversational tasks. Designed for high-throughput, low-latency embedding. |
| Reranking | Cross-Encoder | Search result reranking, retrieval refinement, relevance scoring, RAG ranking | bge-reranker-v2-m3 | Scores query & document pairs jointly, producing highly accurate relevance ranking. More compute-intensive than embedding models, typically applied to a small candidate set after initial retrieval. |
| Multi-label Classification | Encoder | Tagging, topic assignment, content moderation, intent detection | bert-base-uncased, roberta-base | Predicts multiple labels per input simultaneously (Multi-Hot Vector). Uses sigmoid activation and thresholding instead of softmax. Suitable when labels are non-exclusive and may overlap. |
| Multi-class Classification | Encoder | Intent classification, document categorization, sentiment analysis | bert-base-uncased, roberta-base | Outputs a one-hot vector with a single active class. Uses softmax activation and cross-entropy loss. Suitable when classes are mutually exclusive. |
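The last two rows differ only in the output head. A minimal pure-Python sketch of that difference, using illustrative logits (the numbers and the 0.5 threshold are assumptions, not values from any specific model):

```python
import math

def sigmoid(x):
    """Logistic function, applied independently per label."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Normalized distribution over mutually exclusive classes."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw logits from a classification head over four labels.
logits = [2.0, -1.0, 0.5, -3.0]

# Multi-label: independent sigmoid per label, thresholded at 0.5 -> multi-hot vector.
multi_label = [1 if sigmoid(v) >= 0.5 else 0 for v in logits]

# Multi-class: softmax over all labels, single argmax -> one active class.
probs = softmax(logits)
multi_class = probs.index(max(probs))
```

Because sigmoids are independent, any number of labels can fire at once (here two), whereas softmax always commits to exactly one class.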
Base vs. Instruct Model Selection
- Base Models: Trained on raw text for next-token prediction. These serve as the foundation when introducing a model to entirely new languages or highly specialized technical vocabularies.
- Instruct Models: Pre-aligned to follow directions. These are preferable for refining specific behaviors, adjusting response tones, or enforcing strict output formats (e.g., JSON).
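Instruct models expect prompts wrapped in the model's chat template; in practice this is handled by `tokenizer.apply_chat_template` from the `transformers` library. As a hedged illustration only, a hand-rolled formatter in the published Llama 3 style (special-token names follow Meta's documented format; always verify against the actual tokenizer):

```python
# Hand-rolled illustration of the Llama 3 chat format. In practice use
# transformers' tokenizer.apply_chat_template, which applies the model's
# own template rather than a hard-coded one.
def format_llama3_chat(messages):
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Cue the model to generate the assistant turn next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_chat([
    {"role": "system", "content": "Reply in JSON."},
    {"role": "user", "content": "What is 2 + 2?"},
])
```

Feeding a base model this structure yields unpredictable output; an instruct model trained on the template will respect the role boundaries and the system directive.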
Parameter Size and Hardware Requirements
Model size largely determines which use cases a model can handle effectively.
| Model Size | Optimal Use Case |
|---|---|
| 1B - 3B | Edge devices, mobile applications, simple classification. |
| 7B - 8B | General reasoning, logic tasks, and standard coding assistance. |
| 14B - 30B | Complex domain-specific logic (medical, legal, or scientific). |
Model size, measured in billions of parameters (B), dictates the VRAM required for fine-tuning.
| Model Parameters | QLoRA (4-bit) VRAM | LoRA (16-bit) VRAM |
|---|---|---|
| 3B | ~3.5 GB | ~8 GB |
| 7B | ~5 GB | ~19 GB |
| 8B | ~6 GB | ~22 GB |
| 9B | ~6.5 GB | ~24 GB |
| 11B | ~7.5 GB | ~29 GB |
| 14B | ~8.5 GB | ~33 GB |
| 27B | ~22 GB | ~64 GB |
| 32B | ~26 GB | ~76 GB |
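The dominant term in these figures is the base weights: parameters × bits per parameter. A minimal sketch of that calculation (an approximation only; the table's totals are higher because LoRA adds adapter weights, optimizer state, gradients, and activations on top):

```python
def weight_memory_gb(params_billion, bits):
    """Approximate memory for model weights alone, in GB.

    params_billion * 1e9 parameters, each taking bits/8 bytes.
    This deliberately ignores adapters, optimizer state, and
    activations, so it is a lower bound on fine-tuning VRAM.
    """
    return params_billion * bits / 8

# 7B model: base weights at 4-bit vs 16-bit precision.
q4 = weight_memory_gb(7, 4)     # 3.5 GB weights; table shows ~5 GB QLoRA total
fp16 = weight_memory_gb(7, 16)  # 14 GB weights; table shows ~19 GB LoRA total
```

Comparing the sketch to the table shows the fine-tuning overhead beyond raw weights is on the order of a few GB for 7B-class models.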
Using LoRA (Low-Rank Adaptation) or QLoRA (quantized LoRA) makes it possible to fine-tune 7B+ models on consumer-grade hardware, such as a single 24 GB GPU.
Selection Criteria
Key metrics on model repositories provide essential guidance:
- Context Window: Determines the maximum data volume processed in a single pass. Long-form document analysis requires models with 32k to 128k context windows. Larger context windows significantly increase VRAM usage, as memory consumption scales with sequence length due to attention key/value caches, directly impacting batch size, concurrency, and hardware requirements.
Protean AI provides deployment recommendations, including estimated VRAM usage based on context window size, and allows the context window to be configured at deployment time.
- Licensing: Compliance with commercial requirements (e.g., Apache 2.0, MIT, or Llama 3 Community License) is mandatory for enterprise deployment.
- Benchmarks: MMLU (General knowledge) and HumanEval (Coding) scores offer standardized performance comparisons.
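The KV-cache scaling noted under Context Window can be estimated directly: keys and values each store `n_kv_heads × head_dim` elements per token per layer. A sketch with illustrative Llama-3-8B-style defaults (32 layers, 8 KV heads under grouped-query attention, head dim 128, fp16 cache; these are assumed here for illustration, not quoted from a model card):

```python
def kv_cache_gib(seq_len, batch=1, n_layers=32, n_kv_heads=8,
                 head_dim=128, bytes_per_elem=2):
    """Estimate attention KV-cache size in GiB.

    Factor of 2 covers keys and values; each token stores
    n_kv_heads * head_dim elements per layer for each.
    Defaults are illustrative 8B-class values, not exact.
    """
    elems = 2 * batch * seq_len * n_layers * n_kv_heads * head_dim
    return elems * bytes_per_elem / (1024 ** 3)

# Under these assumptions: 8k context -> 1 GiB of cache per sequence;
# a 128k context -> 16 GiB, before counting the weights themselves.
```

This is why a long context window can dominate the hardware budget even when the weights fit comfortably, and why concurrency multiplies the cost: the cache scales linearly with both sequence length and batch size.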
Implementation Checklist
- Identify Task Category: Match the objective (chat, generation, similarity, reranking, or classification) to the corresponding architecture.
- Audit Hardware: Confirm available VRAM against the LoRA/QLoRA requirements for the candidate model size.
- Select Scale: Determine if a 7B/8B model provides the necessary balance of speed and intelligence.
- Verify Licensing: Ensure the model permits the intended commercial or research use.
- Baseline Testing: Evaluate the model's performance without fine-tuning to establish a performance floor.
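The baseline-testing step can be sketched as a small scoring loop. Here `generate_fn` is a hypothetical placeholder for whatever inference call is in use (a local pipeline, an API client), and exact match is the simplest possible metric; swap in task-appropriate scoring such as F1, pass@k, or cosine similarity:

```python
def exact_match_baseline(generate_fn, eval_set):
    """Score a model before fine-tuning to establish a performance floor.

    generate_fn: callable prompt -> completion (placeholder for your
    actual inference call). eval_set: list of (prompt, expected) pairs.
    Returns the fraction of exact matches after whitespace stripping.
    """
    hits = sum(
        1 for prompt, expected in eval_set
        if generate_fn(prompt).strip() == expected.strip()
    )
    return hits / len(eval_set)
```

Recording this floor before training makes it unambiguous whether fine-tuning actually improved the model or merely matched what the base checkpoint could already do.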