Hyperparameter Search

This guide explains how to automate the trial-and-error process of finding optimal model settings. Instead of manual guessing, a range of possibilities (a "search space") is defined, and Protean AI intelligently tests combinations from it to find the most effective configuration.

The search identifies the best settings for a specific dataset. Trials that do not yield the desired results are pruned (stopped early), which saves both time and resources.

What is a "Search Space"?

A search space functions as a menu of options for the search algorithm to choose from. Rather than providing a single fixed value, limits and boundaries are established for the system to explore.

Fixed Value: A static value used for every trial.
Range: Any number between a defined X and Y is selected.
Log Range: A number is selected between X and Y on a logarithmic scale, so small magnitudes (e.g., 0.0001) are explored as thoroughly as larger ones.
Categorical: A single option is selected from a specific list: [Option A, Option B, Option C].
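
The four types can be sketched as small sampling functions. This is a minimal illustration using only the Python standard library, not Protean AI's implementation; the function names, the `epochs` entry, and the uniform draw for Range are assumptions made for the example.

```python
import math
import random

def sample_fixed(value):
    """Fixed Value: the same value for every trial."""
    return value

def sample_range(rng, low, high):
    """Range: a draw between low and high (assumed uniform here)."""
    return rng.uniform(low, high)

def sample_log_range(rng, low, high):
    """Log Range: uniform in log space, so each order of magnitude is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_categorical(rng, options):
    """Categorical: one option picked from the list."""
    return rng.choice(options)

rng = random.Random(0)  # seeded so the draws are repeatable
trial = {
    "epochs": sample_fixed(3),  # hypothetical fixed setting
    "dropout": sample_range(rng, 0.0, 0.1),
    "learning_rate": sample_log_range(rng, 1e-6, 2e-4),
    "lora_rank": sample_categorical(rng, [8, 16, 32, 64]),
}
print(trial)
```

Each call to the samplers produces one candidate configuration; a search repeats this for every trial.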

Configurable Parameters

The following section details how to configure the search for specific settings.

Learning Rate

Description: The size of each weight update, which sets the speed at which the model learns.
Best Search Strategy: Logarithmic Range.
- Reasoning: The difference between 0.0001 and 0.001 is significant (10x), whereas the difference between 0.1 and 0.101 is negligible. A log search gives every 10x step equal weight.
Recommended Range: 1e-6 (very slow) to 2e-4 (standard).
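
The effect of the log scale can be checked numerically. The sketch below, which assumes uniform sampling in both cases, compares how often a linear draw and a log draw land below 1e-5 within the recommended range. Analytically, the linear share is about 4.5% and the log share about 43%.

```python
import math
import random

LOW, HIGH = 1e-6, 2e-4  # the recommended range above

def linear_draw(rng):
    """Uniform draw on the raw scale."""
    return rng.uniform(LOW, HIGH)

def log_draw(rng):
    """Uniform draw on the log scale, mapped back with exp."""
    return math.exp(rng.uniform(math.log(LOW), math.log(HIGH)))

rng = random.Random(0)
n = 10_000
linear_small = sum(linear_draw(rng) < 1e-5 for _ in range(n)) / n
log_small = sum(log_draw(rng) < 1e-5 for _ in range(n)) / n

print(f"linear: {linear_small:.1%} of draws below 1e-5")
print(f"log:    {log_small:.1%} of draws below 1e-5")
```

A linear search would spend almost all of its trials in the top decade of the range, which is why the log scale is preferred here.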

Weight Decay

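Description: A regularization penalty on the weights, used as a stabilizer to prevent overfitting.
Best Search Strategy: Categorical or Log Range.
- Reasoning: Typically, the desired value is either "None" (0.0), "Standard" (0.01), or "High" (0.1), so a short list covers the useful cases. A Log Range cannot include 0.0, so it applies only when a nonzero minimum is acceptable.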
Recommended Options: [0.0, 0.01, 0.1]

Warmup Steps

Description: The number of steps at the beginning of training during which the learning rate ramps up to its target value (the "warm-up" period).
Best Search Strategy: Integer Range with a step.
- Reasoning: Testing every single step (e.g., 11 vs. 12) is unnecessary. Intervals such as 10, 20, or 30 are sufficient.
Recommended Range: 0 to 100 (Step size: 10).
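
As a quick illustration of what the step buys, the snippet below enumerates the candidates for the recommended range. The helper name `warmup_candidates` is hypothetical.

```python
import random

def warmup_candidates(low=0, high=100, step=10):
    """All values an integer range with a step can produce (high is inclusive)."""
    return list(range(low, high + 1, step))

candidates = warmup_candidates()
print(candidates)       # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(len(candidates))  # 11 candidates instead of 101

rng = random.Random(0)
warmup_steps = rng.choice(candidates)  # one trial's value
```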

LoRA Rank

Description: The "brain capacity" of the adapter: the rank of its low-rank matrices, which determines how many trainable parameters it adds.
Best Search Strategy: Categorical.
- Reasoning: Computational efficiency is optimized for powers of 2 (8, 16, 32, 64). Arbitrary values such as rank 13 rarely justify a separate trial.
Recommended Options: [8, 16, 32, 64]
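
To make "capacity" concrete: in the standard LoRA formulation, an adapter on a `d_out x d_in` weight matrix adds two matrices of shapes `rank x d_in` and `d_out x rank`, so the trainable parameter count grows linearly with the rank. The layer width of 4096 below is a hypothetical value chosen only for illustration.

```python
def lora_params(rank, d_in, d_out):
    """Trainable parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

D = 4096  # hypothetical layer width, for illustration only
full = D * D  # parameters in the frozen weight matrix
for rank in [8, 16, 32, 64]:
    added = lora_params(rank, D, D)
    print(f"rank={rank:>2}: {added:>7,} params ({added / full:.2%} of the full matrix)")
```

Each step up the recommended list doubles the adapter size, so the four options span an 8x range of capacity.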

LoRA Alpha

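Description: The "loudness" or strength of the adapter: a scaling factor applied to its output (the standard LoRA formulation scales the output by Alpha / Rank).
Best Search Strategy: Categorical (Dependent on Rank).
- Note: A common rule is Alpha = 2 * Rank, in which case Alpha does not need to be searched separately. If an independent search is required, a list of standard values can be used.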
Recommended Options: [16, 32, 64, 128]
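
The two approaches can be sketched as follows. The function `sample_lora` and its `tie_alpha_to_rank` flag are hypothetical names for this example; the `scale` entry uses the standard LoRA ratio Alpha / Rank.

```python
import random

RANKS = [8, 16, 32, 64]
ALPHAS = [16, 32, 64, 128]

def sample_lora(rng, tie_alpha_to_rank=True):
    """Draw a rank, then either derive alpha from it or draw alpha independently."""
    rank = rng.choice(RANKS)
    if tie_alpha_to_rank:
        alpha = 2 * rank  # the common rule: Alpha = 2 * Rank
    else:
        alpha = rng.choice(ALPHAS)  # independent search over standard values
    return {"lora_rank": rank, "lora_alpha": alpha, "scale": alpha / rank}

rng = random.Random(0)
tied = sample_lora(rng)
free = sample_lora(rng, tie_alpha_to_rank=False)
print(tied)
print(free)
```

With the rule applied, the effective scale is always 2.0 and the search covers 4 combinations instead of 16, which leaves more trials for the other parameters.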

LoRA Dropout

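Description: The random zeroing of a fraction of the adapter's activations during training, which reduces overfitting and improves reliability.
Best Search Strategy: Float Range (Linear).
- Reasoning: Dropout is a simple percentage, here from 0% to 10%, so a linear scale is sufficient.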
Recommended Range: 0.0 to 0.1 (Step: 0.05).
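
Note that a step of 0.05 yields only three candidates. The sketch below enumerates a stepped float range; `float_grid` is a hypothetical helper that computes each value by index so rounding error does not accumulate.

```python
def float_grid(low, high, step):
    """Values of a stepped float range, computed by index to avoid accumulated rounding error."""
    count = round((high - low) / step)
    return [round(low + i * step, 10) for i in range(count + 1)]

print(float_grid(0.0, 0.1, 0.05))  # [0.0, 0.05, 0.1]
```

A finer step such as 0.01 would give 11 candidates over the same range, at the cost of a larger search.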

Quick "Copy-Paste" Search Space

The following values serve as a reliable starting point for a chat objective on instruction-tuned models.

| Parameter | Type | Suggested Range/Values |
| --- | --- | --- |
| Learning Rate | `Log Float` | `1e-5` ... `2e-4` (log) |
| Weight Decay | `Categorical` | `0.0`, `0.01`, `0.1` |
| Warmup Steps | `Int (Step)` | `0` ... `100` (step=10) |
| LoRA Rank | `Categorical` | `8`, `16`, `32`, `64` |
| LoRA Alpha | `Categorical` | `16`, `32`, `64` |
| LoRA Dropout | `Float` | `0.0` ... `0.1` (step=0.05) |
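
For readers who prefer to see the table as data, here is one way it might be written down and sampled with the Python standard library. The key names, the spec layout, and the `sample` function are hypothetical and do not reflect Protean AI's configuration format.

```python
import math
import random

# Mirrors the table above; key names and spec layout are hypothetical.
SEARCH_SPACE = {
    "learning_rate": {"type": "log_float", "low": 1e-5, "high": 2e-4},
    "weight_decay": {"type": "categorical", "choices": [0.0, 0.01, 0.1]},
    "warmup_steps": {"type": "int", "low": 0, "high": 100, "step": 10},
    "lora_r": {"type": "categorical", "choices": [8, 16, 32, 64]},
    "lora_alpha": {"type": "categorical", "choices": [16, 32, 64]},
    "lora_dropout": {"type": "float", "low": 0.0, "high": 0.1, "step": 0.05},
}

def sample(spec, rng):
    """Draw one value for a parameter according to its type."""
    kind = spec["type"]
    if kind == "categorical":
        return rng.choice(spec["choices"])
    if kind == "log_float":
        # Uniform in log space, then mapped back to the raw scale.
        return math.exp(rng.uniform(math.log(spec["low"]), math.log(spec["high"])))
    if kind == "int":
        return rng.randrange(spec["low"], spec["high"] + 1, spec["step"])
    if kind == "float":
        # Pick a grid index, then compute the value from it.
        steps = round((spec["high"] - spec["low"]) / spec["step"])
        return round(spec["low"] + rng.randint(0, steps) * spec["step"], 10)
    raise ValueError(f"unknown parameter type: {kind}")

rng = random.Random(42)
trial = {name: sample(spec, rng) for name, spec in SEARCH_SPACE.items()}
print(trial)
```

Each run of the dictionary comprehension yields one trial configuration; a search loop would repeat it and keep the best-scoring result.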
