Hyperparameter Search
This guide explains how to automate the "trial and error" process of finding optimal model settings. Instead of manual guessing, a range of possibilities (a "search space") is defined, and Protean AI tests combinations from it to find the most effective configuration.
In the process, the best settings for a specific dataset are identified. Trials that do not yield the desired results are pruned (stopped early), saving both time and resources.
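The pruning idea can be sketched in a few lines. The source does not say which pruning rule Protean AI applies, so the example below assumes a "median rule" purely for illustration: a trial is stopped when its loss at a given step is worse than the median loss that earlier trials had at the same step. The function name `should_prune`, its parameters, and the loss curves are hypothetical.

```python
from statistics import median

def should_prune(current_loss, step, finished_curves, warmup=2):
    """Assumed median rule: prune if the loss at `step` is worse than the
    median loss of earlier trials at that same step."""
    if step < warmup:
        return False  # too early to judge the trial
    peers = [curve[step] for curve in finished_curves if len(curve) > step]
    if not peers:
        return False  # nothing to compare against yet
    return current_loss > median(peers)

# Made-up loss curves from three earlier trials (one value per step).
history = [
    [2.0, 1.5, 1.2, 1.0],
    [2.1, 1.7, 1.4, 1.3],
    [1.9, 1.4, 1.1, 0.9],
]

print(should_prune(1.8, step=2, finished_curves=history))  # loss above the peer median
print(should_prune(1.0, step=2, finished_curves=history))  # loss below the peer median
```

A trial that is pruned at step 2 never pays for the remaining steps, which is where the time and resource savings come from.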
What is a "Search Space"?
A search space functions as a menu of options for the search algorithm to choose from. Rather than providing a single fixed value, limits and boundaries are established for the system to explore.
- Fixed Value: A static value used for every trial.
- Range: Any number between a defined X and Y is selected.
- Log Range: A number is selected between X and Y on a logarithmic scale, so small magnitudes (e.g., 0.0001) are explored as thoroughly as larger ones instead of being crowded out by them.
- Categorical: A single option is selected from a specific list: [Option A, Option B, Option C].
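The four option types above can be illustrated with plain Python. This is a minimal sketch using only the standard library, not Protean AI's actual API; the function names and the `epochs` parameter are hypothetical.

```python
import math
import random

def sample_fixed(value):
    """Fixed Value: the same value for every trial."""
    return value

def sample_range(rng, low, high):
    """Range: any number between low and high, drawn uniformly."""
    return rng.uniform(low, high)

def sample_log_range(rng, low, high):
    """Log Range: uniform in log space, so each 10x band is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_categorical(rng, options):
    """Categorical: one option from a list."""
    return rng.choice(options)

rng = random.Random(0)  # seeded so the draws are repeatable
trial = {
    "epochs": sample_fixed(3),  # hypothetical fixed parameter
    "dropout": sample_range(rng, 0.0, 0.1),
    "learning_rate": sample_log_range(rng, 1e-6, 2e-4),
    "rank": sample_categorical(rng, [8, 16, 32, 64]),
}
print(trial)
```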
Configurable Parameters
The following section details how to configure the search for specific settings.
Learning Rate
- Description: The speed at which the model learns.
- Best Search Strategy: Log Range.
- Reasoning: The difference between 0.0001 and 0.001 is significant (10x), whereas the difference between 0.1 and 0.101 is negligible. A "log" search treats each 10x change in magnitude equally.
- Recommended Range: 1e-6 (very slow) to 2e-4 (standard).
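The effect of the log scale can be checked with a short calculation. The sketch below computes, for the recommended range, what share of draws would be expected in the lowest 10x band (1e-6 to 1e-5) under a linear search versus a log search. The variable names are illustrative.

```python
import math

low, high = 1e-6, 2e-4  # recommended learning-rate range
cutoff = 1e-5           # upper edge of the lowest 10x band

# Expected share of draws that land in [low, cutoff] under each strategy.
linear_share = (cutoff - low) / (high - low)
log_share = math.log(cutoff / low) / math.log(high / low)

print(f"linear search: {linear_share:.1%}")
print(f"log search:    {log_share:.1%}")
```

Under a linear search only about 4.5% of trials would fall in the lowest band, compared with about 43.5% under a log search, which is why the log strategy suits learning rates.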
Weight Decay
- Description: A stabilizer used to prevent overfitting.
- Best Search Strategy: Categorical or Log Range.
- Reasoning: Typically, the desired value is either "None" (0.0), "Standard" (0.01), or "High" (0.1).
- Recommended Options: [0.0, 0.01, 0.1]
Warmup Steps
- Description: The "warm-up" period at the beginning of training.
- Best Search Strategy: Integer Range with a step.
- Reasoning: Testing every single step (e.g., 11 vs. 12) is unnecessary. Intervals such as 10, 20, or 30 are sufficient.
- Recommended Range: 0 to 100 (Step size: 10).
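As a quick illustration of how much the step shrinks the search, the snippet below enumerates the candidate values. It assumes the upper bound of 100 is inclusive, which the source does not state explicitly.

```python
# Integer range 0..100 with a step of 10 (upper bound assumed inclusive).
warmup_candidates = list(range(0, 100 + 1, 10))
every_step = list(range(0, 100 + 1))

print(warmup_candidates)
print(len(warmup_candidates), "candidates instead of", len(every_step))
```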
LoRA Rank
- Description: The "brain capacity" of the adapter.
- Best Search Strategy: Categorical.
- Reasoning: Computational efficiency is optimized for powers of 2 (8, 16, 32, 64). Testing arbitrary values like "Rank 13" is generally inefficient.
- Recommended Options: [8, 16, 32, 64]
LoRA Alpha
- Description: The "loudness" or strength of the adapter.
- Best Search Strategy: Categorical (Dependent on Rank).
- Note: A common rule is Alpha = 2 * Rank. However, if an independent search is required, a list of standard values can be used.
- Recommended Options: [16, 32, 64, 128]
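The two ways of handling Alpha can be sketched as follows. This is an illustration only; `sample_lora` and its `tie_alpha_to_rank` flag are hypothetical names, not part of any Protean AI interface.

```python
import random

RANKS = [8, 16, 32, 64]
ALPHAS = [16, 32, 64, 128]

def sample_lora(rng, tie_alpha_to_rank=True):
    """Draw a (rank, alpha) pair for one trial."""
    rank = rng.choice(RANKS)
    if tie_alpha_to_rank:
        alpha = 2 * rank            # dependent: Alpha = 2 * Rank
    else:
        alpha = rng.choice(ALPHAS)  # independent: pick from the standard list
    return rank, alpha

rng = random.Random(0)
tied = [sample_lora(rng) for _ in range(5)]
free = [sample_lora(rng, tie_alpha_to_rank=False) for _ in range(5)]
print("tied:", tied)
print("free:", free)
```

Tying Alpha to Rank leaves 4 possible pairs, whereas searching both lists independently gives 16, so the dependent form needs fewer trials to cover.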
LoRA Dropout
- Description: The random disabling of neurons to improve reliability.
- Best Search Strategy: Float Range (Linear).
- Reasoning: This represents a simple percentage from 0% to 10%.
- Recommended Range: 0.0 to 0.1 (Step: 0.05).
Quick "Copy-Paste" Search Space
The following values serve as a reliable starting point for a chat objective on instruction-tuned models.
| Parameter | Type | Suggested Range/Values |
|---|---|---|
| Learning Rate | Log Float | 1e-5 ... 2e-4 (log) |
| Weight Decay | Categorical | 0.0, 0.01, 0.1 |
| Warmup Steps | Int (Step) | 0 ... 100 (step=10) |
| LoRA Rank | Categorical | 8, 16, 32, 64 |
| LoRA Alpha | Categorical | 16, 32, 64 |
| LoRA Dropout | Float (Step) | 0.0 ... 0.1 (step=0.05) |
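For readers who prefer code to a table, the same search space can be written as a plain Python dictionary with a small sampler. The key names, the `type` labels, and the `sample` helper are assumptions made for this sketch; they are not Protean AI's configuration format.

```python
import math
import random

# Hypothetical, tool-agnostic encoding of the table above.
SEARCH_SPACE = {
    "learning_rate": {"type": "log_float", "low": 1e-5, "high": 2e-4},
    "weight_decay": {"type": "categorical", "values": [0.0, 0.01, 0.1]},
    "warmup_steps": {"type": "int", "low": 0, "high": 100, "step": 10},
    "lora_r": {"type": "categorical", "values": [8, 16, 32, 64]},
    "lora_alpha": {"type": "categorical", "values": [16, 32, 64]},
    "lora_dropout": {"type": "float", "low": 0.0, "high": 0.1, "step": 0.05},
}

def sample(spec, rng):
    """Draw one value according to a parameter's spec."""
    kind = spec["type"]
    if kind == "categorical":
        return rng.choice(spec["values"])
    if kind == "log_float":
        return math.exp(rng.uniform(math.log(spec["low"]), math.log(spec["high"])))
    # Stepped int/float ranges: pick a grid index, then scale by the step.
    n_steps = round((spec["high"] - spec["low"]) / spec["step"])
    value = spec["low"] + rng.randint(0, n_steps) * spec["step"]
    return value if kind == "int" else round(value, 10)

rng = random.Random(42)  # seeded so the trials are repeatable
for trial_id in range(3):
    config = {name: sample(spec, rng) for name, spec in SEARCH_SPACE.items()}
    print(trial_id, config)
```

Each printed line is one candidate configuration. With the stepped ranges, warmup steps can take only 11 values and dropout only 3, which keeps the space small enough to explore in a modest number of trials.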