Hyperparameter Search

This guide explains how to automate the trial-and-error process of finding optimal model settings. Instead of guessing values manually, a range of possibilities (a "search space") is defined, and Protean AI tests combinations from it to find the most effective configuration.

In the process, the best settings for a specific dataset are identified. Trials that do not yield the desired results are pruned (stopped early), saving both time and resources.
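
The pruning idea can be sketched in a few lines. This guide does not say which rule Protean AI uses to decide that a trial should stop, so the example below assumes a median rule (one common choice): a trial is stopped when its loss at a given evaluation step is worse than the median of earlier trials at that same step. The function name and the loss values are hypothetical.

```python
from statistics import median


def should_prune(current_loss: float, step: int, history: list[list[float]]) -> bool:
    """Return True if the running trial is worse than the median of earlier trials at `step`."""
    # Collect the loss each earlier trial reported at this step (lower is better).
    peers = [losses[step] for losses in history if len(losses) > step]
    if not peers:
        return False  # nothing to compare against yet, so keep training
    return current_loss > median(peers)


# Hypothetical validation losses from three earlier trials, one value per evaluation step.
history = [
    [2.0, 1.5, 1.2],
    [2.1, 1.6, 1.3],
    [1.9, 1.4, 1.1],
]

print(should_prune(1.8, 1, history))   # worse than the step-1 median, so it would be stopped
print(should_prune(1.45, 1, history))  # better than the step-1 median, so it continues
```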

What is a "Search Space"?

A search space functions as a menu of options for the search algorithm to choose from. Rather than providing a single fixed value, limits and boundaries are established for the system to explore.

  • Fixed Value: A static value used for every trial.
  • Range: Any number between a defined X and Y is selected.
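  • Log Range: A number is selected between X and Y on a logarithmic scale, so each order of magnitude (e.g., 0.0001 to 0.001) receives as much attention as the next (0.001 to 0.01). Small values are therefore explored far more thoroughly than in a plain Range.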
  • Categorical: A single option is selected from a specific list: [Option A, Option B, Option C].
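
As a rough illustration, the four types above can be mimicked with Python's standard library. This is a sketch of the concepts only, not Protean AI's API: the function names and dictionary keys are hypothetical, and uniform sampling within a Range is an assumption.

```python
import math
import random


def sample_fixed(value):
    # Fixed Value: the same value is returned for every trial.
    return value


def sample_range(rng, low, high):
    # Range: a number between low and high (assumed uniform here).
    return rng.uniform(low, high)


def sample_log_range(rng, low, high):
    # Log Range: sample uniformly in log space, then convert back.
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_categorical(rng, options):
    # Categorical: pick exactly one entry from the list.
    return rng.choice(options)


rng = random.Random(0)  # seeded so the sketch is repeatable
trial = {
    "fixed": sample_fixed(3),
    "range": sample_range(rng, 0.0, 1.0),
    "log_range": sample_log_range(rng, 1e-4, 1e-1),
    "categorical": sample_categorical(rng, ["Option A", "Option B", "Option C"]),
}
print(trial)
```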

Configurable Parameters

The following section details how to configure the search for specific settings.

Learning Rate

  • Description: The size of each update to the model's weights, which sets the speed at which the model learns.
  • Best Search Strategy: Logarithmic Range.
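    • Reasoning: For a learning rate, the ratio between values matters more than the absolute gap. Moving from 0.0001 to 0.001 is a 10x change and significant, whereas moving from 0.1 to 0.101 is a 1% change and negligible. A "log" search gives every 10x interval the same share of trials.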
  • Recommended Range: 1e-6 (very slow) to 2e-4 (standard).
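
The effect of the logarithmic strategy can be checked with a little arithmetic. The sketch below computes, for the recommended range, the share of trials expected to fall below 1e-5 under a linear range versus a log range. It assumes uniform sampling on the respective scale; the variable names are illustrative.

```python
import math

LOW, HIGH = 1e-6, 2e-4  # recommended learning-rate range from this section
CUTOFF = 1e-5           # boundary of the lowest order of magnitude

# Linear range: the share is proportional to the width of the interval.
linear_share = (CUTOFF - LOW) / (HIGH - LOW)

# Log range: the share is proportional to the width of the interval in log space.
log_share = (math.log(CUTOFF) - math.log(LOW)) / (math.log(HIGH) - math.log(LOW))

print(f"linear: {linear_share:.1%} of trials below {CUTOFF}")
print(f"log:    {log_share:.1%} of trials below {CUTOFF}")
```

Under these assumptions, a linear range spends only about 4.5% of its trials on the lowest order of magnitude, while a log range spends about 43%.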

Weight Decay

  • Description: A stabilizer that penalizes large weights to help prevent overfitting.
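  • Best Search Strategy: Categorical or Log Range.
    • Reasoning: Typically, the desired value is either "None" (0.0), "Standard" (0.01), or "High" (0.1). A Log Range cannot include 0.0, because the logarithm of zero is undefined, so Categorical is the better fit whenever "None" must remain an option.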
  • Recommended Options: [0.0, 0.01, 0.1]

Warmup Steps

  • Description: The "warm-up" period at the beginning of training, measured in steps, during which the learning rate ramps up gradually to its target value.
  • Best Search Strategy: Integer Range with a step.
    • Reasoning: Testing every single step (e.g., 11 vs. 12) is unnecessary. Intervals such as 10, 20, or 30 are sufficient.
  • Recommended Range: 0 to 100 (Step size: 10).
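
A stepped integer range is simply a short list of evenly spaced candidates. The sketch below enumerates them for the recommended warmup range. Whether Protean AI treats the upper bound as inclusive is not stated here, so inclusivity is an assumption, and the helper name is hypothetical.

```python
import random


def stepped_int_candidates(low: int, high: int, step: int) -> list[int]:
    """All integers from low to high (assumed inclusive) in increments of step."""
    return list(range(low, high + 1, step))


warmup_candidates = stepped_int_candidates(0, 100, 10)
print(warmup_candidates)
print(len(warmup_candidates), "candidates instead of", 100 - 0 + 1)

rng = random.Random(0)  # seeded so the sketch is repeatable
warmup_steps = rng.choice(warmup_candidates)
print(warmup_steps)
```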

LoRA Rank

  • Description: The "brain capacity" of the adapter.
  • Best Search Strategy: Categorical.
    • Reasoning: Computational efficiency is optimized for powers of 2 (8, 16, 32, 64). Testing arbitrary values like "Rank 13" is generally inefficient.
  • Recommended Options: [8, 16, 32, 64]

LoRA Alpha

  • Description: The "loudness" or strength of the adapter.
  • Best Search Strategy: Categorical (Dependent on Rank).
    • Note: A common rule is Alpha = 2 * Rank. However, if independent search is required, a list of standard values can be used.
  • Recommended Options: [16, 32, 64, 128]
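
The choice between tying Alpha to Rank and searching it independently changes the size of the search. The sketch below compares the two using the option lists from this guide; the function and variable names are hypothetical, and the code is not Protean AI's configuration format.

```python
import random

RANK_OPTIONS = [8, 16, 32, 64]
ALPHA_OPTIONS = [16, 32, 64, 128]


def sample_lora(rng: random.Random, tie_alpha_to_rank: bool = True) -> dict:
    """Pick a rank, then derive alpha from it or pick alpha independently."""
    rank = rng.choice(RANK_OPTIONS)
    if tie_alpha_to_rank:
        alpha = 2 * rank  # the common rule: Alpha = 2 * Rank
    else:
        alpha = rng.choice(ALPHA_OPTIONS)
    return {"rank": rank, "alpha": alpha}


# Count the distinct (rank, alpha) pairs each approach can produce.
tied_pairs = {(r, 2 * r) for r in RANK_OPTIONS}
independent_pairs = {(r, a) for r in RANK_OPTIONS for a in ALPHA_OPTIONS}
print(len(tied_pairs), "tied pairs vs", len(independent_pairs), "independent pairs")

print(sample_lora(random.Random(1)))
```

With the tied rule, the alpha values produced from ranks 8 to 64 are exactly the recommended options 16 to 128, and the search covers 4 combinations rather than 16.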

LoRA Dropout

  • Description: The random disabling of a fraction of neurons during training to improve reliability and reduce overfitting.
  • Best Search Strategy: Float Range (Linear).
    • Reasoning: This represents a simple percentage from 0% to 10%.
  • Recommended Range: 0.0 to 0.1 (Step: 0.05), which yields three candidates: 0.0, 0.05, and 0.1.

Quick "Copy-Paste" Search Space

The following values serve as a reliable starting point for a chat objective on instruction-tuned models.

Parameter     | Type        | Suggested Range/Values
Learning Rate | Log Float   | 1e-5 ... 2e-4 (log)
Weight Decay  | Categorical | 0.0, 0.01, 0.1
Warmup Steps  | Int (Step)  | 0 ... 100 (step=10)
LoRA Rank     | Categorical | 8, 16, 32, 64
LoRA Alpha    | Categorical | 16, 32, 64
LoRA Dropout  | Float       | 0.0 ... 0.1 (Step: 0.05)
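
For readers who want to see what one draw from this starting search space looks like, the sketch below samples a single configuration with Python's standard library. It is an illustration of the table's values only: the dictionary keys and the function name are hypothetical, and this is not the format Protean AI expects as input.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one trial configuration from the starting search space in the table."""
    return {
        # Log Float: uniform in log space between 1e-5 and 2e-4.
        "learning_rate": math.exp(rng.uniform(math.log(1e-5), math.log(2e-4))),
        # Categorical: one value from the list.
        "weight_decay": rng.choice([0.0, 0.01, 0.1]),
        # Int (Step): 0, 10, ..., 100 (upper bound assumed inclusive).
        "warmup_steps": rng.choice(range(0, 101, 10)),
        "lora_r": rng.choice([8, 16, 32, 64]),
        "lora_alpha": rng.choice([16, 32, 64]),
        # Float with step 0.05: listed explicitly to avoid floating-point drift.
        "lora_dropout": rng.choice([0.0, 0.05, 0.1]),
    }


rng = random.Random(42)  # seeded so the sketch is repeatable
for _ in range(3):
    print(sample_config(rng))
```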