Developer's Corner

Do you want to understand how this package works, or modify the code you are running? This page provides and explains the necessary tools.

Tokenizer

Llama2.TokenizerType
Tokenizer

Construct a tokenizer storing vocabulary entries, scores, and byte-piece mappings.

Constructors

  • Tokenizer(vocab, vocab_scores, sorted_vocab, vocab_size, max_token_length, byte_pieces) Construct a tokenizer directly from the provided fields. Validate that max_token_length > 0 and that byte_pieces has length 256.

  • Tokenizer(path::String, vocab_size::Integer) Load a tokenizer from a binary file.

Fields

  • vocab: Token string sequences.
  • vocab_scores: Scores for each token.
  • sorted_vocab: Sorted token indices.
  • vocab_size: Number of vocabulary entries.
  • max_token_length: Maximum token length in bytes.
  • byte_pieces: Byte mapping (length 256).

Tokenizer(path::String, vocab_size::Integer) -> Tokenizer

Load a tokenizer from a binary file.

Read vocabulary, token scores, and metadata from the binary file at path. The file format expects:

  • Int32: maximum token length
  • For each of vocab_size tokens:
    • Float32: token score
    • Int32: string length (n)
    • n bytes: token string (UTF-8)
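The layout above can be parsed with a few `read` calls. A minimal sketch, assuming only the fields listed above (the actual constructor also builds `byte_pieces` and the sorted vocabulary; `read_vocab` is a hypothetical name):

```julia
# Hypothetical standalone reader for the binary layout described above.
function read_vocab(path::String, vocab_size::Integer)
    vocab  = Vector{String}(undef, vocab_size)
    scores = Vector{Float32}(undef, vocab_size)
    open(path, "r") do io
        max_token_length = read(io, Int32)   # Int32: maximum token length
        for i in 1:vocab_size
            scores[i] = read(io, Float32)    # Float32: token score
            n = read(io, Int32)              # Int32: string length (n)
            vocab[i] = String(read(io, n))   # n bytes: token string (UTF-8)
        end
    end
    return vocab, scores
end
```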

Arguments

  • path::String: Path to the tokenizer binary file.
  • vocab_size::Integer: Expected number of vocabulary entries.

Returns

  • Tokenizer: A tokenizer ready for encoding/decoding text.
source
Llama2.TokenIndexType
TokenIndex(str::String, id::Integer)

A vocabulary token with its string representation and numeric identifier.

Stores a token string and its associated ID for efficient sorting and lookup in the tokenizer vocabulary.

Throw a DomainError if id ≤ 0.

Arguments

  • str::String: Token string (e.g., "Julia").
  • id::Integer: Token identifier (converted to Int16, must be > 0).

Examples

julia> Llama2.TokenIndex("Julia", 1)
Llama2.TokenIndex("Julia", 1)

julia> Llama2.TokenIndex("Julia", -1)
ERROR: DomainError with Token index must be > 0.
[...]

Developer Notes

This is an internal struct.

source
Base.islessMethod
Base.isless(first_token::TokenIndex, second_token::TokenIndex) -> Bool

Compare two tokens lexicographically by their string values.

Return true if first_token.str < second_token.str in lexicographic order. Intended for use as the lt argument to sorting functions.

Examples

julia> isless(Llama2.TokenIndex("A", 1), Llama2.TokenIndex("B", 2))
true

julia> isless(Llama2.TokenIndex("B", 1), Llama2.TokenIndex("A", 2))
false
source
Llama2.str_lookupFunction
str_lookup(str::String, sorted_vocab::Vector{TokenIndex}) -> Int16

Search for str within the sorted vocabulary sorted_vocab using binary search for efficient lookup. Return the corresponding token ID if a match is found; otherwise return -1.

Arguments

  • str::String: Token string to search for.
  • sorted_vocab::Vector{TokenIndex}: Vocabulary sorted lexicographically by string.

Returns

  • Int16: Token ID if found, -1 otherwise.

Examples

julia> Llama2.str_lookup("aa", [Llama2.TokenIndex("aa", 1), Llama2.TokenIndex("bb", 2)])
1

julia> Llama2.str_lookup("ba", [Llama2.TokenIndex("aa", 1), Llama2.TokenIndex("bb", 2)])
-1
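The lookup can be sketched as a standard binary search over the lexicographically sorted vocabulary. This is a simplified stand-in, not the package's internal code; `TokenIndexSketch` and `str_lookup_sketch` are hypothetical names modeling the documented behavior:

```julia
# Simplified stand-in for the package's internal TokenIndex.
struct TokenIndexSketch
    str::String
    id::Int16
end

# Binary search by string; return the token ID, or -1 when absent.
function str_lookup_sketch(str::String, sorted_vocab::Vector{TokenIndexSketch})
    lo, hi = 1, length(sorted_vocab)
    while lo <= hi
        mid = (lo + hi) >>> 1
        entry = sorted_vocab[mid].str
        if entry == str
            return sorted_vocab[mid].id
        elseif entry < str
            lo = mid + 1
        else
            hi = mid - 1
        end
    end
    return Int16(-1)
end
```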
source
Llama2.encodeFunction
encode(tokenizer::Tokenizer, text::String) -> Vector{Integer}

Convert text into a sequence of token IDs using tokenizer. First ensure the tokenizer's vocabulary is sorted, then encode each character into its corresponding ID. After that, iteratively merge the adjacent token pair with the highest score into a longer token until no more merges are possible. Return the final token ID sequence.

Arguments

  • tokenizer::Tokenizer: The tokenizer with vocabulary and merge scores.
  • text::String: Input text to encode.

Returns

  • Vector{Integer}: Sequence of token IDs representing the encoded text.
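The merge loop described above can be sketched as a greedy BPE pass. This simplified version works on plain strings and a score dictionary rather than the tokenizer's internal representation; `merge_pairs` is a hypothetical name:

```julia
# Greedy BPE merging sketch: repeatedly fuse the adjacent pair whose
# concatenation has the highest vocabulary score, until none qualifies.
function merge_pairs(tokens::Vector{String}, scores::Dict{String,Float32})
    while true
        best_score = -Inf32
        best_idx = 0
        for i in 1:length(tokens)-1
            merged = tokens[i] * tokens[i+1]
            s = get(scores, merged, -Inf32)
            if s > best_score
                best_score = s
                best_idx = i
            end
        end
        best_idx == 0 && return tokens          # no mergeable pair remains
        tokens[best_idx] = tokens[best_idx] * tokens[best_idx+1]
        deleteat!(tokens, best_idx + 1)
    end
end
```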
source

Transformer

Llama2.TransformerType
Transformer(config::Config, weights::TransformerWeights)

Create a Transformer with data from config and weights. The RunState containers are initialized empty, with only their dimensions set to match the configuration.

Developer Notes

This is an internal struct.

source
Llama2.ConfigType
Config

Create a Config containing seven Int32 fields. These hold the metadata needed to read values from an input file.

Developer Notes

This is an internal struct.

source
Llama2.TransformerWeightsType
TransformerWeights

Create a TransformerWeights containing several Float32 containers. These describe actual weight data that is loaded from an input file.

Developer Notes

This is an internal struct.

source
Llama2.RunStateType
RunState

Create a RunState containing several Float32 containers. These reflect the state of the Transformer at run-time.

Developer Notes

This is an internal struct.

source

forward!

Llama2.forward!Function
forward!(transformer::Transformer, token::Int32, pos::Int32)

Perform a single forward pass through the transformer.

Compute logits for the next token prediction given the current token at position pos in the sequence. Updates the internal KV-cache in transformer.state with keys and values from this forward pass.

Arguments

  • transformer::Transformer: The model (modified in-place via KV-cache updates).
  • token::Int32: Current input token index (must be in range 1:vocab_size).
  • pos::Int32: Position in the sequence (1-indexed, must be ≤ seq_len).

Returns

  • Vector{Float32}: Logits over the vocabulary for next token prediction (length = vocab_size).

Examples

model = Transformer("model.bin")
token = Int32(1)  # BOS token
pos = Int32(1)

logits = forward!(model, token, pos)
next_token = argmax(logits)
source
Llama2.rmsnormFunction
rmsnorm(x, w) -> Vector{Float32}

Calculate the RMS normalization of x scaled by w: the elementwise product w .* x divided by the root mean square of x.

Arguments

  • x::AbstractVector{Float32}: Input vector to normalize.
  • w::AbstractVector{Float32}: Scale weights (must have same length as x).

Returns

  • Vector{Float32}: Normalized and scaled output.

Examples

julia> x = [1.0f0, 2, 3];

julia> w = [1.0f0, 1, 1];

julia> o = Llama2.rmsnorm(x, w)
3-element Vector{Float32}:
 0.46290955
 0.9258191
 1.3887286
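The normalization can be reproduced in a few lines. This is a sketch assuming the common ε = 1f-5 stabilizer from llama2.c, which matches the output shown above; the package's exact ε may differ:

```julia
# RMS normalization sketch: scale x by w ./ sqrt(mean(x.^2) + eps).
function rmsnorm_sketch(x::AbstractVector{Float32}, w::AbstractVector{Float32})
    ss = sum(abs2, x) / length(x) + 1f-5   # mean square plus stabilizer
    return w .* x ./ sqrt(ss)
end
```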
source
Llama2.softmax!Function
softmax!(x) -> Vector{Float32}

Update the layer output x in place with the softmax of the input.

Transform logits into a probability distribution by exponentiating and normalizing. Uses the numerically stable formulation: x[i] = exp(x[i] - max(x)) / sum(exp(x .- max(x))).

The input vector is modified in-place and also returned.

Arguments

  • x::AbstractVector{Float32}: Logits to transform (modified in-place).

Returns

  • Vector{Float32}: The same vector x, now containing probabilities that sum to 1.

Examples

julia> x = [-1.0f0,0,1];

julia> Llama2.softmax!(x);

julia> x
3-element Vector{Float32}:
 0.09003057
 0.24472848
 0.66524094
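The numerically stable formulation above translates directly into code. A sketch, not the package's exact implementation (`softmax_sketch!` is a hypothetical name):

```julia
# In-place numerically stable softmax: subtract the max before exponentiating,
# then normalize so the entries sum to 1.
function softmax_sketch!(x::AbstractVector{Float32})
    m = maximum(x)
    x .= exp.(x .- m)
    x ./= sum(x)
    return x
end
```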
source

Sampler

Llama2.SamplerType
Sampler

Stateful sampler for converting model logits into token indices.

Encapsulate sampling configuration (temperature, nucleus sampling) and internal buffers needed for efficient token sampling.

Constructors

  • Sampler(vocab_size::Int32, temperature::Float32, topp::Float32, rng_seed::Int128) Construct a sampler with automatically allocated internal buffers for nucleus (top-p) sampling.

  • Sampler(vocab_size::Int32, probindex::Vector{ProbIndex}, temperature::Float32, topp::Float32, rng_state::Int128) Construct a sampler using a caller-provided probindex workspace buffer. The buffer must have length at least vocab_size.

Fields

  • vocab_size::Int32: Vocabulary size
  • temperature::Float32: Sampling temperature (0 = greedy)
  • topp::Float32: Nucleus sampling threshold
  • rng_state::Int128: Random number generator state

Sampler is callable and can be applied to a vector of logits to obtain the next token index.
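The callable behavior can be sketched roughly as follows: greedy argmax at temperature 0, otherwise temperature scaling, softmax, and a CDF-based draw. The helper name `sample_sketch` and the exact branching are assumptions, not the package's actual internals:

```julia
# Rough sketch of one sampler call over a logits vector.
function sample_sketch(logits::Vector{Float32}, temperature::Float32,
                       topp::Float32, coin::Float32)
    temperature == 0f0 && return argmax(logits)   # greedy decoding
    probs = logits ./ temperature                 # temperature scaling
    probs .= exp.(probs .- maximum(probs))
    probs ./= sum(probs)                          # softmax
    if !(0f0 < topp < 1f0)
        # multinomial sample: first index whose CDF exceeds coin
        cdf = 0f0
        for (i, p) in pairs(probs)
            cdf += p
            coin < cdf && return i
        end
        return length(probs)                      # roundoff fallback
    else
        # nucleus (top-p) sampling would restrict to the smallest
        # high-probability set first; simplified here to greedy
        return argmax(probs)
    end
end
```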

source
Llama2.ProbIndexType
ProbIndex

Create a ProbIndex from an AbstractFloat and an Integer.

prob contains a probability and is stored as a Float32; index is stored as an Int32. Throw a DomainError if index ≤ 0.

Examples

julia> Llama2.ProbIndex(1.0, 2)
ProbIndex(1.0f0, 2)

julia> Llama2.ProbIndex(1.0, -1)
ERROR: DomainError with Prob index must be > 0.
[...]
source
Base.islessMethod
isless(a::ProbIndex, b::ProbIndex) -> Bool

Comparison function for ordering ProbIndex values by probability.

Return true if a.prob < b.prob. Intended for use as the lt argument to sorting routines.

source
Llama2.sample_multFunction
sample_mult(probabilities, coin) -> Int

Sample an index from a multinomial distribution.

Given a vector of normalized probabilities and a uniform random number coin ∈ [0, 1), returns the first index whose cumulative probability exceeds coin.

Arguments

  • probabilities::Vector{Float32}: Probability mass function (must sum to 1).
  • coin::Float32: Uniform random number in [0, 1).

Returns

  • Index of the sampled element.

If numerical roundoff prevents an early return, the last index is returned.
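The CDF walk described above can be sketched as (`sample_mult_sketch` is a hypothetical stand-in for the internal function):

```julia
# Multinomial sampling sketch: walk the CDF until it passes coin.
function sample_mult_sketch(probabilities::Vector{Float32}, coin::Float32)
    cdf = 0f0
    for (i, p) in pairs(probabilities)
        cdf += p
        coin < cdf && return i
    end
    return length(probabilities)   # roundoff fallback: return last index
end
```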

source
Llama2.sample_toppFunction
sample_topp(probabilities, topp, probindex, coin) -> Int

Sample an index using nucleus (top-p) sampling.

Selects the smallest set of tokens whose cumulative probability mass exceeds topp, then samples from this restricted distribution using the provided random number.

Arguments

  • probabilities::Vector{Float32}: Normalized probability distribution.
  • topp::Float32: Cumulative probability threshold (0 < topp < 1).
  • probindex::Vector{ProbIndex}: Preallocated workspace for sorting and indexing candidate tokens.
  • coin::Float32: Uniform random number in [0, 1).

Returns

  • Index of the sampled token.

The probindex buffer is mutated and reused to avoid allocations.
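The nucleus restriction can be sketched as below. This simplified version sorts all candidates with `sortperm` instead of reusing the probindex workspace, so it allocates where the real function does not; `sample_topp_sketch` is a hypothetical name:

```julia
# Top-p sketch: keep the smallest high-probability prefix whose mass
# exceeds topp, then sample inside it with the given coin.
function sample_topp_sketch(probabilities::Vector{Float32}, topp::Float32,
                            coin::Float32)
    order = sortperm(probabilities; rev=true)   # indices by descending prob
    cum = 0f0
    last = length(order)
    for (k, idx) in pairs(order)
        cum += probabilities[idx]
        if cum > topp
            last = k                            # nucleus boundary found
            break
        end
    end
    # sample within the truncated distribution, rescaling coin by its mass
    r = coin * cum
    cdf = 0f0
    for k in 1:last
        cdf += probabilities[order[k]]
        r < cdf && return order[k]
    end
    return order[last]                          # roundoff fallback
end
```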

source