Developer's Corner
Do you want to understand how this package works, or modify the code you are running? This section provides and explains the necessary tools.
Tokenizer
Llama2.Tokenizer — Type
Tokenizer

Construct a tokenizer storing vocabulary entries, scores, and byte-piece mappings.
Constructors
Tokenizer(vocab, vocab_scores, sorted_vocab, vocab_size, max_token_length, byte_pieces)

Construct a tokenizer directly from the provided fields. Validate that max_token_length > 0 and that byte_pieces has length 256.

Tokenizer(path::String, vocab_size::Integer)

Load a tokenizer from a binary file.
Fields
- vocab: Token string sequences.
- vocab_scores: Scores for each token.
- sorted_vocab: Sorted token indices.
- vocab_size: Number of vocabulary entries.
- max_token_length: Maximum token length in bytes.
- byte_pieces: Byte mapping (length 256).
Tokenizer(path::String, vocab_size::Integer) -> Tokenizer

Load a tokenizer from a binary file.
Read vocabulary, token scores, and metadata from the binary file at path. The file format expects:
- Int32: maximum token length
- For each of vocab_size tokens:
  - Float32: token score
  - Int32: string length (n)
  - n bytes: token string (UTF-8)
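The format above can be read with a minimal sketch like the following (an illustrative reader, not the package's internal loader; the function name is hypothetical):

```julia
# Minimal sketch of reading the tokenizer binary format described above.
# Layout: Int32 max_token_length, then per token: Float32 score,
# Int32 length n, n bytes of UTF-8 string data.
function read_tokenizer_file(path::String, vocab_size::Integer)
    vocab = Vector{String}(undef, vocab_size)
    vocab_scores = Vector{Float32}(undef, vocab_size)
    open(path, "r") do io
        max_token_length = read(io, Int32)
        for i in 1:vocab_size
            vocab_scores[i] = read(io, Float32)   # token score
            n = read(io, Int32)                   # string length in bytes
            vocab[i] = String(read(io, n))        # UTF-8 token string
        end
        return vocab, vocab_scores, max_token_length
    end
end
```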
Arguments
- path::String: Path to the tokenizer binary file.
- vocab_size::Integer: Expected number of vocabulary entries.
Returns
Tokenizer: A tokenizer ready for encoding/decoding text.
Llama2.TokenIndex — Type
TokenIndex(str::String, id::Integer)

A vocabulary token with its string representation and numeric identifier.
Stores a token string and its associated ID for efficient sorting and lookup in the tokenizer vocabulary.
Throw a DomainError if id ≤ 0.
Arguments
- str::String: Token string (e.g., "Julia").
- id::Integer: Token identifier (converted to Int16, must be > 0).
Examples
julia> Llama2.TokenIndex("Julia", 1)
Llama2.TokenIndex("Julia", 1)
julia> Llama2.TokenIndex("Julia", -1)
ERROR: DomainError with Token index must be > 0.
[...]

Developer Notes
This is an internal struct.
Base.isless — Method
Base.isless(first_token::TokenIndex, second_token::TokenIndex) -> Bool

Compare two tokens lexicographically by their string values.
Return true if first_token.str < second_token.str in lexicographic order. Intended for use as the lt argument to sorting functions.
Examples
julia> isless(Llama2.TokenIndex("A", 1), Llama2.TokenIndex("B", 2))
true
julia> isless(Llama2.TokenIndex("B", 1), Llama2.TokenIndex("A", 2))
false

Llama2.str_lookup — Function
str_lookup(str::String, sorted_vocab::Vector{TokenIndex}) -> Int16

Search for str within the sorted vocabulary sorted_vocab using binary search. Return the corresponding token ID if a match is found, -1 otherwise.
Arguments
- str::String: Token string to search for.
- sorted_vocab::Vector{TokenIndex}: Vocabulary sorted lexicographically by string.
Returns
Int16: Token ID if found, -1 otherwise.
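The binary search can be sketched as follows (an illustrative re-implementation over a stand-in struct, not the package's internal code):

```julia
# Stand-in for Llama2.TokenIndex, holding a token string and its ID.
struct TokenIndexSketch
    str::String
    id::Int16
end

# Binary search over a lexicographically sorted vocabulary.
# Returns the stored token ID on a match, -1 otherwise.
function str_lookup_sketch(str::String, sorted_vocab::Vector{TokenIndexSketch})
    lo, hi = 1, length(sorted_vocab)
    while lo <= hi
        mid = (lo + hi) >>> 1
        if sorted_vocab[mid].str == str
            return sorted_vocab[mid].id
        elseif sorted_vocab[mid].str < str
            lo = mid + 1       # match, if any, lies in the upper half
        else
            hi = mid - 1       # match, if any, lies in the lower half
        end
    end
    return Int16(-1)           # not found
end
```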
Examples
julia> Llama2.str_lookup("aa", [Llama2.TokenIndex("aa", 1), Llama2.TokenIndex("bb", 2)])
1
julia> Llama2.str_lookup("ba", [Llama2.TokenIndex("aa", 1), Llama2.TokenIndex("bb", 2)])
-1

Llama2.encode — Function
encode(tokenizer::Tokenizer, text::String) -> Vector{Integer}

Convert text into a sequence of token IDs using tokenizer. First ensure the tokenizer's vocabulary is sorted, then encode each character into its corresponding ID. After that, iteratively merge the token pair with the highest score into a longer token until no more merges are possible. Return the final token ID sequence.
Arguments
- tokenizer::Tokenizer: The tokenizer with vocabulary and merge scores.
- text::String: Input text to encode.
Returns
Vector{Integer}: Sequence of token IDs representing the encoded text.
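The merge phase described above can be sketched as follows (an illustrative version; `vocab`, `scores`, and `lookup` are hypothetical stand-ins for the Tokenizer's fields and lookup routine):

```julia
# Repeatedly merge the adjacent token pair whose concatenation exists in
# the vocabulary with the highest score, until no mergeable pair remains.
# `lookup(s)` returns the ID of token string `s`, or -1 if absent.
function merge_pairs!(tokens::Vector{Int}, vocab::Vector{String},
                      scores::Vector{Float32}, lookup)
    while true
        best_score, best_idx, best_id = -Inf32, 0, 0
        for i in 1:length(tokens)-1
            merged = vocab[tokens[i]] * vocab[tokens[i+1]]  # concatenate pair
            id = lookup(merged)
            if id != -1 && scores[id] > best_score
                best_score, best_idx, best_id = scores[id], i, id
            end
        end
        best_idx == 0 && return tokens       # no merge possible: done
        tokens[best_idx] = best_id           # replace pair with merged token
        deleteat!(tokens, best_idx + 1)
    end
end
```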
Transformer
Llama2.Transformer — Type
Transformer(config::Config, weights::TransformerWeights)

Create a Transformer from config and weights. The RunState containers are initialized empty and merely sized to the corresponding dimensions.
Developer Notes
This is an internal struct.
Llama2.Config — Type
Config

Create a Config containing seven Int32 fields. These describe the metadata needed to read values from an input file.
Developer Notes
This is an internal struct.
Llama2.TransformerWeights — Type
TransformerWeights

Create a TransformerWeights containing several Float32 containers. These describe actual weight data that is loaded from an input file.
Developer Notes
This is an internal struct.
Llama2.RunState — Type
RunState

Create a RunState containing several Float32 containers. These reflect the state of the Transformer at run-time.
Developer Notes
This is an internal struct.
forward!
Llama2.forward! — Function
forward!(transformer::Transformer, token::Int32, pos::Int32)

Perform a single forward pass through the transformer.

Compute logits for the next token prediction given the current token at position pos in the sequence. Update the internal KV-cache in transformer.state with the keys and values from this forward pass.
Arguments
- transformer::Transformer: The model (modified in-place via KV-cache updates).
- token::Int32: Current input token index (must be in range 1:vocab_size).
- pos::Int32: Position in the sequence (1-indexed, must be ≤ seq_len).
Returns
Vector{Float32}: Logits over the vocabulary for next token prediction (length = vocab_size).
Examples
model = Transformer("model.bin")
token = Int32(1) # BOS token
pos = Int32(1)
logits = forward!(model, token, pos)
next_token = argmax(logits)

Llama2.rmsnorm — Function
rmsnorm(x, w) -> Vector{Float32}

Calculate the RMS normalization of x scaled by w, i.e. the elementwise product λ .* w .* x, where λ = 1 / sqrt(mean(x .^ 2) + ε) is the inverse root mean square of x (with a small ε for numerical stability).
Arguments
- x::AbstractVector{Float32}: Input vector to normalize.
- w::AbstractVector{Float32}: Scale weights (must have same length as x).
Returns
Vector{Float32}: Normalized and scaled output.
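A minimal sketch of the computation, assuming ε = 1f-5 (the exact ε used by the package is an assumption here; the function name is illustrative):

```julia
# RMS normalization: scale x by the inverse of its root mean square,
# then apply the elementwise weights w.
function rmsnorm_sketch(x::AbstractVector{Float32}, w::AbstractVector{Float32};
                        eps::Float32 = 1f-5)
    ss = sum(abs2, x) / length(x)    # mean of squares
    λ = 1f0 / sqrt(ss + eps)         # inverse RMS of x
    return w .* (λ .* x)
end
```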
Examples
julia> x = [1.0f0,2,3];
julia> w = [1.0f0,1,1];
julia> o = Llama2.rmsnorm(x, w)
3-element Vector{Float32}:
0.46290955
0.9258191
1.3887286

Llama2.softmax! — Function
softmax!(x) -> Vector{Float32}

Replace the contents of x with the softmax of its values, in place.
Transform logits into a probability distribution by exponentiating and normalizing. Uses the numerically stable formulation: x[i] = exp(x[i] - max(x)) / sum(exp(x .- max(x))).
The input vector is modified in-place and also returned.
Arguments
- x::AbstractVector{Float32}: Logits to transform (modified in-place).
Returns
Vector{Float32}: The same vector x, now containing probabilities that sum to 1.
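The numerically stable formulation above can be sketched as (an illustrative re-implementation, not the package's internal code):

```julia
# In-place numerically stable softmax: subtract the maximum before
# exponentiating, then normalize so the entries sum to 1.
function softmax_sketch!(x::AbstractVector{Float32})
    m = maximum(x)
    x .= exp.(x .- m)   # shift by max(x) to avoid overflow in exp
    x ./= sum(x)        # normalize to a probability distribution
    return x
end
```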
Examples
julia> x = [-1.0f0,0,1];
julia> Llama2.softmax!(x);
julia> x
3-element Vector{Float32}:
0.09003057
0.24472848
0.66524094

Sampler
Llama2.Sampler — Type
Sampler

Stateful sampler for converting model logits into token indices.
Encapsulate sampling configuration (temperature, nucleus sampling) and internal buffers needed for efficient token sampling.
Constructors
Sampler(vocab_size::Int32, temperature::Float32, topp::Float32, rng_seed::Int128)

Construct a sampler with automatically allocated internal buffers for nucleus (top-p) sampling.

Sampler(vocab_size::Int32, probindex::Vector{ProbIndex}, temperature::Float32, topp::Float32, rng_state::Int128)

Construct a sampler using a caller-provided probindex workspace buffer. The buffer must have length at least vocab_size.
Fields
- vocab_size::Int32: Vocabulary size
- temperature::Float32: Sampling temperature (0 = greedy)
- topp::Float32: Nucleus sampling threshold
- rng_state::Int128: Random number generator state
Sampler is callable and can be applied to a vector of logits to obtain the next token index.
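The temperature handling can be sketched as follows (an assumed outline of what a callable sampler does before drawing a token; the function name is illustrative and the package's internals may differ):

```julia
# Temperature step of a sampler: at temperature 0 the caller should take
# argmax(logits) (greedy decoding), signaled here by returning `nothing`;
# otherwise the logits are divided by the temperature and converted to a
# probability distribution with a numerically stable softmax.
function apply_temperature(logits::Vector{Float32}, temperature::Float32)
    temperature == 0f0 && return nothing   # greedy: caller uses argmax(logits)
    probs = exp.((logits .- maximum(logits)) ./ temperature)
    return probs ./ sum(probs)
end
```

Higher temperatures flatten the distribution (more random output), while temperatures below 1 sharpen it toward the greedy choice.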
Llama2.ProbIndex — Type
ProbIndex

Create a ProbIndex from an AbstractFloat and an Integer.

prob holds a probability and is stored as a Float32; index is stored as an Int32. Throw a DomainError if index ≤ 0.
Examples
julia> Llama2.ProbIndex(1.0, 2)
ProbIndex(1.0f0, 2)
julia> Llama2.ProbIndex(1.0, -1)
ERROR: DomainError with Prob index must be > 0.:
[...]

Base.isless — Method
isless(a::ProbIndex, b::ProbIndex) -> Bool

Comparison function for ordering ProbIndex values by probability.
Return true if a.prob < b.prob. Intended for use as the lt argument to sorting routines.
Llama2.sample_mult — Function
sample_mult(probabilities, coin) -> Int

Sample an index from a multinomial distribution.

Given a vector of normalized probabilities and a uniform random number coin ∈ [0, 1), return the first index whose cumulative probability exceeds coin.
Arguments
- probabilities::Vector{Float32}: Probability mass function (must sum to 1).
- coin::Float32: Uniform random number in [0, 1).
Returns
- Index of the sampled element.
If numerical roundoff prevents an early return, the last index is returned.
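The procedure above can be sketched as follows (an illustrative re-implementation, not the package's internal code):

```julia
# Multinomial sampling via the CDF: return the first index whose
# cumulative probability exceeds `coin`; if roundoff prevents an early
# return, fall back to the last index.
function sample_mult_sketch(probabilities::Vector{Float32}, coin::Float32)
    cdf = 0f0
    for (i, p) in enumerate(probabilities)
        cdf += p
        coin < cdf && return i
    end
    return length(probabilities)   # roundoff fallback
end
```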
Llama2.sample_topp — Function
sample_topp(probabilities, topp, probindex, coin) -> Int

Sample an index using nucleus (top-p) sampling.

Select the smallest set of tokens whose cumulative probability mass exceeds topp, then sample from this restricted distribution using the provided random number.
Arguments
- probabilities::Vector{Float32}: Normalized probability distribution.
- topp::Float32: Cumulative probability threshold (0 < topp < 1).
- probindex::Vector{ProbIndex}: Preallocated workspace for sorting and indexing candidate tokens.
- coin::Float32: Uniform random number in [0, 1).
Returns
- Index of the sampled token.
The probindex buffer is mutated and reused to avoid allocations.
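Nucleus sampling can be sketched as follows (an illustrative version that allocates a sort permutation instead of reusing a ProbIndex workspace; the package's internal implementation may differ):

```julia
# Top-p sampling: sort candidates by descending probability, keep the
# smallest prefix whose cumulative mass exceeds `topp`, then sample
# within that prefix using `coin`, renormalized to the kept mass.
function sample_topp_sketch(probabilities::Vector{Float32}, topp::Float32,
                            coin::Float32)
    order = sortperm(probabilities; rev = true)   # indices by descending prob
    cumulative = 0f0
    last = length(order)
    for (k, idx) in enumerate(order)
        cumulative += probabilities[idx]
        if cumulative > topp                      # nucleus is large enough
            last = k
            break
        end
    end
    r = coin * cumulative                         # rescale coin to kept mass
    cdf = 0f0
    for k in 1:last
        cdf += probabilities[order[k]]
        r < cdf && return order[k]
    end
    return order[last]                            # roundoff fallback
end
```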