Llama2.jl
What is Llama2?
Llama2 is a family of pre-trained large language models (LLMs) by Meta AI. More information can be found at: https://www.llama.com/
What is Llama2.jl?
Llama2.jl can run inference on a given model from within Julia. For this purpose you have to provide your own model checkpoint. This project follows the procedure outlined by the run.c file from llama2.c.
Getting started
Clone the repository to a desired location:
cd /PATH/TO/DESIRED/LOCATION/
git clone git@github.com:ConstantConstantin/Llama2.git
Start Julia, activate a desired environment, and add the package; it can then be loaded in your session:
(@v1.11) pkg> activate .
Activating new project at `PATH/TO/MY/ENVIRONMENT/myLlama2`
(myLlama2) pkg> add /PATH/TO/DESIRED/LOCATION/Llama2/
Cloning git-repo `/PATH/TO/DESIRED/LOCATION/Llama2`
Updating git-repo `/PATH/TO/DESIRED/LOCATION/Llama2`
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Updating `PATH/TO/MY/ENVIRONMENT/Project.toml`
[0e620e9f] + Llama2 v1.0.0-DEV `/PATH/TO/DESIRED/LOCATION/Llama2#aj/docs`
Updating `PATH/TO/MY/ENVIRONMENT/Manifest.toml`
[0e620e9f] + Llama2 v1.0.0-DEV `/PATH/TO/DESIRED/LOCATION/Llama2#aj/docs`
Precompiling project...
1 dependency successfully precompiled in 1 seconds
julia> using Llama2

Llama2.Config
Llama2.RunState
Llama2.TokenIndex
Llama2.Tokenizer
Llama2.Transformer
Llama2.TransformerWeights
Llama2.compare_tokens
Llama2.encode
Llama2.str_lookup
Llama2.Config — Type
Config

Create a Config containing 7 Int32 fields. These describe metadata used to read values from an input file.
Developer Notes
This is an internal struct.
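Since the project follows run.c from llama2.c, the seven Int32 values most likely correspond to the checkpoint header used there. The sketch below is illustrative only; the actual field names in Llama2.jl may differ.

struct ConfigSketch
    dim::Int32         # transformer embedding dimension
    hidden_dim::Int32  # hidden dimension of the feed-forward layers
    n_layers::Int32    # number of transformer layers
    n_heads::Int32     # number of attention (query) heads
    n_kv_heads::Int32  # number of key/value heads
    vocab_size::Int32  # vocabulary size
    seq_len::Int32     # maximum sequence length
end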
Llama2.RunState — Type
RunState

Create a RunState containing several Float32 containers. These reflect the state of the Transformer at run time.
Developer Notes
This is an internal struct.
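As a rough orientation, run.c in llama2.c keeps activation buffers such as the current hidden state, attention scores, the key/value cache, and the output logits. A hypothetical sketch along those lines (field names and shapes are assumptions, not the actual definition):

struct RunStateSketch
    x::Vector{Float32}             # activation at the current position (dim,)
    q::Vector{Float32}             # query vector (dim,)
    att::Matrix{Float32}           # attention scores (seq_len, n_heads)
    logits::Vector{Float32}        # output logits (vocab_size,)
    key_cache::Array{Float32,3}    # cached keys (kv_dim, seq_len, n_layers)
    value_cache::Array{Float32,3}  # cached values (kv_dim, seq_len, n_layers)
end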
Llama2.TokenIndex — Type
TokenIndex(str::String, id::Integer)

Create a TokenIndex from a string and an integer identifier.
The byte sequence is converted to a String and the ID is converted to an Int16. Throws a DomainError if id ≤ 0.
Examples
julia> using Llama2;
julia> TokenIndex("Julia", 1)
TokenIndex("Julia", 1)
julia> TokenIndex("Julia", -1)
ERROR: DomainError with Token index must be > 0.
[...]

Developer Notes
This is an internal struct.
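A minimal sketch of what the struct and its validation could look like, assuming the DomainError is raised in an inner constructor; this is illustrative, not the actual definition:

struct TokenIndexSketch
    str::String
    id::Int16
    function TokenIndexSketch(str::String, id::Integer)
        # reject non-positive identifiers, as documented above
        id > 0 || throw(DomainError(id, "Token index must be > 0."))
        return new(str, Int16(id))
    end
end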
Llama2.Tokenizer — Type
Tokenizer

Construct a tokenizer storing vocabulary entries, scores, and byte-piece mappings.
Constructors
Tokenizer(vocab, vocab_scores, sorted_vocab, vocab_size, max_token_length, byte_pieces)
Construct a tokenizer directly from the provided fields. Validates that max_token_length > 0 and that byte_pieces has length 256.
Tokenizer(path::String, vocab_size::Integer)
Load a tokenizer from a binary file.
Fields
vocab: Token string sequences.
vocab_scores: Scores for each token.
sorted_vocab: Sorted token indices.
vocab_size: Number of vocabulary entries.
max_token_length: Maximum token length in bytes.
byte_pieces: Byte mapping (length 256).
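For example, loading a tokenizer from a llama2.c-style binary file might look like the following; the path is a placeholder, and 32000 is only the typical Llama 2 vocabulary size, so adjust both to your files:

julia> using Llama2;

julia> tok = Llama2.Tokenizer("/PATH/TO/tokenizer.bin", 32000);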
Llama2.Transformer — Method
Transformer(path::String)

Load a binary file located at path and construct a Transformer from its content. The file is expected to have a header of 7 Int32 values followed by Float32 data.
Example
julia> t = Llama2.Transformer("/PATH/TO/YOUR.bin");

Llama2.TransformerWeights — Type
TransformerWeights

Create a TransformerWeights containing several Float32 containers. These hold the actual weight data that is loaded from an input file.
Developer Notes
This is an internal struct.
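Again mirroring run.c from llama2.c, which this project follows, the weights presumably group into a token embedding table plus per-layer attention and feed-forward matrices. The sketch below lists plausible containers and is not the actual internal definition:

struct TransformerWeightsSketch
    token_embedding_table::Matrix{Float32}  # token embeddings (dim, vocab_size)
    rms_att_weight::Matrix{Float32}         # per-layer RMSNorm weights (attention)
    wq::Array{Float32,3}                    # query projections, one matrix per layer
    wk::Array{Float32,3}                    # key projections
    wv::Array{Float32,3}                    # value projections
    wo::Array{Float32,3}                    # attention output projections
    rms_ffn_weight::Matrix{Float32}         # per-layer RMSNorm weights (feed-forward)
    w1::Array{Float32,3}                    # feed-forward up projections
    w2::Array{Float32,3}                    # feed-forward down projections
    w3::Array{Float32,3}                    # feed-forward gate projections
    rms_final_weight::Vector{Float32}       # final RMSNorm weights
    wcls::Matrix{Float32}                   # classifier weights for the output logits
end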
Llama2.compare_tokens — Method
compare_tokens(first_token::TokenIndex, second_token::TokenIndex) -> Bool

Compare two TokenIndex objects by their string values. It returns true if the first token's string is lexicographically less than the second's, and false otherwise.
Examples
julia> using Llama2;
julia> compare_tokens(TokenIndex("A", 1), TokenIndex("B", 2))
true
julia> compare_tokens(TokenIndex("B", 1), TokenIndex("A", 2))
false
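Because compare_tokens implements a lexicographic less-than, it can also serve as the lt comparator when building a sorted vocabulary. A hypothetical snippet, not taken from the package internals:

julia> sorted = sort([TokenIndex("bb", 2), TokenIndex("aa", 1)]; lt=compare_tokens);

julia> sorted[1]
TokenIndex("aa", 1)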
Llama2.encode — Method

encode

Converts a text string into a sequence of token IDs using a Tokenizer. The tokenizer's vocabulary is first sorted if necessary, then each character is encoded into its corresponding ID. After that, adjacent token pairs with the highest merge scores are iteratively merged into longer tokens until no more merges are possible, and the final token ID sequence is returned.
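The merging step can be pictured roughly as follows. This is a simplified, hypothetical sketch of a greedy merge loop, assuming token IDs index directly into the vocab and vocab_scores fields; it is not the package's actual implementation of encode:

function greedy_merge_sketch!(ids::Vector{Int16}, t::Llama2.Tokenizer)
    while true
        best_score, best_pos, best_id = -Inf32, 0, Int16(-1)
        for i in 1:length(ids)-1
            # Would merging the adjacent pair (ids[i], ids[i+1]) give a known token?
            merged = t.vocab[ids[i]] * t.vocab[ids[i+1]]
            id = Llama2.str_lookup(merged, t.sorted_vocab)
            if id != -1 && t.vocab_scores[id] > best_score
                best_score, best_pos, best_id = t.vocab_scores[id], i, id
            end
        end
        best_pos == 0 && break          # no mergeable pair left
        ids[best_pos] = best_id         # replace the pair with the merged token
        deleteat!(ids, best_pos + 1)
    end
    return ids
end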
Llama2.str_lookup — Method
str_lookup(str::String, sorted_vocab::Vector{TokenIndex}) -> Int16

Search for a given string str within a sorted vocabulary sorted_vocab of TokenIndex objects. If the string is found, it returns the corresponding token ID; otherwise, it returns -1. It uses a binary search for efficient lookup.
Examples
julia> using Llama2;
julia> str_lookup("aa", [TokenIndex("aa", 1), TokenIndex("bb", 2)])
1
julia> str_lookup("ba", [TokenIndex("aa", 1), TokenIndex("bb", 2)])
-1
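For reference, the binary search mentioned above can be sketched as follows; this is an illustrative reimplementation over the sorted_vocab vector (assuming TokenIndex fields named str and id), not the package's internal code:

function str_lookup_sketch(str::String, sorted_vocab::Vector{TokenIndex})
    lo, hi = 1, length(sorted_vocab)
    while lo <= hi
        mid = (lo + hi) >>> 1
        if sorted_vocab[mid].str == str
            return sorted_vocab[mid].id       # found: return the token ID
        elseif sorted_vocab[mid].str < str
            lo = mid + 1                      # search the upper half
        else
            hi = mid - 1                      # search the lower half
        end
    end
    return Int16(-1)                          # not found
end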