jartine committed on
Commit
b0c4e54
1 Parent(s): f85f851

Update README.md

Files changed (1)
  1. README.md +3 -7
README.md CHANGED
@@ -55,12 +55,8 @@ chmod +x Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile
 You then need to fill out the prompt / history template (see below).
 
 This model has a max context window size of 128k tokens. By default, a
-context window size of 512 tokens is used. You can use a larger context
-window by passing the `-c 8192` flag. The software currently has
-limitations that may prevent scaling to the full 128k size. See our
-[Phi-3-medium-128k-instruct-llamafile](https://huggingface.co/Mozilla/Phi-3-medium-128k-instruct-llamafile)
-repository for llamafiles that are known to work with a 128kb context
-size.
+context window size of 8192 tokens is used. You can use a larger context
+window by passing the `-c 131072` flag.
 
 On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
 the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card
@@ -72,7 +68,7 @@ For further information, please see the [llamafile
 README](https://github.com/mozilla-ocho/llamafile/).
 
 Having **trouble?** See the ["Gotchas"
-section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas)
+section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
 of the README.
 
 ## Prompting
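The flags touched by this change can be combined in one invocation. A minimal sketch, assuming the filename from the README's `chmod` hunk header and that `-p` is used to pass a prompt (the `-c` value and `-ngl 999` come from the diff above):

```shell
# Run with the new default-documented larger context (-c 131072)
# and full GPU offload (-ngl 999); -p supplies the prompt text.
./Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile -c 131072 -ngl 999 -p "Hello"
```

On machines without enough GPU RAM, dropping `-ngl 999` (or lowering the value) falls back to CPU inference; the `-c` flag is independent of GPU offload.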