llm_bench/README.md
+10-1
Original file line number
Diff line number
Diff line change
@@ -63,7 +63,6 @@ Generation options:
63
63
- `--chat`: call the chat API instead of raw completions
64
64
- `--stream`: stream the result back. Enabling this gives "time to first token" and "time per token" metrics
65
65
- (optional) `--logprobs`: corresponds to `logprobs` API parameter. For some providers, it's needed for output token counting in streaming mode.
66
-
`--max-tokens-jitter`: how much to randomly adjust the `-o` setting on each request. In "fixed concurrency" mode, this helps avoid all workers implicitly synchronizing and causing periodic traffic bursts.
67
66
68
67
### Writing results
69
68
@@ -76,6 +75,16 @@ When comparing multiple configurations, it's useful to aggregate results togethe
76
75
77
76
The typical workflow is to run the benchmark several times, appending to the same CSV file. The resulting file can be imported into a spreadsheet or pandas for further analysis.
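As a rough illustration of the "further analysis" step, the sketch below averages one numeric column per configuration using only the standard library. The function `summarize` and the column names used in the usage note are hypothetical, not taken from the benchmark's actual CSV schema, and the sketch assumes the file has a single header row.

```python
import csv
from collections import defaultdict
from statistics import mean


def summarize(csv_path, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col.

    Both column names are supplied by the caller; check them against the
    header row of your own results file.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            groups[row[group_col]].append(float(row[value_col]))
    return {key: mean(values) for key, values in groups.items()}
```

For example, `summarize("results.csv", "concurrency", "latency_ms")` would return the mean latency per concurrency level, assuming the file had columns with those (hypothetical) names.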
78
77
78
+
### Custom prompts
79
+
80
+
Sometimes it's necessary to replay exact prompts, for example in the case of embedding images.
81
+
In this case, use the `--prompt-text` option to specify a file with a `.jsonl` extension, prefixed with an at sign (e.g. `@prompt.jsonl`).
82
+
The `.jsonl` file is read line by line, and one line is chosen at random for each request. Each line must be a valid JSON object with a `prompt` key and an optional `images` key. For example:
83
+
```
84
+
{"prompt": "<image>What color is the cat?", "images": ["_64_DATA"]}
85
+
{"prompt": "<image>What color is the dog?", "images": ["_64_DATA"]}
86
+
```
87
+
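The file format described above can be sanity-checked before a run with a small script. This is a minimal sketch, not the benchmark's own loader: `load_prompts` and `pick_prompt` are hypothetical helper names, and skipping blank lines and defaulting `images` to an empty list are assumptions made for illustration.

```python
import json
import random


def load_prompts(path):
    """Read a .jsonl file and return one dict per non-empty line."""
    prompts = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            record = json.loads(line)  # raises ValueError on invalid JSON
            if not isinstance(record, dict) or "prompt" not in record:
                raise ValueError(
                    f"{path}:{line_no}: expected a JSON object with a 'prompt' key"
                )
            record.setdefault("images", [])  # 'images' is optional
            prompts.append(record)
    return prompts


def pick_prompt(prompts, rng=random):
    """Choose one prompt at random, as is done for each request."""
    return rng.choice(prompts)
```

Running `load_prompts("prompt.jsonl")` on a file with an unquoted key or an unterminated string fails with the offending line reported by `json.loads`, which makes malformed entries easy to find before benchmarking.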
79
88
## Examples
80
89
81
90
Maintain a fixed concurrency of 8 requests against a local deployment: