
Download the imitation game test

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. The more than 200 tasks included in BIG-bench are summarized by keyword and by task name in the repository documentation. A paper introducing the benchmark, including evaluation results on large language models, is currently under review and is available as a preprint.

Tasks will be reviewed and merged into the BIG-bench repository on a rolling basis. New tasks are no longer eligible for inclusion in the initial BIG-bench release and paper; however, they will be included in future BIG-bench releases, and the task authors will be included in the author list of future publications. For more details about the benchmark, see the detailed instructions in the repository, or contact the benchmark organizers.

BIG-bench Lite leaderboard

BIG-bench Lite (BBL) is a small subset of 24 diverse JSON tasks from BIG-bench. It is designed to provide a canonical measure of model performance while being far cheaper to evaluate than the full set of more than 200 programmatic and JSON tasks in BIG-bench. A leaderboard of current model performance on BBL accompanies the benchmark.

To add new model results to the full BIG-bench leaderboard, to the BBL leaderboard, and to individual task performance plots, open a PR which includes the score files generated when you evaluate your model on BIG-bench tasks.

Creating a JSON task

We recommend starting with one of the JSON template tasks; simple_arithmetic_json is a good choice.

    cp -r simple_arithmetic_json my_awesome_json_task
    cd my_awesome_json_task

The task consists of two files: README.md describes the task, and task.json contains the task data. We will now edit both files with the new task information.

⚠️ All task files (including README.md and task.json) contain a "canary" string, which should not be edited. This is to prevent benchmark tasks from leaking into web-scraped training data. ⚠️

First, edit README.md to include the correct author and task information.

Second, edit task.json and update the following fields:

- name: the task name. In this example we use "My awesome task".
- description: a short human-readable description for your task.
- metrics: the metrics to be used in evaluation, such as "exact_str_match". See the repository documentation for a description of available metrics.
- preferred_score: which metric from metrics to prefer when reporting the task evaluation results.
- examples: add your task's input/output examples here.

Task examples can be specified in one of two ways. The first is as a single input/target-output pair, where the target output will be compared against the model output.
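For example, a minimal task.json for such a single-pair task might look like the sketch below. The task name, keywords, and arithmetic examples are illustrative placeholders rather than content from a real benchmark task, and the canary string carried over from the copied template (omitted here) must be left untouched in the actual file.

    {
      "name": "My awesome task",
      "description": "Answer simple single-digit addition questions.",
      "keywords": ["arithmetic", "zero-shot"],
      "metrics": ["exact_str_match"],
      "preferred_score": "exact_str_match",
      "examples": [
        {"input": "2 + 3 =", "target": "5"},
        {"input": "4 + 4 =", "target": "8"}
      ]
    }

With "exact_str_match" as the preferred score, each model output is compared directly against the corresponding target string, which is why this single input/target form is the most direct way to specify examples.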











