GitHub - LiangThree/ABSEval: Code of the paper ABSEval: An Agent-based Framework for Script Evaluation

Script Evaluation Framework

Functional modules:

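There are five main functional modules:
1. inference: runs each model on the script questions;
2. learner: uses our method to learn the gold-standard answers for all scripts;
3. eval: evaluates each script against the gold-standard answers;
4. execute: carries out the evaluation of each individual step;
5. commonsense: performs the commonsense check.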

Configuration files:

Model configuration: models are defined in config/model_config.yaml. The entry below shows the configuration for one model:
```
baichuan-inc:
  Baichuan-7B:
    model_type: base
    path: /home/zbl/data/llm/baichuan-inc/Baichuan-7B
```
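
The following is a minimal sketch of how an `organization/model` identifier could be resolved against this file. It assumes the YAML has already been parsed into a nested dict (for example with PyYAML's `yaml.safe_load`); `resolve_model` is a hypothetical helper for illustration, not a function from the repository.

```python
# Parsed form of the YAML entry above (the dict yaml.safe_load would return).
MODEL_CONFIG = {
    "baichuan-inc": {
        "Baichuan-7B": {
            "model_type": "base",
            "path": "/home/zbl/data/llm/baichuan-inc/Baichuan-7B",
        }
    }
}


def resolve_model(config: dict, repo_id: str) -> dict:
    """Look up an "organization/model" identifier in the nested config.

    Hypothetical helper: the repository may resolve models differently.
    """
    org, _, name = repo_id.partition("/")
    try:
        return config[org][name]
    except KeyError:
        raise KeyError(f"model {repo_id!r} is not defined in the model config") from None


entry = resolve_model(MODEL_CONFIG, "baichuan-inc/Baichuan-7B")
print(entry["model_type"], entry["path"])
```

Keeping the organization and the model name as two nesting levels means the same `organization/model` string can be used both in the YAML file and in the run configuration.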
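Run configuration: run configuration files are defined in the config/ directory. Each file is JSON and contains a list; each element of the list describes one complete pass from data loading through evaluation, so a single program run can execute several such passes.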

[
  {
    "db_path": "data/database/script.db",
    "model_conf_path": "config/model_config_docker.yaml",
    "target_view": [
      "1"
    ],
    "others": [
      "lmsys/vicuna-7b-v1.5",
      "lmsys/vicuna-13b-v1.5",
      "WizardLM/WizardLM-13B-V1.2",
      "meta/llama2-7b-chat",
      "meta/llama2-13b-chat",
      "meta/Llama-2-70b-chat",
      "qwen/Qwen-7B-Chat",
      "qwen/Qwen-14B-Chat",
      "qwen/Qwen-72B-Chat",
      "01ai/Yi-6B-Chat",
      "01ai/Yi-34B-Chat",
      "baichuan-inc/Baichuan2-7B-Chat",
      "baichuan-inc/Baichuan2-13B-Chat",
      "baichuan-inc/Baichuan-13B-Chat"
    ],
    "inference_model_repo_id": [
      "THUDM/chatglm3-6b",
      "mistralai/Mistral-7B-Instruct-v0.2",
      "mistralai/Mistral-7B-Instruct-v0.1"
    ],
    "metric_conf": {
      "metric_name": "model_metric",
      "model_repo_id": [
        "openai/chatgpt"
      ],
      "acceleration_method": "vllm",
      "eval_prompt_format": "path:data/metrics/eval_prompt.txt"
    }
  }
]
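
To illustrate the "list of passes" structure, here is a minimal sketch that parses a shortened copy of the configuration and walks over its entries. The helper names `load_run_specs` and `summarize` are hypothetical; how the repository's scripts actually consume the file is not shown in this README.

```python
import json

# Shortened copy of the run configuration above (one pass only).
RUN_SPECS_JSON = """
[
  {
    "db_path": "data/database/script.db",
    "model_conf_path": "config/model_config_docker.yaml",
    "target_view": ["1"],
    "inference_model_repo_id": ["THUDM/chatglm3-6b"],
    "metric_conf": {
      "metric_name": "model_metric",
      "model_repo_id": ["openai/chatgpt"],
      "acceleration_method": "vllm",
      "eval_prompt_format": "path:data/metrics/eval_prompt.txt"
    }
  }
]
"""


def load_run_specs(text: str) -> list:
    """Parse the run configuration and check that the top level is a list."""
    specs = json.loads(text)
    if not isinstance(specs, list):
        raise ValueError("run configuration must be a JSON list")
    return specs


def summarize(spec: dict) -> str:
    """Build a one-line description of a single pass (hypothetical helper)."""
    models = ", ".join(spec["inference_model_repo_id"])
    metric = spec["metric_conf"]["metric_name"]
    return f"{spec['db_path']} | views={spec['target_view']} | models={models} | metric={metric}"


run_specs = load_run_specs(RUN_SPECS_JSON)
for index, spec in enumerate(run_specs):
    print(f"run {index}: {summarize(spec)}")
```

Adding a second object to the list would produce a second pass in the same program run.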

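As the configuration above shows, the settings that need to be changed for multiple-choice questions are:

db_path: path to the database;
target_view: question difficulty;
model_repo_id: name of the model as given in the model configuration file;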
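
Because model names in a run configuration must match names in the model configuration file, a quick consistency check can catch typos before a long run. The sketch below is an assumption about how such a check might look: `referenced_models` and `missing_models` are hypothetical helpers, and the sample `model_config` entry (including its `model_type` and `path`) is made up for illustration.

```python
def referenced_models(spec: dict) -> list:
    """Collect the model identifiers that one run-configuration entry names."""
    ids = list(spec.get("inference_model_repo_id", []))
    ids += spec.get("metric_conf", {}).get("model_repo_id", [])
    return ids


def missing_models(spec: dict, model_config: dict) -> list:
    """Return the identifiers that have no entry in the parsed model config."""
    missing = []
    for repo_id in referenced_models(spec):
        org, _, name = repo_id.partition("/")
        if name not in model_config.get(org, {}):
            missing.append(repo_id)
    return missing


# Illustrative inputs only; the model_config values here are hypothetical.
spec = {
    "db_path": "data/database/script.db",
    "target_view": ["1"],
    "inference_model_repo_id": ["THUDM/chatglm3-6b"],
    "metric_conf": {"model_repo_id": ["openai/chatgpt"]},
}
model_config = {
    "THUDM": {"chatglm3-6b": {"model_type": "chat", "path": "/path/to/chatglm3-6b"}},
}

print(missing_models(spec, model_config))
```

With these sample inputs the check reports the one identifier that the sample model config does not define.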

Run commands

Run model inference: python scripts/run_inference.py --run-specs config/run_specs.json

Run the learner: python scripts/run_learner.py --run-specs config/run_learner_specs.json --num-instances 5

Run model evaluation: python scripts/run_eval.py --run-specs config/run_eval_specs.json --num-instance 5

Run the executor: python scripts/run_execute.py --run-specs config/run_execute_specs.json

Run the commonsense check: python scripts/run_commonsense.py
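
For convenience, the five commands can be collected in one place. The sketch below only builds and prints the argument lists, copying the script paths and flags verbatim from the commands above; the `PIPELINE` table and `build_commands` are hypothetical, and running the stages in exactly this order is an assumption based on the order in which they are listed.

```python
import shlex
import sys

# (stage name, script, arguments) copied from the commands listed above.
PIPELINE = [
    ("inference", "scripts/run_inference.py", ["--run-specs", "config/run_specs.json"]),
    ("learner", "scripts/run_learner.py",
     ["--run-specs", "config/run_learner_specs.json", "--num-instances", "5"]),
    ("eval", "scripts/run_eval.py",
     ["--run-specs", "config/run_eval_specs.json", "--num-instance", "5"]),
    ("execute", "scripts/run_execute.py", ["--run-specs", "config/run_execute_specs.json"]),
    ("commonsense", "scripts/run_commonsense.py", []),
]


def build_commands(python: str = sys.executable) -> list:
    """Return one argv list per stage, in the order of PIPELINE."""
    return [[python, script, *args] for _, script, args in PIPELINE]


for command in build_commands("python"):
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository root to execute the corresponding stage and stop on the first failure.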

