exams/ita_2025/report.md: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ For multiple-choice exams, the score is straightforward. For essays, each questi
For almost of all questions, `max_tokens_question_description=400` and `max_tokens_question_answer=5000` used in `gpt-resolve` was enough, but for some questions (4,5,7,8) it was necessary to run them again since the answer on the first try were not complete due to the limit of max completion tokens processed. In these cases, `max_tokens_question_answer=10000` was used.
-The `o1` model, in the other hand, got question 10 correctly with analytic geometry, but in question 7 it failed partially, achieving 95% at the exam. We expect that `o3` and `o4-mini` may achieve 100%. An interesting thing is that this model is much more objetive than the previous version.
+The `o1` model, in the other hand, got question 10 correctly with analytic geometry, but in question 7 it failed partially, achieving 95% at the exam. An interesting thing is that this model is much more objetive than the previous version. We expect that `o3` and `o4-mini` may achieve 100%.