Open
Description
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
High
Please provide a clear description of the problem this feature solves
The results of each evaluator are currently written to a JSON file. Provide a configuration option that also plots the output.
- The plot should work across multiple evaluators.
- It should support multiple repetitions (reps).
- It should show the mean and variance.
- It should highlight the dataset entries that show the most variation.
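
As a rough illustration of the aggregation these requirements imply, here is a minimal sketch that computes per-entry mean and variance across reps and ranks entries by variance. The input shape (a dict mapping an entry ID to a list of per-rep scores) and the names `summarize_reps` and `most_variable` are assumptions made for this example, not the project's actual output schema or API. A plotting step could consume the summary afterwards.

```python
from statistics import mean, pvariance


def summarize_reps(scores_by_entry):
    """Compute mean and population variance of the scores for each entry.

    scores_by_entry is a hypothetical shape: {entry_id: [score_rep1, score_rep2, ...]}.
    """
    summary = {}
    for entry_id, scores in scores_by_entry.items():
        summary[entry_id] = {
            "mean": mean(scores),
            "variance": pvariance(scores),
            "reps": len(scores),
        }
    return summary


def most_variable(summary, top_k=3):
    """Return the IDs of the top_k entries with the highest variance."""
    ranked = sorted(summary, key=lambda e: summary[e]["variance"], reverse=True)
    return ranked[:top_k]


# Made-up scores for three dataset entries, three reps each.
example_scores = {
    "q1": [1.0, 1.0, 1.0],
    "q2": [0.0, 1.0, 0.5],
    "q3": [0.8, 0.6, 0.7],
}
example_summary = summarize_reps(example_scores)
highlighted = most_variable(example_summary, top_k=2)
```

The same summary could be computed once per evaluator and drawn side by side, with the `highlighted` entries marked in the plot.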
Additionally, where possible, accuracy should be plotted against token usage and latency. Currently, the evaluation and profiling results are written to separate output files. This change should combine them.
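
One possible way to combine the two outputs is to join them on a shared entry ID, as in the sketch below. The field names (`id`, `score`, `tokens`, `latency_s`) and the function `combine_results` are hypothetical and only stand in for whatever the evaluation and profiling files actually contain.

```python
def combine_results(eval_rows, profile_rows, key="id"):
    """Join evaluation rows with profiling rows on a shared key.

    Entries that have no matching profiling row are skipped, since they
    cannot be placed on an accuracy vs. tokens vs. latency plot.
    """
    profile_by_id = {row[key]: row for row in profile_rows}
    combined = []
    for row in eval_rows:
        prof = profile_by_id.get(row[key])
        if prof is None:
            continue
        combined.append({
            key: row[key],
            "score": row["score"],
            "tokens": prof["tokens"],
            "latency_s": prof["latency_s"],
        })
    return combined


# Made-up rows standing in for the two separate output files.
eval_rows = [{"id": "q1", "score": 1.0}, {"id": "q2", "score": 0.5}]
profile_rows = [{"id": "q1", "tokens": 120, "latency_s": 0.8}]
combined_rows = combine_results(eval_rows, profile_rows)
```

Each combined row then carries the three values needed for one point on the plot.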
Describe your ideal solution
The script should be customizable and off by default.
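
A minimal sketch of what an opt-in configuration could look like, assuming a dataclass-style config object. `PlotConfig`, its fields, and `maybe_plot` are illustrative names only, not part of the existing project.

```python
from dataclasses import dataclass, field


@dataclass
class PlotConfig:
    """Hypothetical plotting options; plotting stays off unless enabled."""
    enabled: bool = False
    metrics: list = field(default_factory=lambda: ["accuracy"])
    top_k_variable: int = 5
    output_dir: str = "plots"


def maybe_plot(config, plot_fn):
    """Call plot_fn(config) only when plotting has been switched on."""
    if not config.enabled:
        return None
    return plot_fn(config)


# With the defaults nothing is plotted; enabling it runs the supplied function.
default_result = maybe_plot(PlotConfig(), lambda c: c.output_dir)
enabled_result = maybe_plot(PlotConfig(enabled=True), lambda c: c.output_dir)
```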
Additional context
No response
Code of Conduct
- I agree to follow this project's Code of Conduct
- I have searched the open feature requests and have found no duplicates for this feature request