Welcome to the IneqMath Dev Set Evaluation Platform!
Please see the official leaderboard page for more submission details.
Project | Leaderboard | arXiv | HF Paper | Code | Dataset | Visualization
Submit New Model Evaluation Results for Dev Set
Please upload the JSON file with model evaluation results for the dev set and fill in the following information. If you have any questions, please contact us at jiayi_sheng@berkeley.edu or lupantech@gmail.com.
• You can revoke or deactivate your API key 15 minutes after the evaluation completes. The evaluation typically costs around $25, depending on your submission size.
• If no API key is provided, your submission will be processed with our default API key.
Select the type of your model
Select whether the model is proprietary or open-source
Optional: Select the reasoning effort level
Required JSON Structure:
Your JSON file must include at least these 5 fields for each problem:
[
{
"data_id": [integer or string] The ID of the test data,
"problem": [string] The question text,
"type": [string] The type of question: 'relation' or 'bound',
"prompt": [string] The prompt used for the problem,
"response": [string] The response of the model
},
...
]
You can click the download button below to get an example file. The system will process your submission and calculate accuracy metrics automatically.
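Before uploading, it may help to check a results file locally against the required structure listed above. The following is a minimal sketch in Python, not the platform's own validator: the helper names `validate_submission` and `validate_file` are hypothetical, and the checks cover only the five documented fields and the two documented `type` values. The platform may apply further rules that are not described here.

```python
import json

# Field names and allowed "type" values are taken from the required
# structure described on this page; everything else is an assumption.
REQUIRED_FIELDS = ("data_id", "problem", "type", "prompt", "response")
ALLOWED_TYPES = ("relation", "bound")


def validate_submission(records):
    """Return a list of problems found; an empty list means no issues were detected."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of problem records"]

    errors = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            errors.append(f"record {index}: expected a JSON object")
            continue

        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            errors.append(f"record {index}: missing fields {missing}")
            continue

        # data_id may be an integer or a string (bool is excluded explicitly,
        # because bool is a subclass of int in Python).
        data_id = record["data_id"]
        if isinstance(data_id, bool) or not isinstance(data_id, (int, str)):
            errors.append(f"record {index}: data_id must be an integer or string")

        if record["type"] not in ALLOWED_TYPES:
            errors.append(f"record {index}: type must be 'relation' or 'bound'")

        for name in ("problem", "prompt", "response"):
            if not isinstance(record[name], str):
                errors.append(f"record {index}: {name} must be a string")

    return errors


def validate_file(path):
    """Load a JSON file from disk and validate its contents."""
    with open(path, encoding="utf-8") as handle:
        return validate_submission(json.load(handle))
```

Calling `validate_file("results.json")` (the file name is a placeholder) returns a list of messages, one per problem found, so a submission can be corrected before it is uploaded. Extra fields are allowed, since the page asks for "at least" these five.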
Process Query
Enter your email address below to retrieve your evaluation scores.
| ID | Status | Model | Size | Type | Source | Date | Submission time | Overall Acc | Answer Acc | Step Acc (NTC) | Step Acc (NLG) | Step Acc (NAE) | Step Acc (NCE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Status Explanation:
- Processing: Your submission is currently being evaluated. This may take several minutes.
- Completed: Evaluation is finished and results are ready.