Experiment on AI GameStore
Run your models on our benchmark suite and contribute to the evaluation of AI through gameplay.
We encourage researchers and developers to run their own models on AI GameStore. The platform is designed to support flexible experimentation: you can use our public games with your own evaluation harness, or get started quickly with the harness we provide. Access to the full benchmark—including additional private games—is available upon request.
Public games
The games listed on the Games page are publicly available for experimentation and evaluation. You can run any of them in your browser, integrate them into your pipeline, and compare your model's scores against the leaderboard. No sign-up or API key is required to experiment with the public set.
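As a sketch of one way to wire a browser game into an automated pipeline, the snippet below loads a game headlessly with Playwright and captures a frame. The URL is a placeholder rather than a real AI GameStore address, and real game pages may need different load handling.

```python
# Minimal sketch: load a public game headlessly and grab a frame with
# Playwright (pip install playwright && playwright install chromium).
# The URL below is a placeholder, not a real AI GameStore address.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/games/some-game")  # placeholder URL
    page.wait_for_timeout(2000)   # give the game time to load
    frame = page.screenshot()     # raw PNG bytes for your pipeline
    browser.close()
```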
If you would like to submit your model to be featured on the leaderboard, please contact us and fill out our submission form.
Pause and resume
Every game supports pause and resume via key presses. This makes it straightforward to run games in a controlled way: you can pause to capture state, send frames or observations to your model, receive actions, and then resume play.
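Continuing the Playwright sketch above, a control loop built on this protocol might look like the following. The "p" pause binding and the idea that actions map to single key presses are assumptions to check against each game's controls; model_act is a stand-in for your own model call.

```python
# Sketch of a pause -> capture -> act -> resume loop, reusing the
# Playwright `page` from the snippet above. The "p" pause binding and
# single-key actions are assumptions; check each game's controls.

def model_act(frame_png: bytes) -> str:
    """Stand-in for your model: takes a PNG frame, returns a key to press."""
    raise NotImplementedError

def play(page, steps: int = 100) -> None:
    for _ in range(steps):
        page.keyboard.press("p")      # pause (assumed binding)
        frame = page.screenshot()     # capture the frozen observation
        key = model_act(frame)        # ask the model for an action
        page.keyboard.press("p")      # resume (assumed binding)
        page.keyboard.press(key)      # apply the chosen action
```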
Evaluation harness
You are welcome to use your own harness to drive the games and log scores. If you prefer to start from a reference implementation, we provide an open-source harness that you can use as-is or adapt to your setup.
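If you roll your own harness, the only hard requirement described here is driving the games and logging scores. A minimal score logger might look like the sketch below; the JSONL layout is our own convention, not a format the reference harness prescribes.

```python
# Minimal score logger for a home-grown harness. The JSONL record
# layout here is an assumed convention, not an official format.
import json
import time

def log_score(path: str, game: str, model: str, score: float) -> None:
    """Append one evaluation record as a JSON line."""
    record = {"game": game, "model": model, "score": score, "ts": time.time()}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: log_score("results.jsonl", "some-game", "my-model-v1", 4200.0)
```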
Full benchmark and private games
Beyond the public games, the full AI GameStore benchmark includes 90 private games, which are used for more comprehensive evaluation.
We run evaluation on the private set only after we have verified your model's scores on the public set. If you would like to be evaluated on the full benchmark, please fill out our submission form so we can verify your public-set results and provide access to the private games.