OpenML is an online platform that promotes open machine learning. It stands for Open Data, Open Algorithms and Open Research. OpenML is still in beta, but it already works quite well.


In this blog post we introduce the platform's most important concepts, so that you can decide whether it might be interesting for you. We will also look at some future challenges.


The following four concepts form the basis of the platform:

  • Data
  • Task
  • Flow
  • Run
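As a rough orientation, the relationship between the four concepts can be sketched with a few Python dataclasses. This is a hypothetical model for illustration, not the data model or API of any OpenML client: a task bundles a data set with a task type, a target and an estimation procedure; a flow is an algorithm or workflow implementation; and a run records what happens when a flow is applied to a task.

```python
from dataclasses import dataclass, field

# Hypothetical classes for illustration only; not an OpenML client API.

@dataclass
class Data:
    name: str                      # e.g. "iris"
    columns: list                  # feature and target column names

@dataclass
class Task:
    data: Data
    task_type: str                 # e.g. "supervised classification"
    target: str                    # the column to predict
    estimation_procedure: str      # e.g. "10-fold cross-validation"

@dataclass
class Flow:
    name: str                      # the algorithm or workflow implementation
    parameters: dict = field(default_factory=dict)  # hyperparameter settings

@dataclass
class Run:
    task: Task                     # a run refers to a task, not to raw data
    flow: Flow
    evaluations: dict = field(default_factory=dict)  # measure name -> value

iris = Data("iris", ["sepal_length", "sepal_width",
                     "petal_length", "petal_width", "class"])
task = Task(iris, "supervised classification", "class",
            "10-fold cross-validation")
flow = Flow("decision_tree", {"max_depth": 3})
run = Run(task, flow)  # evaluations stay empty until the flow is actually run
```

Because a run points to a task rather than directly to a data set, runs of different flows on the same task share the same target and estimation procedure, which is what makes their results comparable.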

Who can use OpenML?

The domain scientist.

Do you have data that you cannot fully analyze on your own?

Then we recommend uploading your data to OpenML and getting support from the community. Write a clear description of the data and the task so that people understand the problem.

The data analyst.

Do you enjoy taking on new challenges? Then get involved with OpenML and solve demanding tasks.

The algorithm developer.

You have developed a statistical method or a machine learning algorithm and want to try it out? You will find many data sets to test it on, and you can make your algorithm public.

The student.

You study statistics, computer science or machine learning? You want to know what is going on out there? On OpenML you will find many algorithms as well as information about the software and its implementation.

The teacher.

You are teaching a machine learning class and want your students to take part in a challenge? Put together your own assignment and let your students tackle it. The platform shows who uploaded what and when.

The unknown.

Many other people may benefit from the platform as well, such as meta-analysts, benchmarkers and people we have not thought of yet.

How to use OpenML.

Apart from browsing the website, you can access OpenML through a whole range of interfaces, such as R or WEKA.
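The R and WEKA interfaces are not shown here. As a minimal sketch of what programmatic access can look like, the snippet below queries OpenML's REST API with nothing but the Python standard library. The base URL and the `/data/{id}` endpoint are assumptions based on the public JSON API and should be checked against the official API documentation; the helper names `dataset_url` and `fetch_dataset_description` are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of OpenML's JSON REST API; verify against the API docs.
API_BASE = "https://www.openml.org/api/v1/json"

def dataset_url(dataset_id: int) -> str:
    """Build the (assumed) URL of a data set's description."""
    if dataset_id <= 0:
        raise ValueError("dataset_id must be a positive integer")
    return f"{API_BASE}/data/{dataset_id}"

def fetch_dataset_description(dataset_id: int, timeout: float = 10.0) -> dict:
    """Download and decode the JSON description (requires network access)."""
    with urllib.request.urlopen(dataset_url(dataset_id), timeout=timeout) as resp:
        return json.load(resp)
```

With network access, `fetch_dataset_description(61)` should return the description of data set 61 as a dictionary; printing `sorted(...)` of the result is a simple way to inspect its top-level keys rather than relying on an assumed layout. For real work, the dedicated client libraries are the more convenient route.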

The whole project is, of course, open source. Take a look at the various Git repositories for all the code.

The overfitting problem.

Platforms like Kaggle or Crowdanalytics give participants only part of the data and evaluate the submitted algorithms' performance on a separate, hidden data set to counter overfitting. So far, OpenML does not do this. All data is always visible, and algorithms are evaluated using resampling techniques (called estimation procedures in OpenML). There is much discussion about how to address overfitting on OpenML. Proposals range from hiding part of the data for an initial period to repeated cross-validation of well-performing procedures for a given task. If you have any ideas, don't hesitate to share them with us. As members of the Khronos Explanatory Group, we will gladly forward your suggestions.
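To make the resampling idea concrete, here is a minimal sketch of repeated k-fold cross-validation in plain Python. The function names, the fold assignment by shuffling and striding, and the majority-class baseline that stands in for a real algorithm are all illustrative assumptions; on OpenML itself, each task comes with its own predefined train/test splits.

```python
import random
from collections import Counter
from statistics import mean

def repeated_kfold_indices(n, k, repeats, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Within each repeat, every observation lands in exactly one test fold.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                 # new random partition per repeat
        for f in range(k):
            test = idx[f::k]             # every k-th shuffled index
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield r, f, train, test

def majority_class_accuracy(labels, train, test):
    """Hypothetical baseline: predict the most frequent training label."""
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test) / len(test)

def estimate(labels, k=10, repeats=3, seed=0):
    """Average the baseline's accuracy over all folds of all repeats."""
    scores = [
        majority_class_accuracy(labels, train, test)
        for _, _, train, test in repeated_kfold_indices(len(labels), k, repeats, seed)
    ]
    return mean(scores)

labels = ["a"] * 70 + ["b"] * 30
print(estimate(labels))  # the baseline always predicts "a" here
```

Every observation is used for testing exactly once per repeat, and repeating with fresh shuffles reduces the variance of the estimate. Note that this does not, by itself, prevent overfitting to the task: if many procedures are tuned against the same visible splits, the best scores can still be optimistic, which is exactly the concern discussed above.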

Thank you very much for visiting.