I came across OpenML (openml.org), a collaborative platform for data mining and machine learning, and the R package mlr. OpenML offers several user interfaces (a website; APIs for R, Python, and Java; and plugins for KNIME, MOA, WEKA, and RapidMiner) for easy, seamless use. It has the potential to change the paradigm for how we collaborate on research projects and share results in a fully reproducible way. Unfortunately, the website is still a beta version. It seems that, at present, the developers' vision is greater than their budget. I am not sure whether their approach will be widely adopted, because they might be too many steps ahead of current mainstream research, with its article- and journal-based academic reputation system. However, a lot is currently changing in the direction of openly sharing data (the BMJ, for example, requires data sharing), which might make way for this collaborative paradigm. It certainly would deserve it. Anyway, I recommend trying out the R/mlr package, which implements the same concepts (Data, Task, Flow, Run).
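
To make the four concepts a bit more concrete, here is a minimal sketch of how Data, Task, Flow, and Run could relate to one another. It is written in plain Python using only the standard library. It is not the OpenML or mlr API: every class, field, and function name below (`Data`, `Task`, `Flow`, `Run`, `majority_fit`, `execute`) is hypothetical and chosen only for illustration, and the toy dataset is made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


@dataclass(frozen=True)
class Data:
    """A named dataset (hypothetical stand-in for a shared dataset)."""
    name: str
    rows: List[Row]


@dataclass(frozen=True)
class Task:
    """What to solve on a dataset: the target column and a fixed train/test split."""
    data: Data
    target: str
    train_idx: Sequence[int]
    test_idx: Sequence[int]


@dataclass(frozen=True)
class Flow:
    """A named algorithm: fit(rows, target) returns a predict(row) function."""
    name: str
    fit: Callable[[List[Row], str], Callable[[Row], str]]


@dataclass(frozen=True)
class Run:
    """The result of applying one flow to one task."""
    task: Task
    flow: Flow
    predictions: List[str]
    accuracy: float


def majority_fit(rows: List[Row], target: str) -> Callable[[Row], str]:
    # Baseline learner: always predict the most frequent training label.
    most_common = Counter(r[target] for r in rows).most_common(1)[0][0]
    return lambda row: most_common


def execute(flow: Flow, task: Task) -> Run:
    # Train on the task's training rows, then score on its test rows.
    train = [task.data.rows[i] for i in task.train_idx]
    test = [task.data.rows[i] for i in task.test_idx]
    model = flow.fit(train, task.target)
    preds = [model(r) for r in test]
    hits = sum(p == r[task.target] for p, r in zip(preds, test))
    return Run(task, flow, preds, hits / len(test))


# Made-up toy data, purely for illustration.
toy = Data("toy-weather", [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rainy", "play": "no"},
])
task = Task(toy, target="play", train_idx=[0, 1, 2], test_idx=[3, 4])
run = execute(Flow("majority-class", majority_fit), task)
print(run.flow.name, run.predictions, run.accuracy)
```

Because the task fixes the split and the target, anyone who applies a different flow to the same task gets a directly comparable run, which is the reproducibility idea in miniature.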
- Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. SIGKDD Explorations 15(2), pp 49-60, 2013. [http://arxiv.org/pdf/1407.7722v2.pdf]
- OpenML Guide
- OpenML Wiki
- R/mlr tutorial, Retrieved 2015-08-23, from http://mlr-org.github.io/mlr-tutorial/release/html/.
- Bischl, B. (2015). OpenML and R and mlr. Oral presentation at Statistical Computing 2015. Workshop by the International Biometric Society – German Region. Retrieved 2015-08-19, from https://www.youtube.com/watch?v=rzjkT1uLNi4 [Video]
- Vanschoren, J. (2014). OpenML – Networked science in machine learning. Retrieved 2015-08-19, from https://www.youtube.com/watch?v=J84Eg-0RlCk [Video] and http://www.slideshare.net/JoaquinVanschoren/openml-towards-networked-and-automated-machine-learning [Slides]