Received date: August 25, 2015; Accepted date: September 10, 2015; Published date: September 14, 2015
Citation: Jing R, Rong Li, Xuemei Pu, et al. A Web-based Graphic User Interface of PML for Machine Learning in Parallel Running Head: A Web-based GUI of PML for Machine Learning. Chem Inform. 2015, 1:2.
Keywords: Machine learning; GUI; Graphic user interface; Distributed computing; Methods comparison; Data mining
With the rapid development of analytical techniques, machine learning methods are widely used in fields such as cheminformatics and bioinformatics to extract useful information from experimental data. In most cases, the distribution of the data varies between fields because of differences in experimental devices and samples, so different data call for different methods. For example, RNA sequencing data usually contain far more genes than samples, so methods from graph theory and statistics are needed to reduce the number of genes [1-3]. Moreover, if the distribution of the data is not simple, traditional linear methods usually cannot model the data well [4,5]. In such cases, nonlinear approaches, such as the kernel trick and sparse factors, are needed to improve modeling and prediction performance [6,7]. Many well-developed machine learning tools have been released; however, they are hard to integrate because they run in distinct environments. For instance, to model a dataset with several methods under cross validation or leave-one-out validation, the training data usually have to be re-sliced for each tool. Therefore, a tool that can integrate and compare several methods from different environments is needed.
Based on this motivation, we developed PML, software that integrates these methods and runs them in parallel. Further, to make the server version of PML more user-friendly, we developed a web-based GUI. With this GUI, users can generate and control PML tasks more effectively. In addition, users can set one or more computers in a local area network (LAN) as computing resources. The code can be downloaded at http://cic.scu.edu.cn/pml.
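The dataset slicing mentioned above can be illustrated with a minimal sketch. It is not taken from PML; the function names, the fixed seed and the fold layout are assumptions made for illustration. The point is that the index splits are computed once and can then be handed to every method being compared, and that leave-one-out is the special case where the number of folds equals the number of samples.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices 0..n_samples-1 into k disjoint test folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; k == n_samples gives leave-one-out."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical usage: compute the splits once, then reuse them for each method.
splits = list(train_test_splits(n_samples=10, k=5))
```

Because the seed is fixed, every method sees exactly the same folds, which keeps the comparison between methods fair.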
PML for task processing
We developed PML to process machine learning tasks in parallel. PML can run dimension reduction, grid search, cross validation and result analysis in parallel. Beyond a single machine, PML can use multiple machines as a cluster by combining with BOINC. The output is in HTML format and can be viewed in a browser. The results are placed in multiple independent folders so that users can move them easily. Fault-tolerance and interrupt-recovery mechanisms are implemented to ensure the stability of PML execution. The methods of WEKA and Waffles have been integrated into PML, and users can integrate new methods through the provided command API. The intermediate data and scripts are archived for repetition, examination or other use. Through PML, users and researchers can model data and find the best model more effectively.
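The idea of running grid search and cross validation in parallel can be sketched as follows. This is an illustration only, not PML's implementation: the toy one-parameter ridge model, the function names and the interleaved fold layout are all assumptions. Each (parameter value, fold) pair is an independent task, so the tasks can be dispatched to a pool of workers and the scores aggregated afterwards. A thread pool is used here to keep the sketch simple; for CPU-bound training in Python a process pool or a cluster scheduler would likely be substituted.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_ridge_1d(xs, ys, lam):
    """Closed-form slope of y ~ w*x with an L2 penalty lam (toy model)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def evaluate(task):
    """Train on one split with one parameter value; return (lam, test MSE)."""
    lam, train, test, xs, ys = task
    w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
    mse = mean((ys[i] - w * xs[i]) ** 2 for i in test)
    return lam, mse


def grid_search_cv(xs, ys, lams, k, max_workers=4):
    """Evaluate every (lam, fold) pair concurrently; return (best_lam, scores)."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved test folds
    tasks = []
    for lam in lams:
        for test in folds:
            train = [i for i in range(n) if i not in test]
            tasks.append((lam, train, test, xs, ys))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate, tasks))
    # Average the per-fold errors for each parameter value.
    scores = {lam: mean(m for l, m in results if l == lam) for lam in lams}
    return min(scores, key=scores.get), scores


xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x for x in xs]  # noiseless data, so the smallest penalty should win
best_lam, scores = grid_search_cv(xs, ys, lams=[0.0, 1.0, 10.0], k=3)
```

Since no task depends on another, a failed or interrupted task could in principle be rerun on its own, which is the property that makes fault tolerance and interrupt recovery practical in this kind of design.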
The web-based GUI
The input of PML is a script submitted on the command line. Submitting a task to a server over SSH is therefore cumbersome. Because PML is mostly used for large computations on a server with several CPU cores or on a cluster of multiple machines, we developed the GUI to simplify task submission and control.
With this GUI, users can create a PML task, including data file upload, method selection, grid search settings and process control. Because users choosing a method for the first time will want to know the details of the method and its parameters, we provide a floating window that shows brief explanations of the methods, parameters and options. Once a task is submitted, users can control its process, including stopping, continuing and deleting it (Figure 1). After a task completes, a link is provided to view the results. In addition, users can generate a script without submitting it, and the script can be copied elsewhere.
PML provides a grid search mechanism, but its input format is not easy to write. Therefore, we provide functions that simplify grid search setup. In the GUI, users can modify parameters and view the changes in real time. We also provide a brief explanation of the input format in the floating window (Figure 2).
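To make concrete what a grid search setting amounts to, the following minimal sketch expands a grid specification into the individual parameter combinations that would each be evaluated. The dictionary layout, the helper names and the parameter names "C" and "gamma" are hypothetical and do not reflect PML's actual input format, which is not reproduced here.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def log_range(start_exp, stop_exp, base=2.0):
    """Values base**e for integer exponents start_exp..stop_exp inclusive."""
    return [base ** e for e in range(start_exp, stop_exp + 1)]


# Hypothetical grid: three values of "C" and two values of "gamma".
grid = {"C": log_range(-1, 1), "gamma": [0.1, 1.0]}
combos = expand_grid(grid)
```

The number of combinations is the product of the list lengths, so each added parameter multiplies the number of tasks; showing the expanded settings as they change is one way a GUI can keep that growth visible to the user.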
With the rapid development of analytical techniques, experimental data have become increasingly complicated. To extract useful information from such data, multiple statistical analysis and machine learning methods have become necessary. To make using and comparing these methods more efficient, we developed PML. Further, to simplify task submission and control, we developed the GUI, which also provides links to the generated results. We hope that this GUI will save time in data modeling and method comparison, so that researchers can be more efficient in their research.
We thank the anonymous reviewers for their patient review and constructive suggestions. This work was supported by the National Natural Science Foundation of China (21375090 and U1230121).
All Published work is licensed under a Creative Commons Attribution 4.0 International License