The Comparative Study of Mining Data Classification Algorithm for Cost Group Determination of Student’s Single Tuition Fee (STF)

Abstract—Determining the Single Tuition Fee (STF) group of each student is a task in which a university analyzes students’ financial ability to decide the tuition amount they pay, so it must be conducted carefully. Done manually, this task requires a great deal of time, effort, and cost, especially when the student records to be analyzed number in the thousands. To address this problem, this research explores data mining classification algorithms to find the best algorithm for STF group classification. The features used to classify the STF group are the parents’ income, the number of dependents, the regional origin, and the selected study program cluster. Using machine learning techniques, the results showed that Decision Tree and SVM achieved the highest accuracy, 80%. The better of the two was then determined by applying fault tolerance rules. The best algorithm obtained was finally used to predict the STF group class of each of the 3528 student records.

Keywords—classification algorithm; machine learning; data mining; STF

I. INTRODUCTION

The government of Indonesia recognizes that economic conditions vary widely across communities; therefore, policy on what the community pays for a service should attend to fairness by classifying the financial capability of each citizen and adjusting the cost accordingly. This underlies the Cross Subsidy policy, including in education through the Single Tuition Fee (STF), which is regulated in Permenristekdikti Number 22 Year 2015 [1].

In determining the STF group, each university should analyze new student data to classify each student’s financial capability. The main criterion, as set out in [1], is parental income. However, a student’s financial ability cannot be judged from parental income alone. Other criteria, such as the number of siblings, regional origin, and selected study cluster, also affect a student’s financial ability.

Analyzing many criteria across thousands of student records takes a great deal of time, effort, and cost [2]. A system is required that can assist decision making on each student’s STF amount by analyzing the student’s financial ability.

In this research, the best data mining classification algorithm for determining a student’s STF group is explored using machine learning. Decision Tree, Naïve Bayes, and Support Vector Machine classifiers are compared in order to find the most accurate algorithm for student STF group determination.

II. LITERATURE REVIEW

Single Tuition Fee, commonly referred to as STF, is an Indonesian government policy to guarantee the right of every citizen to receive a proper education. STF is designed as an educational financing solution at the university level that applies the concept of cross-subsidy. The fees in STF, as referred to in Article 2 paragraph (2) of Permenristekdikti Number 22 Year 2015 [1], consist of several groups determined by the economic ability of students, students’ parents, or other parties who finance them.

STF determination works by analyzing a student’s financial ability based on characteristics that can affect his or her financial condition. Data mining techniques are commonly used for this kind of problem [3]. Data mining is a series of processes for extracting previously unknown information from a data set [4]. It is mostly conducted for the purposes of classification, prediction, and clustering. STF determination falls under classification.

Several studies have shown that financial ability analysis can be conducted with data mining for various financial purposes [4], such as identifying students in difficult economic conditions [3], predicting scholarship awards [5], and classification for financial fraud detection [6].

Table 1. NEW STUDENTS’ REGISTRATION DATABASE OF UJUNG PANDANG STATE POLYTECHNIC

No.  | Participant No. | City of Origin               | Father’s Income               | Mother’s Income               | Dependents | Cluster
1    | 10082           | Pangkajene Kepulauan Regency | Rp. 1.000.000 – Rp. 2.000.000 | No Income                     | 4          | Engineering
2    | 10031           | Pangkajene Kepulauan Regency | Rp. 2.000.000 – Rp. 3.000.000 | No Income                     | 3          | Engineering
3    | 20063           | Polewali Regency             | Rp. 2.000.000 – Rp. 3.000.000 | No Income                     | 2          | Commerce
...
3528 | 11849           | Makassar City                | Rp. 2.000.000 – Rp. 3.000.000 | Rp. 1.000.000 – Rp. 2.000.000 | 5          | Engineering

Research [3] analyzed how to find students in difficult economic conditions based on their families’ ability to pay tuition fees. The study was motivated by the fact that economic hardship can negatively affect students’ mental health, academic performance, and personal and social life [3]. It framed the task as a multi-label learning problem over students’ habits on campus, viewed from several perspectives: student card activity, internet usage, and places visited on campus. It compared several methods, such as Support Vector Machine (SVM) and Multiple Kernel Learning (MKL), with its own method, Dis-Hard. The proposed method outperformed the existing methods. This research showed that a student’s economic ability can be extracted with data mining.

Data mining and machine learning were used in research [5] to determine students’ eligibility for educational scholarships. The study took student characteristics such as moral, intellectual, and health attributes as variables for determining the scholarship. With the Naïve Bayes (NB) method, research [5] succeeded in identifying why a student did or did not deserve a scholarship. This research showed that Naïve Bayes has potential for the STF case.

The use of classification techniques in data mining for financial purposes is also shown by research [6]. That study, which aimed to detect financial fraud, used three methods: SVM, Naïve Bayes, and K-Nearest Neighbors (KNN). Research [6] found that SVM gave superior results compared with NB and KNN.

Research [7] showed that information can also be extracted in data mining using the Decision Tree method. That study, which aimed to classify non-performing credit, used a decision tree as its basic classification approach. Its analysis of financial market data found that the decision-tree-based approach achieved high accuracy and could explain the factors affecting each classification.

Based on the literature above on the ability of classification methods in data mining in the fields of finance and education, this research uses data mining with the Decision Tree, Naïve Bayes, and SVM classification algorithms to classify students’ STF groups.

III. RESEARCH METHODOLOGY

There are three major stages in this research: (1) collecting data from Ujung Pandang State Polytechnic and Permenristekdikti Number 22 Year 2015 and determining the criteria variables used as features; (2) simplifying and normalizing the data; and (3) building and testing the classification algorithms with machine learning. These stages are described as follows:

A. Data and Research Variables

The data source is the 2014 new student enrollment database of Ujung Pandang State Polytechnic (UPSP). This database holds the student data needed for STF group determination, ranging from general biodata and parent information to the chosen level and study program, and contains 3528 records, as shown in Table 1.

The STF group data are obtained from the attachment of Permenristekdikti Number 22 Year 2015 for Ujung Pandang State Polytechnic (Table 2).

Five classification criteria variables are used in this research: Father’s Income, Mother’s Income, Total Dependents, Regional Origin, and Selected Study Program Cluster. These five variables are chosen as classification criteria because of their influence on financial capability.

Table 2. GROUP DATA OF STF ACCORDING TO GOVERNMENT RULES

Single Tuition Fee per Semester, UPSP (Rp.)

Group 1 (STF1) | Group 2 (STF2) | Group 3 (STF3) | Group 4 (STF4) | Group 5 (STF5)
500.000        | 1.000.000      | 1.750.000      | 3.000.000      | 4.000.000
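The five groups above are the class labels a classifier has to predict, and each label maps to one fee. A minimal sketch of that lookup follows; the names `STF_FEE_RP` and `fee_for_group` are hypothetical, and the amounts are assumed to be in rupiah.

```python
# Per-semester fee for each STF group at UPSP, copied from the table above.
# Assumption: amounts are in rupiah; the dots in the table are thousands separators.
STF_FEE_RP = {
    1: 500_000,
    2: 1_000_000,
    3: 1_750_000,
    4: 3_000_000,
    5: 4_000_000,
}


def fee_for_group(group: int) -> int:
    """Return the per-semester fee for an STF group (1 to 5)."""
    if group not in STF_FEE_RP:
        raise ValueError(f"Unknown STF group: {group}")
    return STF_FEE_RP[group]
```

A predicted class can then be turned into a fee with, for example, `fee_for_group(3)`.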

B. Simplification and Normalization of Data

Simplification is applied to each influential field to make it easier to process. The purpose of normalization in this research is to convert string data into integer form. The simplification and normalization rules are as follows:

  • Father’s_Income and Mother’s_Income Field

The student data used in this research are still raw: the parent income fields (father and mother) contain string data, which does not match the feature format expected by machine learning algorithms. Therefore, these strings are converted to integers (shown in Table 3).

Table 3. INCOME NORMALIZATION RULE

Income Variations             | Simplification Form
No Income                     | 0
Rp. 1.000.000 – Rp. 2.000.000 | 1
Rp. 2.000.000 – Rp. 3.000.000 | 2
Rp. 3.000.000 – Rp. 4.000.000 | 3
Rp. 4.000.000 – Rp. 5.000.000 | 4
> Rp. 5.000.000               | 5

  • Origin_City Field

For the city of origin, normalization assumes that a student from outside Makassar is coded 1 and a student from the city of Makassar is coded 0, as described in the following table:

Table 4. REGION OF ORIGIN NORMALIZATION RULE

City of Origin    | Normalization Form
Makassar City     | 0
Not Makassar City | 1

  • Selected_Cluster Field

The Cluster field contains either the string ‘engineering’, which is converted to the integer 2, or ‘commerce’, which is converted to the integer 1. The normalization rule for this field is shown in Table 5.

Table 5. SELECTED STUDY PROGRAM CLUSTER NORMALIZATION RULE

Cluster     | Normalization Form
Commerce    | 1
Engineering | 2
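The three rules above amount to simple lookups from raw strings to integer codes. The following is a minimal sketch of them, not the authors' implementation: the function names are hypothetical, and the tolerance for a plain hyphen, extra whitespace, mixed letter case, and the single-s spelling "Makasar" are assumptions about how raw entries might be typed.

```python
# Income codes as listed in the income normalization table.
INCOME_CODES = {
    "No Income": 0,
    "Rp. 1.000.000 – Rp. 2.000.000": 1,
    "Rp. 2.000.000 – Rp. 3.000.000": 2,
    "Rp. 3.000.000 – Rp. 4.000.000": 3,
    "Rp. 4.000.000 – Rp. 5.000.000": 4,
    "> Rp. 5.000.000": 5,
}

# Cluster codes as listed in the cluster normalization table.
CLUSTER_CODES = {"commerce": 1, "engineering": 2}


def normalize_income(raw: str) -> int:
    """Map a raw income string to its integer code."""
    # Assumption: collapse repeated whitespace and accept " - " for the en dash.
    cleaned = " ".join(raw.split()).replace(" - ", " – ")
    if cleaned not in INCOME_CODES:
        raise ValueError(f"Unrecognised income category: {raw!r}")
    return INCOME_CODES[cleaned]


def normalize_origin(city: str) -> int:
    """Return 0 for Makassar City and 1 for any other origin."""
    # Assumption: "makasar" (single s) is treated as a spelling variant.
    name = city.strip().lower()
    return 0 if name in {"makassar city", "makasar city"} else 1


def normalize_cluster(cluster: str) -> int:
    """Map 'commerce' to 1 and 'engineering' to 2, ignoring case."""
    key = cluster.strip().lower()
    if key not in CLUSTER_CODES:
        raise ValueError(f"Unrecognised cluster: {cluster!r}")
    return CLUSTER_CODES[key]
```

Raising an error on an unknown category, rather than assigning a default code, is a design choice made here so that unexpected raw values surface during preprocessing instead of silently entering the feature set.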

The final result of simplification and normalization is that the data, which originally contained strings in the father’s and mother’s income, city of origin, and cluster fields as shown in Table 1, now consist only of integers, as shown in Table 6 below:

Table 6. THE FINAL RESULT OF DATA SIMPLIFICATION

No.  | Participant No. | City of Origin | Father’s Income | Mother’s Income | Dependents | Cluster
1    | 10082           | 1              | 1               | 0               | 4          | 2
2    | 10031           | 1              | 2               | 0               | 3          | 2
3    | 20063           | 1              | 2               | 0               | 2          | 1
...
3528 | 11849           | 0              | 2               | 1               | 5          | 2

Furthermore, the city of origin, the father’s income, the mother’s income, the number of family dependents, and the selected study program cluster are combined into a single feature set that determines the STF cost group class of each student.
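One way to read this step is that each student becomes one five-element feature vector. The sketch below takes that reading, which is an assumption, and uses only the four rows printed in the final normalized table; the class name `NormalizedStudent` and the decision to leave the participant number out of the features are also assumptions.

```python
from typing import List, NamedTuple


class NormalizedStudent(NamedTuple):
    """One row of the normalized table (all fields already integers)."""
    participant_no: int
    origin: int
    father_income: int
    mother_income: int
    dependents: int
    cluster: int

    def features(self) -> List[int]:
        # Assumption: the participant number is an identifier, not a feature.
        return [
            self.origin,
            self.father_income,
            self.mother_income,
            self.dependents,
            self.cluster,
        ]


# The four rows shown in the normalized table.
students = [
    NormalizedStudent(10082, 1, 1, 0, 4, 2),
    NormalizedStudent(10031, 1, 2, 0, 3, 2),
    NormalizedStudent(20063, 1, 2, 0, 2, 1),
    NormalizedStudent(11849, 0, 2, 1, 5, 2),
]

# Feature matrix: one row per student, one column per criterion.
X = [s.features() for s in students]
```

The resulting `X` is a list of equal-length integer rows, which is the input shape most classification libraries expect.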

C. Training and Test of Machine Learning Classification Algorithms

Before being used to classify student data, several algorithms are trained and tested to find the best one for the STF case. The algorithms tested in this research are Decision Tree, BayessianNB, MultinomialNB, BernoulliNB, and SVM. Figure 1 illustrates the proposed classification process from the training phase through prediction.
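The train-and-test comparison described above can be sketched as follows. This is an illustration under several assumptions, not the authors' procedure: the text does not name a library, so scikit-learn is assumed; "BayessianNB" is guessed to correspond to scikit-learn's `GaussianNB`; the 80/20 hold-out split is arbitrary; and both the toy records and the labelling rule `toy_label` are invented purely so the code runs, so the accuracies it produces say nothing about the real data.

```python
import random

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = random.Random(0)

# Hypothetical toy records in the normalized feature order:
# [origin, father_income, mother_income, dependents, cluster]
X = [
    [rng.randint(0, 1), rng.randint(0, 5), rng.randint(0, 5),
     rng.randint(1, 6), rng.randint(1, 2)]
    for _ in range(200)
]


def toy_label(row):
    """Made-up rule producing an STF group in 1..5; NOT taken from the paper."""
    _, father, mother, dependents, _ = row
    score = father + mother - dependents // 2
    return max(1, min(5, 1 + score // 2))


y = [toy_label(row) for row in X]

# Hold out 20% of the records for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "GaussianNB": GaussianNB(),  # assumed stand-in for "BayessianNB"
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
    "SVM": SVC(),
}

# Fit each model on the training part and record its test accuracy.
results = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

for name, accuracy in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {accuracy:.2f}")
```

All models are left at their default hyperparameters here; any tuning the authors may have applied is not described in this section.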
