Abstract
The objectives of this study were (a) to apply knowledge discovery in databases, using decision tree
algorithms, to respiratory system diagnosis by classifying patients of the Pranakorn
Sri Ayudthaya Hospital into three groups: acute upper respiratory tract infection, acute sinusitis, and
pneumonia, and (b) to compare the performance of three decision tree algorithms, namely ID3, C4.5,
and CART, in classifying or screening patients with these three diseases. The data used in this
study came from the medical records of 7,327 outpatients with respiratory diseases who attended Pranakorn
Sri Ayudthaya Hospital between 2003 and 2006. The variables considered were age, body temperature,
residential area, occupation, and certain symptoms, e.g., rhinorrhea, fever, nasal congestion,
periorbital pain, headache, wheezing, and coughing. Knowledge was extracted from the hospital’s medical
records with the ID3, C4.5, and CART decision tree algorithms, and the effectiveness of the three
algorithms was then compared. The validity of the decision tree algorithms was
assessed by dividing the data into training and testing data sets, using both the cross-validation
and the percentage split methods.
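
The two validation schemes named above can be illustrated with a minimal Python sketch. It is not the study's implementation: the records are made-up toy data, the helper names (percentage_split, k_folds, accuracy, majority_label) are hypothetical, and a majority-class guess stands in for the ID3, C4.5, and CART trees so that only the splitting and scoring logic is shown.

```python
import random
from collections import Counter

def percentage_split(records, train_fraction, seed=0):
    """Shuffle a copy of the records and split them into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def k_folds(records, k):
    """Yield (train, test) pairs; each record lands in exactly one test fold."""
    for i in range(k):
        test = records[i::k]
        train = [r for j, r in enumerate(records) if j % k != i]
        yield train, test

def accuracy(true_labels, predicted_labels):
    """Percentage of predictions that match the true labels."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)

def majority_label(train):
    """Placeholder model: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

# Made-up toy records: (body temperature in Celsius, hypothetical label).
records = [(36.5 + 0.1 * i, "pneumonia" if i % 4 == 0 else "other")
           for i in range(20)]

# Percentage split with a 70:30 ratio of training to testing data.
train, test = percentage_split(records, 0.7)
guess = majority_label(train)
split_accuracy = accuracy([label for _, label in test], [guess] * len(test))

# 5-fold cross-validation: average the accuracy over the five test folds.
fold_accuracies = []
for fold_train, fold_test in k_folds(records, 5):
    fold_guess = majority_label(fold_train)
    fold_accuracies.append(
        accuracy([label for _, label in fold_test],
                 [fold_guess] * len(fold_test)))
cv_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

In a real comparison, the placeholder model would be replaced by each trained decision tree, and the same splits would be reused so that the three algorithms are scored on identical test data.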
For the classification of patients with acute upper respiratory tract infection, using only
seven selected variables and a 70:30 ratio of training to testing data, the C4.5 algorithm
was the most effective, with a classification accuracy of 92.31 per cent. For the classification of
patients with acute sinusitis, using only eight selected variables and a 70:30 ratio of training to
testing data, the C4.5 algorithm was again the most effective, with a classification accuracy of 94.70 per
cent. For the classification of patients with pneumonia, using only seven selected variables and a
50:50 ratio of training to testing data, the CART algorithm was the most effective, with a
classification accuracy of 94.69 per cent. The results obtained could be used to support the diagnosis of
patients with respiratory diseases.