Prediction of Student Final Exam Performance in an Introductory Programming Course: Development and Validation of the Use of a Support Vector Machine-Regression Model


  • Ashok Kumar Veerasamy University of Turku
  • Daryl D'Souza RMIT University
  • Rolf Lindén University of Turku
  • Mikko-Jussi Laakso University of Turku



Prior programming knowledge, At-risk students, Predictive data mining models, Machine learning approach


This paper presents a Support Vector Machine (SVM) regression model to determine whether prior programming knowledge and completion of in-class and take-home formative assessment tasks are suitable predictors of examination performance. Student data for an introductory programming course from the academic years 2012-2016 was captured via the ViLLE e-learning tool for analysis. The results show that a predictive model built on students' prior programming knowledge and assessment scores fits the data well. However, although the model's overall performance is significant, its accuracy in identifying at-risk students is only moderate, which prompted us to add two further research questions. Our preliminary post hoc analysis of these test results shows that, on average, students who scored less than 70% in formative assessments and had little or only basic prior programming knowledge may fail the final programming exam; taking this into account increases the prediction accuracy for at-risk students from 46% to nearly 63%. These results therefore provide timely information that programming course instructors and students can use to enhance the teaching and learning process.
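The post hoc rule described in the abstract can be sketched as a simple flag, together with the share of failing students it catches in advance. This is a minimal Python sketch under stated assumptions, not the authors' method: the `Student` fields, the prior-knowledge labels, the function names, and the toy cohort are all hypothetical, and it assumes that "prediction accuracy in identifying at-risk students" means recall (sensitivity) on the students who actually failed.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical labels for self-reported prior programming knowledge;
# "little or basic" knowledge is treated as low, following the abstract.
LOW_PRIOR_KNOWLEDGE = {"none", "little", "basic"}


@dataclass
class Student:
    formative_pct: float  # formative assessment score on an assumed 0-100 scale
    prior_knowledge: str  # one of the hypothetical labels, e.g. "basic" or "good"
    failed_exam: bool     # observed final exam outcome


def flag_at_risk(student: Student, threshold: float = 70.0) -> bool:
    """Flag students below the formative-score threshold who also report
    little or basic prior programming knowledge."""
    return (
        student.formative_pct < threshold
        and student.prior_knowledge in LOW_PRIOR_KNOWLEDGE
    )


def at_risk_detection_rate(students: Sequence[Student]) -> float:
    """Recall on the at-risk class: the fraction of students who actually
    failed the exam that the rule flagged."""
    failed = [s for s in students if s.failed_exam]
    if not failed:
        raise ValueError("cohort contains no failing students")
    flagged = sum(flag_at_risk(s) for s in failed)
    return flagged / len(failed)


# Toy cohort (invented values, for illustration only).
cohort = [
    Student(55.0, "basic", True),
    Student(62.0, "none", True),
    Student(85.0, "little", True),  # failed, but scored above the threshold
    Student(90.0, "good", False),
    Student(65.0, "good", False),
]

print(f"At-risk detection rate: {at_risk_detection_rate(cohort):.0%}")  # prints 67%
```

Recall is used here because the practical concern is missing students who go on to fail; if the reported figure is instead precision or overall classification accuracy, only `at_risk_detection_rate` would need to change.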


Author Biographies

Ashok Kumar Veerasamy, University of Turku

Researcher, Department of Future Technologies


Daryl D'Souza, RMIT University

Senior Lecturer, School of Computer Science and Information Technology

Rolf Lindén, University of Turku

Researcher, Department of Future Technologies

Mikko-Jussi Laakso, University of Turku

Adjunct Professor, Department of Future Technologies






How to Cite

Veerasamy, A. K., D’Souza, D., Lindén, R., & Laakso, M.-J. (2019). Prediction of Student Final Exam Performance in an Introductory Programming Course: Development and Validation of the Use of a Support Vector Machine-Regression Model. Asian Journal of Education and E-Learning, 7(1).