
Comparative study of the performance of the classification algorithms of the Apache Spark ML library

By: Camele, Genaro

Material type: Article

Publication details: 2021

Description: 1 file (243.9 kB)

Summary: Classification algorithms are widely used in several areas: finance, education, security, medicine, and more. Another use of these algorithms is to support feature extraction techniques, which rely on a classification algorithm to determine the best subset of attributes that still supports an acceptable prediction. Large amounts of data are now being collected; as a result, databases keep growing and distributed processing becomes a necessity. In this context, Spark, and in particular its Spark ML library, is one of the most widely used frameworks for performing classification tasks on large databases. Because some feature extraction techniques need to execute a classification algorithm a significant number of times, with a different subset of attributes in each run, the performance of these algorithms should be known beforehand so that the overall feature extraction process can be carried out in the shortest possible time. In this work, we carry out a comparative study of four Spark ML classification algorithms, measuring predictive power and execution time as a function of the number of attributes in the training dataset.
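The kind of measurement the summary describes, predictive power and execution time as a function of the number of attributes, can be sketched as a small benchmarking loop. The sketch below is only an illustration under stated assumptions: it uses plain Python rather than Spark, a nearest-centroid stand-in classifier that is not one of the four Spark ML algorithms studied, and synthetic data. The names `benchmark`, `nearest_centroid_accuracy`, and `make_synthetic` are hypothetical and do not come from the article.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple

Row = Tuple[List[float], int]  # (attribute values, class label)


def nearest_centroid_accuracy(train: Sequence[Row], test: Sequence[Row]) -> float:
    """Stand-in classifier (NOT a Spark ML algorithm): nearest class centroid."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}

    def predict(features: List[float]) -> int:
        # Pick the label whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda lbl: sum((a - b) ** 2 for a, b in zip(features, centroids[lbl])),
        )

    hits = sum(1 for features, label in test if predict(features) == label)
    return hits / len(test)


def benchmark(
    train: Sequence[Row],
    test: Sequence[Row],
    attribute_counts: Sequence[int],
    classifier: Callable[[Sequence[Row], Sequence[Row]], float],
) -> List[Tuple[int, float, float]]:
    """Return (number of attributes, accuracy, elapsed seconds) for each count."""
    results = []
    for k in attribute_counts:
        # Keep only the first k attributes (an arbitrary choice for this sketch).
        sub_train = [(f[:k], y) for f, y in train]
        sub_test = [(f[:k], y) for f, y in test]
        start = time.perf_counter()
        accuracy = classifier(sub_train, sub_test)
        elapsed = time.perf_counter() - start
        results.append((k, accuracy, elapsed))
    return results


def make_synthetic(n_rows: int, n_attrs: int, seed: int = 0) -> List[Row]:
    """Synthetic two-class data; every attribute is shifted by the class label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        features = [rng.gauss(3.0 * label, 1.0) for _ in range(n_attrs)]
        rows.append((features, label))
    return rows


if __name__ == "__main__":
    data = make_synthetic(400, 16)
    train, test = data[:300], data[300:]
    for k, acc, secs in benchmark(train, test, [2, 4, 8, 16], nearest_centroid_accuracy):
        print(f"attributes={k:2d} accuracy={acc:.3f} seconds={secs:.4f}")
```

In a Spark setting, the `classifier` argument would presumably wrap the fitting and evaluation of a Spark ML estimator on a distributed dataset; the loop structure and the recorded triples would stay the same.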


Holdings
Item type	Current library	Call number	Status	Date due	Barcode
Book chapter	Biblioteca Fac.Informática	A1259	Available		DIF-A1259


PDF file format. -- This document is an intellectual production of the Facultad de Informática - UNLP (BIPA/Biblioteca collection)


Congreso Argentino de Ciencias de la Computación (27th : 2021 : Salta, Argentina)
