Simulation study on the performance of robust outlier labelling methods

Abdiweli, Ahmed Jama

dc.contributor.author	Abdiweli, Ahmed Jama
dc.date.accessioned	2023-12-12T07:01:49Z
dc.date.available	2023-12-12T07:01:49Z
dc.date.issued	2023-10
dc.description	A research thesis submitted to the School of Mathematics in partial fulfillment of the requirements for the award of the Master of Science in Statistics of Kampala International University	en_US
dc.description.abstract	The identification and labeling of outliers play a crucial role in data analysis and modeling tasks. Robust outlier labeling methods aim to accurately identify observations that deviate significantly from the majority of the data points while being resilient to noise, measurement errors, and data corruption. In this simulation study, we evaluate the performance of various robust outlier labeling methods using synthetic datasets. To conduct the study, we defined the simulation setup by specifying the characteristics of the datasets, including the number of variables, sample size, distributional assumptions, and proportion of outliers. Synthetic datasets were generated based on these specifications, incorporating both normal observations and outliers with known characteristics. A set of robust outlier labeling methods was selected for evaluation. These methods were designed to effectively handle outliers and provide reliable labels. Implementation of the selected methods was carried out using a programming language, ensuring proper application to the generated datasets. Performance metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC) were defined to assess the effectiveness of the outlier labeling methods. Each method was applied to the synthetic datasets, and the results were recorded. The performance metrics were calculated based on the known labels of the synthetic outliers. The collected results were analyzed and compared to identify the strengths and limitations of each robust outlier labeling method. The performance metrics were used to assess accuracy, robustness, and computational efficiency. To ensure the reliability of the findings, the simulation study was repeated with different simulation setups and datasets, validating the consistency of the results across multiple iterations. 
Based on the findings, conclusions were drawn regarding the performance of the evaluated robust outlier labeling methods. The most effective methods for the specific characteristics of the datasets used in the study were identified. These findings provide valuable insights for researchers, practitioners, and data analysts in choosing appropriate outlier labeling methods for their data analysis and modeling tasks. In summary, this simulation study contributes to the understanding of the performance of robust outlier labeling methods and provides a systematic evaluation framework for comparing and selecting suitable methods in the presence of outliers.	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.12306/14390
dc.language.iso	en	en_US
dc.publisher	Kampala International University	en_US
dc.title	Simulation study on the performance of robust outlier labelling methods	en_US
dc.type	Technical Report	en_US
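
The evaluation loop described in the abstract (generate synthetic data with known outliers, apply a labeling rule, score the labels against the truth) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the record does not name the methods or the programming language used, so the two rules shown here (Tukey's boxplot fences and a median/MAD rule), the univariate normal data, the shift-based contamination, and every function name and parameter value are hypothetical choices.

```python
import random
import statistics


def simulate(n=200, outlier_frac=0.05, shift=8.0, seed=0):
    """Draw n points from N(0, 1); a known fraction is shifted by +/- shift.

    All parameter values are illustrative assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    n_out = int(round(n * outlier_frac))
    values, labels = [], []
    for i in range(n):
        is_outlier = i < n_out
        x = rng.gauss(0.0, 1.0)
        if is_outlier:
            x += shift if rng.random() < 0.5 else -shift
        values.append(x)
        labels.append(is_outlier)
    return values, labels


def tukey_fences(values, k=1.5):
    """Boxplot rule: flag points more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x < low or x > high for x in values]


def median_mad_rule(values, cutoff=3.0):
    """Flag points whose robust z-score |x - median| / (1.4826 * MAD) exceeds cutoff."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation under normality
    if scale == 0:
        return [False] * len(values)
    return [abs(x - med) / scale > cutoff for x in values]


def score(predicted, truth):
    """Precision, recall and F1 of predicted outlier flags against known labels."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    values, labels = simulate()
    for name, rule in [("tukey", tukey_fences), ("median/MAD", median_mad_rule)]:
        print(name, score(rule(values), labels))
```

Repeating the run over several seeds, sample sizes, and contamination proportions, and averaging the scores, would correspond to the repeated simulation setups the abstract mentions.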

Files

Original bundle

Name: Abdiweli Ahmed Jama Thesis.pdf
Size: 913.7 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Masters of Science in Information Systems