Simulation study on the performance of robust outlier labelling methods

dc.contributor.author: Abdiweli, Ahmed Jama
dc.date.accessioned: 2023-12-12T07:01:49Z
dc.date.available: 2023-12-12T07:01:49Z
dc.date.issued: 2023-10
dc.description: A research thesis submitted to the School of Mathematics in partial fulfillment of the requirements for the award of the Master of Science in Statistics of Kampala International University
dc.description.abstract: The identification and labeling of outliers play a crucial role in data analysis and modeling tasks. Robust outlier labeling methods aim to accurately identify observations that deviate significantly from the majority of the data points while being resilient to noise, measurement errors, and data corruption. In this simulation study, we evaluate the performance of various robust outlier labeling methods using synthetic datasets. To conduct the study, we defined the simulation setup by specifying the characteristics of the datasets, including the number of variables, sample size, distributional assumptions, and proportion of outliers. Synthetic datasets were generated based on these specifications, incorporating both normal observations and outliers with known characteristics. A set of robust outlier labeling methods was selected for evaluation. These methods were designed to effectively handle outliers and provide reliable labels. Implementation of the selected methods was carried out using a programming language, ensuring proper application to the generated datasets. Performance metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC) were defined to assess the effectiveness of the outlier labeling methods. Each method was applied to the synthetic datasets, and the results were recorded. The performance metrics were calculated based on the known labels of the synthetic outliers. The collected results were analyzed and compared to identify the strengths and limitations of each robust outlier labeling method. The performance metrics were used to assess accuracy, robustness, and computational efficiency. To ensure the reliability of the findings, the simulation study was repeated with different simulation setups and datasets, validating the consistency of the results across multiple iterations.
Based on the findings, conclusions were drawn regarding the performance of the evaluated robust outlier labeling methods. The most effective methods for the specific characteristics of the datasets used in the study were identified. These findings provide valuable insights for researchers, practitioners, and data analysts in choosing appropriate outlier labeling methods for their data analysis and modeling tasks. In summary, this simulation study contributes to the understanding of the performance of robust outlier labeling methods and provides a systematic evaluation framework for comparing and selecting suitable methods in the presence of outliers.
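The workflow the abstract describes (generate synthetic data with known outliers, apply labeling rules, score them against the known labels) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the methods, distributions, or parameter values used in the thesis, so the two rules shown here (Tukey's boxplot fences and a median/MAD rule), the function names, and all default values are hypothetical stand-ins chosen for the example.

```python
import random
import statistics


def simulate(n=200, outlier_frac=0.05, shift=8.0, seed=0):
    """Draw n points: N(0, 1) inliers plus a known share of outliers from N(shift, 1).

    All parameter values are illustrative assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    n_out = int(round(n * outlier_frac))
    data = [rng.gauss(0.0, 1.0) for _ in range(n - n_out)]
    labels = [False] * (n - n_out)
    data += [rng.gauss(shift, 1.0) for _ in range(n_out)]
    labels += [True] * n_out
    return data, labels


def tukey_fences(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x < low or x > high for x in data]


def mad_rule(data, cutoff=3.0):
    """Flag points whose distance from the median exceeds cutoff * scaled MAD."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    scale = 1.4826 * mad  # makes the MAD consistent with the SD under normality
    if scale == 0:
        return [False] * len(data)
    return [abs(x - med) / scale > cutoff for x in data]


def scores(truth, flagged):
    """Return (precision, recall, F1) of flagged labels against the known truth."""
    tp = sum(t and f for t, f in zip(truth, flagged))
    fp = sum((not t) and f for t, f in zip(truth, flagged))
    fn = sum(t and (not f) for t, f in zip(truth, flagged))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    data, labels = simulate()
    for name, rule in [("Tukey fences", tukey_fences), ("Median/MAD", mad_rule)]:
        p, r, f1 = scores(labels, rule(data))
        print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Repeating the run over several seeds, sample sizes, and outlier proportions, and averaging the scores, would mirror the repeated-setup design the abstract mentions.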
dc.identifier.uri: http://hdl.handle.net/20.500.12306/14390
dc.language.iso: en
dc.publisher: Kampala International University
dc.title: Simulation study on the performance of robust outlier labelling methods
dc.type: Technical Report
Files
Original bundle
Name: Abdiweli Ahmed Jama Thesis.pdf
Size: 913.7 KB
Format: Adobe Portable Document Format