KDD 1999 generation faults: a review and analysis
Abstract
DARPA 1998 was one of the first intrusion detection datasets to be made publicly available. The KDD 1999 dataset was derived from DARPA 1998 for researchers to use in developing machine learning (ML), classification and clustering algorithms with a security focus. DARPA 1998 has been criticised in the literature because of concerns about problems in the dataset. Many researchers have suggested that KDD 1999 suffers from similar problems, but insufficient published evidence has been found to support these claims. In this paper, we review the KDD 1999 generation process and present new proofs of existing inconsistencies in KDD 1999. We then present the process used to link some of the KDD 1999 (TELNET) records back to their origins in DARPA 1998, and discuss the results and findings of this experiment.
Citation
Al Tobi, A. M. & Duncan, I. 2018, 'KDD 1999 generation faults: a review and analysis', Journal of Cyber Security Technology, vol. 2, no. 3-4, pp. 164-200. https://doi.org/10.1080/23742917.2018.1518061
Publication
Journal of Cyber Security Technology
Status
Peer reviewed
ISSN
2374-2917
Type
Journal article
Rights
© 2018, Informa UK Ltd. This work has been made available online in accordance with the publisher's policies. This is the author created accepted version manuscript following peer review and as such may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1080/23742917.2018.1518061
Collections
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.