The Role of Entropy-Based Features in Classifying Tor Traffic Using Machine Learning
Friday, 10 Oct 2025
Pitpimon Choorod
Sasin Janpuangtong
George R. S. Weir
Andreas Aßmuth
Abstract
The increasing demand for privacy has led to the emergence of anonymity networks, such as Tor. These networks allow users to completely hide their online activities, making them appealing to privacy-conscious individuals as well as enabling access to the darknet and dark web. While the darknet and dark web host legitimate uses, they are also a potential haven for cybercriminal activities. Consequently, this has created a significant need for advanced network traffic analysis, particularly in distinguishing between Tor and nonTor encrypted traffic. The design of Tor makes its traffic resemble normal TLS traffic, which poses a challenge for Tor detection. However, Tor’s unique encryption mechanisms can lead to distinct data distribution characteristics, including variations in the entropy of payload data. This study focuses on these entropy-based characteristics, specifically Entropy_1_Hex and Entropy_2_Hex. Statistical analyses were conducted to identify discernible patterns between Tor and nonTor payloads. Entropy_1_Hex and Entropy_2_Hex were employed as features for classifying Tor and nonTor traffic. To evaluate the effectiveness of entropy-based analysis, machine learning models, including Decision Tree and Random Forest, were applied. The classification results showed that tree-based models, particularly Random Forest, achieved high accuracy levels of over 90% with Entropy_1_Hex and over 97% with Entropy_2_Hex. These findings indicate that the entropy-based features of encrypted payloads exhibit distinct patterns, making them effective for classifying Tor and nonTor encrypted traffic and contributing to enhanced detection of darknet and dark web usage.
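The abstract names the two features but does not define them. As an illustrative sketch only, the snippet below assumes that Entropy_1_Hex and Entropy_2_Hex denote Shannon entropy computed over single hexadecimal characters and over overlapping pairs of hexadecimal characters of the payload. The function name `ngram_entropy_hex` and the overlapping n-gram choice are assumptions made for illustration, not the authors' stated method.

```python
import math
from collections import Counter


def ngram_entropy_hex(payload: bytes, n: int = 1) -> float:
    """Shannon entropy (in bits) over overlapping n-grams of the payload's hex string.

    Assumption: n=1 approximates "Entropy_1_Hex" and n=2 approximates
    "Entropy_2_Hex"; the paper may define these features differently.
    """
    hex_str = payload.hex()  # two hex characters per byte
    if len(hex_str) < n:
        return 0.0  # not enough symbols to form a single n-gram

    # Count every overlapping n-gram of hex characters.
    grams = [hex_str[i:i + n] for i in range(len(hex_str) - n + 1)]
    counts = Counter(grams)
    total = len(grams)

    # H = -sum(p * log2(p)) over the observed n-gram frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sample = bytes.fromhex("0123456789abcdef")  # hypothetical payload
    print(ngram_entropy_hex(sample, n=1))
    print(ngram_entropy_hex(sample, n=2))
```

Under these assumptions, each packet payload yields two numeric values that could serve as input features for a tree-based classifier such as Decision Tree or Random Forest. Payloads with a near-uniform spread of hex symbols score close to the maximum (4 bits for single hex characters), while repetitive payloads score close to zero.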
Publication
Advances in Real-Time and Autonomous Systems