AI Application for Detection of Android Malware APKs and Fake e-Commerce Websites
2019-11-30, 12:00–12:30, Dachsaal

  • The MAL2 project employs AI to detect malware and fake websites and comprises two parts:
    1. Neural Network-Based Technique for Android Smartphone Application Classification
    2. Automating Fake e-Commerce Website Detection through Machine Learning
    In our talk, we present AI applications in different domains of cyber security and demonstrate the advantages of the AI approach over previous solutions.

  • Android Smartphone Application Classification
    ** With the booming development of smartphone capabilities, these devices are increasingly frequent victims of targeted attacks in cyberspace. Protecting Android smartphones against the growing number of malware applications has become as crucial as it is complex. To identify and defeat malware applications effectively, cyber analysts require novel distributed detection and reaction methodologies based on artificial intelligence techniques that can automatically analyse new applications and share analysis results between smartphone users.
    Our goal is to provide a real-time solution that extracts application features, finds related correlations within an aggregated knowledge base in a fast and scalable way, and automates the classification of Android smartphone applications. Our fast and effective AI-based application analysis method supports smartphone users in malware detection and allows them to quickly adopt suitable countermeasures once malware has been detected. We evaluate a deep neural network supported by word-embedding technology as a malware classification system and assess its accuracy and performance. This approach should reduce the number of infected smartphones and increase smartphone security. We demonstrate how the presented techniques can support application classification tasks performed by smartphone users. We also manually analyse the manifest and source files of Android applications to formulate additional features where possible. Finally, we compare a model trained on the newest malware samples with different parameters against our previous model.
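    The talk does not specify the network architecture, so the following is only a minimal sketch of the word-embedding idea, assuming each application is reduced to a bag of tokens such as manifest permission names. It deliberately swaps in a shallow model (learned token embeddings averaged and fed to a single logistic unit) in place of the deep network; the class name `EmbeddingBagClassifier`, the toy permission lists, and all hyperparameters are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class EmbeddingBagClassifier:
    """Hypothetical toy classifier: mean of learned token embeddings -> logistic unit."""

    def __init__(self, dim=8, lr=0.3, seed=0):
        self.dim, self.lr = dim, lr
        self.rng = random.Random(seed)
        self.emb = {}  # token -> learned embedding vector
        self.w = [self.rng.uniform(-0.5, 0.5) for _ in range(dim)]  # output weights
        self.b = 0.0  # output bias

    def _embedding(self, token):
        # Create a small random embedding the first time a token is seen in training.
        if token not in self.emb:
            self.emb[token] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return self.emb[token]

    def _forward(self, vecs):
        # Average the token embeddings, then apply a linear layer and a sigmoid.
        if vecs:
            h = [sum(v[i] for v in vecs) / len(vecs) for i in range(self.dim)]
        else:
            h = [0.0] * self.dim
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return h, _sigmoid(z)

    def predict(self, tokens):
        """Return P(malicious); tokens never seen during training are ignored."""
        return self._forward([self.emb[t] for t in tokens if t in self.emb])[1]

    def train_step(self, tokens, label):
        vecs = [self._embedding(t) for t in tokens]
        h, p = self._forward(vecs)
        g = p - label  # d(binary cross-entropy) / d(logit)
        grad_h = [g * wi for wi in self.w]  # computed with the weights before this update
        for i in range(self.dim):
            self.w[i] -= self.lr * g * h[i]
        self.b -= self.lr * g
        for v in vecs:  # each token occurrence receives 1/n of the gradient on the mean
            for i in range(self.dim):
                v[i] -= self.lr * grad_h[i] / len(vecs)

    def fit(self, samples, epochs=500):
        for _ in range(epochs):
            for tokens, label in samples:
                self.train_step(tokens, label)
        return self


# Hypothetical toy data: permission tokens per app, label 1 = malicious.
train = [
    (["INTERNET", "ACCESS_NETWORK_STATE", "CAMERA"], 0),
    (["INTERNET", "ACCESS_FINE_LOCATION"], 0),
    (["SEND_SMS", "READ_SMS", "INTERNET", "RECEIVE_BOOT_COMPLETED"], 1),
    (["SEND_SMS", "RECEIVE_SMS", "INTERNET"], 1),
]
model = EmbeddingBagClassifier().fit(train)
for tokens, label in train:
    print(label, round(model.predict(tokens), 3))
```

    Averaging the embeddings keeps the input size independent of how many permissions or API calls an app declares; a deeper network would replace the single logistic unit with hidden layers.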
  • Automating Fake e-Commerce Website Detection
    ** Shopping on the web is ubiquitous today, with about 70% of Europeans using this form of commerce in 2018. As more and more consumers make their purchases online, the risk of falling victim to e-commerce fraud is increasing. Indeed, fraudulent e-commerce domains designed to exploit customers are a rapidly growing area of cybercrime. The exploitation can take many forms, including theft of money or credit card credentials, gathering of private and sensitive data, and much more.
    A major problem in detecting fake e-commerce websites is that exposing such fake offerings is often a labor-intensive, manual task. Current fake online-shop detection strategies rely on manual annotation and verification: blacklists are maintained to which hundreds of new suspected fake online-shop domains are submitted for manual verification every day. By the time a site is flagged as fraudulent, its unsuspecting customers will already have been scammed. Moreover, fraudulent online shops often exist for only a few hours or days, which makes this manual verification process a major bottleneck: its detection latency is too high to properly protect end consumers.
    Automating the detection of fraudulent shops is therefore essential to fighting this type of cybercrime. To that end, we design machine-learning-based approaches that can rapidly and automatically identify fake e-commerce websites. We use manually compiled databases of known certified and fraudulent shops to train adaptive machine learning models that identify new, unverified fraudulent shops automatically. Our models use the structural code similarity between verified and unverified shops as the basis for detection. This approach achieves high fake-shop detection accuracy and has revealed significant features, such as copied code snippets common to fraudulent sites. Preliminary evaluation results on almost 1000 websites whose source code was scraped and tokenized show that more than 90% of the e-commerce websites can be categorized correctly, with less than 2% of fraudulent sites missed in the process.
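    The talk states only that detection relies on the structural similarity of scraped, tokenized source code, so the sketch below is one plausible reading rather than the authors' model. It assumes that structural tokens are HTML tag names plus tag/attribute-name pairs, that documents are compared as sets of 3-token shingles with Jaccard similarity, and that a new shop takes the label of its most similar verified shop (a 1-nearest-neighbour rule). All function names and the two HTML templates are hypothetical.

```python
from html.parser import HTMLParser


class StructureTokenizer(HTMLParser):
    """Collects structural tokens (tags and attribute names), ignoring text and values."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)
        self.tokens.extend(f"{tag}@{name}" for name, _ in attrs)

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag)


def structural_tokens(html):
    parser = StructureTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def shingles(tokens, k=3):
    # Overlapping k-token windows; very short documents become a single shingle.
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def nearest_label(html, labeled, k=3):
    """Return (similarity, label) of the most structurally similar verified shop."""
    query = shingles(structural_tokens(html), k)
    scored = [(jaccard(query, shingles(structural_tokens(src), k)), label)
              for src, label in labeled]
    return max(scored)


FAKE_TEMPLATE = ('<html><body><div class="promo"><span class="badge">-70%</span>'
                 '<form action="pay.php"><input name="card"><input name="cvv"></form>'
                 '</div></body></html>')
LEGIT_TEMPLATE = ('<html><head><title>Shop</title></head><body><nav><ul><li>'
                  '<a href="/">Home</a></li></ul></nav><main><article><h1>Item</h1>'
                  '</article></main></body></html>')
labeled = [(FAKE_TEMPLATE, "fraudulent"), (LEGIT_TEMPLATE, "legitimate")]

# A "new" shop that reuses the fraudulent template with different prices and URLs.
new_site = FAKE_TEMPLATE.replace("-70%", "-80%").replace("pay.php", "checkout.php")
print(nearest_label(new_site, labeled))  # (1.0, 'fraudulent')
```

    Because only tag and attribute names enter the shingles, changing prices, product names, or payment URLs does not hide a reused template, which is in the spirit of the copied code snippets the talk reports.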
    Additionally, as newly identified fraudulent shops are manually verified by the blacklist owners, we monitor the detection accuracy of the proposed models over time and trigger new learning steps when detection performance drifts. The model thus adapts to the evolution of fraudulent sites' code as new techniques emerge.
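    The drift monitoring described above could be sketched as follows, assuming each blacklist verification yields a pair (model predicted fraudulent, verified as fraudulent). The class name, window size, and thresholds are hypothetical; the thresholds simply reuse the reported preliminary figures (over 90% accuracy, under 2% of fraudulent sites missed) as illustrative trip points. The miss rate is taken as the share of verified fraudulent shops that the model labelled legitimate, which is one reading of "missing less than 2% of fraudulent sites".

```python
from collections import deque


class DriftMonitor:
    """Hypothetical tracker of recent predictions against blacklist-verified labels."""

    def __init__(self, window=200, min_accuracy=0.90, max_miss_rate=0.02):
        self.results = deque(maxlen=window)  # (predicted_fraud, verified_fraud) pairs
        self.min_accuracy = min_accuracy
        self.max_miss_rate = max_miss_rate

    def record(self, predicted_fraud, verified_fraud):
        self.results.append((bool(predicted_fraud), bool(verified_fraud)))

    def accuracy(self):
        if not self.results:
            return 1.0
        return sum(p == v for p, v in self.results) / len(self.results)

    def miss_rate(self):
        # Share of verified fraudulent shops that the model labelled legitimate.
        fraud_predictions = [p for p, v in self.results if v]
        if not fraud_predictions:
            return 0.0
        return sum(not p for p in fraud_predictions) / len(fraud_predictions)

    def needs_retraining(self):
        return self.accuracy() < self.min_accuracy or self.miss_rate() > self.max_miss_rate


monitor = DriftMonitor(window=50)
for i in range(50):  # the model agrees with every verification
    fraud = i % 2 == 1
    monitor.record(fraud, fraud)
print(monitor.needs_retraining())  # False

for _ in range(10):  # a new style of fake shop slips through as "legitimate"
    monitor.record(False, True)
print(round(monitor.accuracy(), 2), round(monitor.miss_rate(), 2), monitor.needs_retraining())
# 0.8 0.33 True
```

    Using a bounded window means old verifications age out, so the trigger reacts to recent performance rather than to the model's lifetime average.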