====== Introduction ======
  * Members: 蔡昀達, 廖其忻
  * Meeting:

====== Member ======
^Name^Mail^
|蔡昀達|bb04902103@gmail.com|
|廖其忻|cayon.1318.96@hotmail.com|
|尹聖翔|b06902103@ntu.edu.tw|

====== Public datasets ======
ISCX-URL-2016 (https://www.unb.ca/cic/datasets/url-2016.html)

Kaggle
  - https://www.kaggle.com/antonyj453/urldataset
  - https://www.kaggle.com/aktank/url-detection
  - https://www.kaggle.com/deepak730/finding-malicious-url-through-url-features

Phishing URLs
  - PhishTank: https://www.phishtank.com/developer_info.php
  - OpenPhish: https://openphish.com/

Spam URLs
  - JWSPAMSPY: http://www.joewein.de/sw/blacklist.htm

Malware URLs (these three have not been updated for a long time)
  - DNS-BH: http://www.malwaredomains.com/wordpress/?page_id=66
  - https://www.malwarepatrol.net/my-account/
  - http://www.malwaredomainlist.com/

Benign URLs
  - Majestic: https://majestic.com/reports/majestic-million

Other sources
  - https://zeltser.com/malicious-ip-blocklists/

===== Black list =====
  - EtherAddressLookup: https://etherscamdb.info/api
  - Bambenek Consulting: https://osint.bambenekconsulting.com/feeds/
  - FireHOL: http://iplists.firehol.org/
  - Spamhaus and DShield: http://www.squidblacklist.org/downloads/drop.malicious.rsc
  - SquidGuard: http://www.squidguard.org/blacklists.html
  - blackweb: https://github.com/maravento/blackweb

====== Intelligence websites ======
White list
  - https://www.alexa.com/topsites
  - Chrome extension: Google WOT plugin

----
  - http://whois.domaintools.com
  - https://www.urlvoid.com
  - https://www.ipvoid.com/
  - https://www.apivoid.com/api/domain-reputation/

----
Black list (no usable data at the moment)
  - (mainly for IP lookups) https://www.abuseipdb.com/
  - (mainly domain names; for reference only) https://www.riskiq.com/platform/architecture/internet-data-sets/passive-dns/
  - (too few blacklist entries; for reference only) https://otx.alienvault.com/

====== Meeting ======
===== 09/23 progress =====
  - TF-IDF model results
    - accuracy: 98.3%
  - model explanation mechanism
    - implemented: highlighting of trigger patterns
    - result 1 (tree interpreter): {{:dada.1.png}}
    - result 2 (LIME interpreter): {{:dada.2.png}}

^Year^Venue^Title^Link^Assign^
|2016|ACM|LIME: "Why should I trust you?": Explaining the predictions of any classifier|[[https://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf|PDF]]| |
|2014|KAIS|Shapley sampling values: Explaining prediction models and individual predictions with feature contributions|[[|PDF]]| |
|2017|arxiv|DeepLIFT: Learning important features through propagating activation differences|[[|PDF]]| |
|2016|IEEE SP|QII: Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems|[[|PDF]]| |
|2015|PLoS ONE|Layer-wise relevance propagation: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation|[[|PDF]]| |
|2010|PLoS ONE|Shapley regression values: Analysis of regression in game theory approach|[[|PDF]]| |

===== 09/16 =====
  - check that the baseline CNN model has high accuracy
  - build a TF-IDF model
  - build the explanation feature
  - triggered patterns
  - malicious URL family matching
  - phishing URL survey

===== 07/31 =====
  - collect training data
  - collect WHOIS information
  - manual features
  - model design

====== Reference ======
^Year^Venue^Title^Link^Assign^
|2017|arxiv|Malicious URL Detection using Machine Learning: A Survey|[[https://arxiv.org/pdf/1701.07179.pdf|PDF]]| |

https://hackmd.io/RKXNLcvUQY2a-cQAESigqw?view
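The 09/23 TF-IDF model with trigger-pattern highlighting could look roughly like the minimal sketch below. It assumes character 3- to 5-grams and a logistic-regression classifier; the page does not say which classifier sat on top of the TF-IDF features or exactly how patterns were highlighted, and every function name here is hypothetical. For a linear model, each n-gram's contribution to the malicious score is its TF-IDF weight times its learned coefficient, so the n-grams with the largest contributions are natural "trigger pattern" candidates. It is written in plain Python to stay dependency-free; scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(3, 5))` plus `LogisticRegression` would be the usual production choice.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(url, n_min=3, n_max=5):
    """Overlapping character n-grams of the lower-cased URL."""
    url = url.lower()
    return [url[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(url) - n + 1)]


def fit_idf(urls):
    """Smoothed IDF per n-gram: log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for u in urls:
        df.update(set(char_ngrams(u)))
    n = len(urls)
    return {g: math.log((1 + n) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(url, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(g for g in char_ngrams(url) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(urls, labels, idf, epochs=200, lr=0.5):
    """Plain SGD logistic regression; label 1 = malicious, 0 = benign."""
    w, b = defaultdict(float), 0.0
    data = [(tfidf(u, idf), y) for u, y in zip(urls, labels)]
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(b + sum(w[g] * v for g, v in x.items())) - y
            for g, v in x.items():
                w[g] -= lr * err * v
            b -= lr * err
    return w, b


def predict_proba(url, idf, w, b):
    """Probability that the URL is malicious."""
    x = tfidf(url, idf)
    return sigmoid(b + sum(w[g] * v for g, v in x.items()))


def trigger_patterns(url, idf, w, k=3):
    """Top-k n-grams by contribution (TF-IDF weight * coefficient) towards 'malicious'."""
    x = tfidf(url, idf)
    contrib = {g: v * w[g] for g, v in x.items()}
    return sorted(contrib, key=contrib.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy data, only to exercise the functions.
    benign = ["https://www.google.com/search?q=url", "https://github.com/user/repo"]
    malicious = ["http://paypal-login.verify-account.xyz/signin",
                 "http://secure-paypal.account-update.xyz/login.php"]
    urls, labels = benign + malicious, [0, 0, 1, 1]
    idf = fit_idf(urls)
    w, b = train_logreg(urls, labels, idf)
    print(trigger_patterns("http://paypal-verify.xyz/login", idf, w))
```

The highlighted n-grams can then be mapped back onto their character offsets in the URL for display. Note that this attribution is exact only for linear models; a tree model needs a tree interpreter or a model-agnostic method instead.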
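The LIME interpreter explains a single prediction by perturbing the input and fitting a local surrogate model around it. A simpler relative of that idea, leave-one-token-out occlusion (not LIME itself), fits in a few lines and works with any function that maps a URL to a malicious probability. In this sketch, `toy_model` is a hypothetical stand-in for the trained classifier, and the delimiter set used for tokenising URLs is an assumption.

```python
import re


def url_tokens(url):
    """Split a URL on common delimiters (/ . ? = & : _ -) into word-like tokens."""
    return [t for t in re.split(r"[/.?=&:_\-]+", url) if t]


def occlusion_explain(url, predict_proba, top_k=3):
    """Leave-one-token-out attribution for any black-box URL scorer.

    Each distinct token is deleted from the URL (str.replace removes every
    occurrence, including inside longer tokens) and the resulting drop in
    predicted malicious probability is used as that token's score.
    """
    base = predict_proba(url)
    scores = []
    for tok in sorted(set(url_tokens(url))):
        scores.append((tok, base - predict_proba(url.replace(tok, ""))))
    # Largest probability drop first; the sort is stable, so ties stay alphabetical.
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]


def toy_model(url):
    """Hypothetical stand-in classifier: keyword-based malicious score."""
    score = 0.1
    if "login" in url:
        score += 0.5
    if "verify" in url:
        score += 0.3
    return min(score, 1.0)


if __name__ == "__main__":
    print(occlusion_explain("http://paypal.verify-account.xyz/login", toy_model))
```

Occlusion needs one model call per token and ignores interactions between tokens; LIME's random multi-token perturbations and weighted linear fit are meant to address the latter, at a higher cost.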