
Introduction

Member

| Name | Mail |
| --- | --- |
| 蔡昀達 | bb04902103@gmail.com |
| 廖其忻 | cayon.1318.96@hotmail.com |
| 尹聖翔 | |

Public dataset

ISCX-URL-2016 (https://www.unb.ca/cic/datasets/url-2016.html)

Kaggle

Phishing URLs

Spam URLs

Malware URLs (these three have not been updated for a long time)

Benign URLs

Other Source

###### Black list

  1. EtherAddressLookup - https://etherscamdb.info/api

Intelligence website

White List - https://www.alexa.com/topsites

- Chrome extension: Google WOT plugin

#####

- http://whois.domaintools.com

- https://www.urlvoid.com

- https://www.ipvoid.com/

- https://www.apivoid.com/api/domain-reputation/

#####

Black List (no usable data available at present)

- (mainly for IP lookups) https://www.abuseipdb.com/

- (mainly domain names; for reference only) https://www.riskiq.com/platform/architecture/internet-data-sets/passive-dns/

- (blacklist too small; for reference only) https://otx.alienvault.com/

Meeting

09/16

  1. Check that the baseline CNN model reaches high accuracy
  2. Build a TF-IDF model
  3. Build explanation features
    1. Triggered pattern
    2. Malicious URL family matching
  4. Phishing URL survey
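
The TF-IDF item above could be prototyped roughly as follows. This is a minimal sketch, not the project's model: it assumes character trigrams as tokens and a smoothed IDF, and the names `char_ngrams` and `tfidf_vectors` as well as the sample URLs are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(url, n=3):
    """Return the overlapping character n-grams of a lowercased URL."""
    url = url.lower()
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def tfidf_vectors(urls, n=3):
    """Build one sparse {n-gram: weight} dict per URL.

    Assumption: tf is the relative n-gram frequency within the URL and
    idf is smoothed as log((1 + N) / (1 + df)) + 1.
    """
    docs = [Counter(char_ngrams(u, n)) for u in urls]

    # Document frequency: in how many URLs each n-gram appears.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    total = len(docs)
    vectors = []
    for doc in docs:
        length = sum(doc.values()) or 1  # guard against URLs shorter than n
        vectors.append({
            gram: (count / length) * (math.log((1 + total) / (1 + df[gram])) + 1.0)
            for gram, count in doc.items()
        })
    return vectors


# Hypothetical sample URLs, for illustration only.
sample_urls = [
    "http://example.com/login",
    "http://example.com/home",
    "http://paypa1-secure.example.net/login",
]
sample_vectors = tfidf_vectors(sample_urls)
```

N-grams shared by every URL (such as `htt`) receive the lowest IDF, while n-grams unique to one URL (such as `pay` in the third sample) are weighted higher, which is the property a downstream classifier would rely on.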

07/31

  1. Collect training data
  2. Collect WHOIS information
  3. Manual features
  4. Model design
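
For the manual feature item, a few lexical features can be read directly from the URL string. The feature set below is an assumption chosen for illustration (the notes do not list specific features), and `lexical_features` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Extract simple hand-crafted features from a URL string.

    Assumption: these particular features are illustrative and are not
    taken from the project notes.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # A raw IPv4 address used as the host instead of a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


# Hypothetical example URL, for illustration only.
features = lexical_features("http://192.168.0.1/a/b/login.php")
```

Features like these could be concatenated with WHOIS-derived fields once those are collected.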

Reference

| Year | Venue | Title | Link | Assign |
| --- | --- | --- | --- | --- |
| 2017 | arXiv | Malicious URL Detection using Machine Learning: A Survey | PDF | |

https://hackmd.io/RKXNLcvUQY2a-cQAESigqw?view