====== Introduction ======
  * Member : 蔡昀達, 廖其忻
  * Meeting : 

====== Member ======

^Name^Mail^
|蔡昀達|bb04902103@gmail.com|
|廖其忻|cayon.1318.96@hotmail.com |
|尹聖翔|b06902103@ntu.edu.tw |


====== Public Datasets ======

ISCX-URL-2016(https://www.unb.ca/cic/datasets/url-2016.html)

Kaggle
  - https://www.kaggle.com/antonyj453/urldataset
  - https://www.kaggle.com/aktank/url-detection
  - https://www.kaggle.com/deepak730/finding-malicious-url-through-url-features

Phishing URLs
  - PhishTank https://www.phishtank.com/developer_info.php
  - OpenPhish https://openphish.com/

Spam URLs
  - JWSPAMSPY http://www.joewein.de/sw/blacklist.htm

Malware URLs (these three have not been updated in a long time)
  - DNS-BH - http://www.malwaredomains.com/wordpress/?page_id=66
  - https://www.malwarepatrol.net/my-account/
  - http://www.malwaredomainlist.com/

Benign URLs
  - Majestic - https://majestic.com/reports/majestic-million

Other Sources
  - https://zeltser.com/malicious-ip-blocklists/

######
Blacklists
  - EtherAddressLookup - https://etherscamdb.info/api
  - Bambenek Consulting - https://osint.bambenekconsulting.com/feeds/
  - firehol - http://iplists.firehol.org/
  - Spamhaus and DShield - http://www.squidblacklist.org/downloads/drop.malicious.rsc
  - squidguard - http://www.squidguard.org/blacklists.html
  - blackweb - https://github.com/maravento/blackweb


====== Intelligence Websites ======
White List
- https://www.alexa.com/topsites

- Chrome extension: Google WOT plugin

#####

- http://whois.domaintools.com

- https://www.urlvoid.com

- https://www.ipvoid.com/

- https://www.apivoid.com/api/domain-reputation/

#####

Black List (no usable data at present)

- (mainly for IP lookups) https://www.abuseipdb.com/

- (mainly domain names; for reference only) https://www.riskiq.com/platform/architecture/internet-data-sets/passive-dns/

- (blacklist is too small; for reference only) https://otx.alienvault.com/

======  Meeting ======


===== 09/23 progress =====
  - TF-IDF results
    - accuracy: 98.3%
  - Model explanation mechanism
    - implemented: highlighting of trigger patterns
    - results 1 (tree interpreter):
    {{:dada.1.png}}
    - results 2 (LIME interpreter):
    {{:dada.2.png}}
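
The notes above do not record how the TF-IDF features were built. The following is only a minimal sketch of one common setup, assuming character 3-grams over the raw URL string and a smoothed IDF; the function names (''char_ngrams'', ''fit_idf'', ''tfidf_vector'') and the toy URLs are hypothetical and are not taken from the project code.

```python
import math
from collections import Counter


def char_ngrams(url, n=3):
    """Split a URL into overlapping, lower-cased character n-grams."""
    url = url.lower()
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def fit_idf(urls, n=3):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    doc_freq = Counter()
    for url in urls:
        # Count each n-gram once per URL (document frequency, not term frequency).
        doc_freq.update(set(char_ngrams(url, n)))
    total = len(urls)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(url, idf, n=3):
    """L2-normalised TF-IDF weights for one URL; unseen n-grams are ignored."""
    counts = Counter(g for g in char_ngrams(url, n) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else {}


# Toy URLs, made up for illustration only.
corpus = [
    "example.com/index.html",
    "example.org/about",
    "paypa1-login.example.net/verify",
]
idf = fit_idf(corpus)
vec = tfidf_vector("paypa1-login.example.net/verify", idf)

# Show the five highest-weighted n-grams of the last URL.
print(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:5])
```

The resulting sparse vectors could then be fed to any classifier; which classifier produced the 98.3% figure is not stated here.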
    

^Year^Venue^Title^Link^Assign^
|2016|ACM KDD|LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier|[[https://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf|PDF]]| |
|2014|KAIS|Shapley sampling values:Explaining prediction models and individual predictions with feature contributions|[[|PDF]]| |
|2017|arXiv|DeepLIFT: Learning important features through propagating activation differences|[[|PDF]]| |
|2016|IEEE SP|QII:Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems|[[|PDF]]| |
|2015|PLoS ONE|Layer-wise relevance propagation:On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation|[[|PDF]]| |
|2001|Applied Stochastic Models in Business and Industry|Shapley regression values: Analysis of regression in game theory approach|[[|PDF]]| |


===== 09/16 =====
  - Verify that the baseline CNN model achieves high accuracy
  - Build TF-IDF model
  - Build explanation features
    - triggered patterns
    - malicious URL family matching
  - Phishing URL survey
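
As an illustration of the "triggered pattern" item, the sketch below marks the parts of a URL covered by n-grams with a positive contribution score. It assumes the scores are already available as a plain dictionary (for example, exported from an interpreter such as LIME); the ''weights'' values and the ''highlight_triggers'' helper are hypothetical and do not describe the project's actual implementation.

```python
def highlight_triggers(url, weights, threshold=0.0, n=3):
    """Wrap every run of characters covered by a high-weight n-gram in [ ]."""
    lowered = url.lower()
    flagged = [False] * len(url)
    # Flag each character that lies inside an n-gram scoring above the threshold.
    for i in range(len(url) - n + 1):
        if weights.get(lowered[i:i + n], 0.0) > threshold:
            for j in range(i, i + n):
                flagged[j] = True

    # Emit the URL, opening and closing brackets at the edges of flagged runs.
    out = []
    inside = False
    for ch, hit in zip(url, flagged):
        if hit and not inside:
            out.append("[")
        elif not hit and inside:
            out.append("]")
        out.append(ch)
        inside = hit
    if inside:
        out.append("]")
    return "".join(out)


# Hypothetical per-n-gram contributions (positive = pushes towards "malicious").
weights = {"pa1": 0.8, "log": 0.4, "ogi": 0.3, "gin": 0.3, "exa": -0.2}
print(highlight_triggers("paypa1-login.example.net", weights))
# prints: pay[pa1]-[login].example.net
```

Overlapping n-grams merge into a single bracketed run, which keeps the highlighted output readable when several adjacent patterns fire.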


===== 07/31 =====
  - Collect training data
  - Collect whois information
  - Manual features
  - Model design
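
The "manual features" item is not spelled out in these notes. As a rough sketch only, the following shows the kind of hand-crafted lexical features often extracted from a URL string; the specific feature set and the ''lexical_features'' helper are assumptions chosen for illustration, not the features the team selected.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string; 0.0 for empty input."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def is_ipv4(host):
    """True if the host is a literal IPv4 address rather than a domain name."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Extract simple hand-crafted features from a URL string."""
    # urlsplit only recognises the host when a scheme (or a leading //) is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ipv4": is_ipv4(host),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "host_entropy": shannon_entropy(host),
    }


# Toy URL, made up for illustration only.
print(lexical_features("http://192.168.0.1:8080/a/b?x=1"))
```

Features derived from whois records (for example, domain age) would need an external lookup and are left out of this sketch.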


====== Reference ======

^Year^Venue^Title^Link^Assign^
|2017|arXiv|Malicious URL Detection using Machine Learning: A Survey|[[https://arxiv.org/pdf/1701.07179.pdf|PDF]]| |


https://hackmd.io/RKXNLcvUQY2a-cQAESigqw?view