Cyber Security and Machine Learning (First Edition)
This is the page for my new book "Cyber Security and Machine Learning" (First Edition), in Python and PyTorch.
This book covers topics at the intersection of Cyber Security and Machine Learning (ML). Specifically, it covers the use of machine learning for cyber security,
as well as the cyber security issues that affect AI models and systems, such as AI vulnerabilities.
Topics include: the techniques of machine learning and deep learning as applied to cyber security-related datasets and problems, adversarial machine learning,
LLM and other pre-trained model vulnerabilities and defenses, and more.
Contents
- Dedication v
- Acknowledgements vii
- Preface xv
- 1 Introduction 1
- 1.1 Setting up your Environment 2
- 1.1.1 Anaconda 2
- 1.1.2 Setting up your Environment with a Physical Box 2
- 1.2 Background 3
- 1.3 Companion GitHub Code and YouTube videos 3
- 1.4 Numpy Arrays, Tensors, and Linear Algebra 4
- 1.4.1 Numpy Arrays 5
- 1.4.2 Tensor Operations with PyTorch 16
- 1.5 Conclusion 20
- 2 Introduction to Cyber Security and Machine Learning 23
- 2.1 Risk 23
- 2.1.1 Expected Risk given Threat and Impact Probabilities 23
- 2.1.2 Entropy 25
- 2.2 Computer Hacking 26
- 2.3 Network Security 27
- 2.4 Software Security and Malware 29
- 2.5 Social Engineering 34
- 2.6 Cryptography 35
- 2.7 AI Assurance 36
- 2.7.1 AI Auditing and Explainability 36
- 2.7.2 Bias Testing 37
- 2.7.3 Adversarial Attacks 37
- 2.8 Introduction to Cyber Security and Machine Learning Literature 37
- 2.9 Summary 38
- 3 Traditional Machine Learning 39
- 3.1 Code Issues 40
- 3.2 Object Oriented Programming 43
- 3.3 Performance Evaluation 44
- 3.3.1 Regression Performance Evaluation 44
- 3.3.2 Classification Performance Evaluation 45
- 3.3.3 Plotting Performance 47
- 3.4 Information Theory 49
- 3.4.1 Entropy 49
- 3.4.2 Kullback-Leibler (KL) divergence 52
- 3.4.3 Mutual Information 53
- 3.4.4 Information Gain 55
- 3.4.5 Conditional Entropy 56
- 3.5 Optimization 60
- 3.6 Anomaly Detection Algorithms 63
- 3.6.1 Hopfield Networks 63
- 3.6.2 Boltzmann Machines 67
- 3.7 Popular Supervised Learning Algorithms 67
- 3.7.1 KNN 67
- 3.7.2 Logistic Regression 73
- 3.7.3 Neural Networks 74
- 3.7.4 Regression, Trees, and XGBoost 75
- 3.8 Summary 76
- 4 Data Loading and Pre-processing 77
- 4.1 Loading the Data 77
- 4.2 Data and Feature Pre-Processing 80
- 4.2.1 CSV Files 80
- 4.3 One Hot Encoding 81
- 4.4 Features 83
- 4.4.1 Sniffing and Spoofing 85
- 4.4.2 Features from Network Data 88
- 4.4.3 Features from Malware 96
- 4.4.4 Features from Forensics 102
- 4.4.5 Features from Text 103
- 4.4.6 Features from Websites for Web Security 105
- 4.4.7 Features from Images 106
- 4.5 Corpora (i.e. Annotated Datasets) 112
- 4.6 Summary 114
- 5 Deep Learning: Starting at the beginning 115
- 5.1 Things to know about the code 115
- 5.2 Getting Started with Deep Learning 118
- 5.3 Deep Learning Definition 121
- 5.4 PyTorch Basics 121
- 5.5 Loading your Data into your PyTorch code 123
- 5.6 Linear Regression 127
- 5.6.1 Theory and Intuition of Linear Regression 127
- 5.6.2 Linear Regression model for Wine Quality data 131
- 5.7 Linear Regression to Logistic Regression and Neural Networks 147
- 5.7.1 Regression vs Classification 147
- 5.7.2 Instantiating the different models 148
- 5.7.3 Reading Data in batches 149
- 5.7.4 What is Number of Epochs? 150
- 5.8 Logistic Regression 151
- 5.8.1 Theory and Intuition of Logistic Regression 151
- 5.8.2 Entropy and Cross Entropy 158
- 5.8.3 Logistic Regression NN Architecture 171
- 5.9 Layers of the Neural Network in PyTorch 173
- 5.9.1 Intuition of adding more layers to a Neural Network 175
- 5.10 Going Deep: An N layer Neural Network in PyTorch 175
- 5.11 Easy Deep Learning with the Iris dataset 178
- 5.12 More Challenging Deep Learning with the Wine Quality dataset 192
- 5.13 Transfer Learning 204
- 5.13.1 Important Ideas in Transfer Learning 204
- 5.13.2 Pre-training 204
- 5.13.3 Fine-tuning 205
- 5.13.4 Freezing some weights of the neural network 205
- 5.13.5 Sharing the weights from one model to another 207
- 5.13.6 Reinforcement Learning from Human Feedback (RLHF) 207
- 5.13.7 Transfer Learning Performance Metrics 207
- 5.14 Summary 208
- 6 Network Security 209
- 6.1 Intrusion Detection Systems (IDS) 209
- 6.2 Anomaly Detection with Packet Payloads 216
- 6.3 Restricted Boltzmann Machines 223
- 6.4 Privacy Preserving Auto-Encoder 226
- 6.5 Summary 230
- 7 Software Security and Malware 231
- 7.1 Malware Detection 231
- 7.2 Transfer Learning for Malware Detection 237
- 7.3 Summary 249
- 8 Social Engineering 251
- 8.1 Phishing Classifier Using Meta-Data 251
- 8.2 Phishing Detection using Transfer Learning and Text Features 259
- 8.3 Phishing Detection specific Tools such as NVIDIA’s Morpheus 261
- 8.4 Summary 261
- 9 Cryptography 263
- 9.1 Cracking Hashes 263
- 9.2 Homomorphic Cryptography and Torch 269
- 9.3 Summary 272
- 10 AI Assurance 273
- 10.1 AI Auditing and Explainability 273
- 10.1.1 Shapley 274
- 10.2 Bias Testing 279
- 10.2.1 The 4/5ths metric 279
- 10.2.2 Differential Validity Tests 280
- 10.3 Adversarial Machine Learning 280
- 10.3.1 A Concrete Adversarial ML Example with Images 284
- 10.3.2 Attacks on LLMs with Pre-existing Tokens 302
- 10.3.3 Adversarial Perturbation of Text Embeddings 303
- 10.4 Software Testing 306
- 10.5 Data or Model Drift 307
- 10.6 AI Toxicity 309
- 10.7 Federated Learning and Privacy 311
- 10.8 Summary 316
- 11 Conclusions and Final Thoughts 317
- 11.1 The Cyber Security and AI horizon 317
- 11.2 Summary 318
- Other Books by Ricardo A. Calix 319
- The Author: Ricardo A. Calix 321
- Glossary 323
- Bibliography 325
- An Important Final Note 327
GitHub
Community and Blog
Available Chapters On-line
- Chapter 1 - Introduction
- Chapter 2 - Introduction to Cyber Security
- Chapter 3 - Traditional ML
- Chapter 4 - Data and Features
- Chapter 5 - Introduction to Deep Learning
- Chapter 6 - ML for network security
- Chapter 7 - ML for software assurance
- Chapter 8 - ML for social engineering
- Chapter 9 - ML and cryptography
- Chapter 10 - AI assurance
- Chapter 11 - Conclusion
Copyright, License, FTC and Amazon Disclaimer
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, without written permission of the copyright owner.
This page includes Amazon Affiliate links to products. This site receives income if you purchase through these links. This income helps support content such as this. Code: MIT License.
