AI/ML and Data Science

    • What is the difference between AI, ML, and deep learning?

    • Applied data science

    • Applied mathematics

    • Computational Mathematical modeling

    • AI (data science, ML, Predictive Analytics, prescriptive analytics, Probability and Statistics)

    • Big Data

    • Cloud and MLOps

    • Quantum computing and Quantum machine learning

  • Types:

    • Supervised learning

      • Bayesian classifiers

      • Decision Trees

      • Labelled data

        • Classification

          • Decision Tree Classification

          • Random Forest Classification

          • K-nearest Neighbor

            • Example : Spam or ham

        • Regression

          • Logistic Regression (despite the name, used for classification)

          • Polynomial Regression

          • Support Vector Machines

            • Example: given the port numbers under attack, the ML algorithm predicts whether the traffic is a DDoS attack or a botnet

    • Unsupervised learning (Group similar data)

      • Clustering

        • K-means Clustering

        • Hierarchical Clustering

        • Principal Component Analysis

          • To reduce the dimensions

    • Association (finds the important relationship)

      • Apriori

      • Eclat

    • Reinforcement learning

    • Deep Learning

      • Artificial Neural network

      • Neural Network

  • Convolutional neural networks (CNNs)

  • Recurrent neural networks (RNNs)

  • Long short-term memory (LSTM)

  • Shallow neural networks

  • Autoencoders (AEs)

  • Restricted Boltzmann machines

  • Use cases:

    • Attack surface detection based on mathematical modeling

    • pre-ransomware activity detection

    • IOT

    • DDOS

    • DNS

    • Malware

    • Phishing emails

  • Key Concepts

    • feature engineering

    • feature selection

    • feature selection methods: filter methods, wrapper methods, and embedded methods

    • reduce the complexity

    • overfitting of the model

  • One-hot encoding

  • L1 and L2 loss

  • accuracy
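The three concepts just listed (one-hot encoding, L1/L2 loss, accuracy) can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names are invented here, the losses are written in their mean form (mean absolute error and mean squared error), and the sample values are made up.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 vector with one slot per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = [
        [1 if index[v] == i else 0 for i in range(len(categories))]
        for v in values
    ]
    return categories, encoded


def l1_loss(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_loss(y_true, y_pred):
    """Mean squared error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data only: a protocol column and a tiny spam/ham prediction set.
categories, encoded = one_hot(["tcp", "udp", "icmp", "tcp"])
score = accuracy(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"])
```

L2 loss penalizes large errors more heavily than L1 because the error is squared, which is one reason the choice between them matters when the data contains outliers.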

  • We need continuous monitoring and information from the following sources to investigate anomalies, suspicious activity, TTPs, IOCs, and IOAs:

  • - Asset Discovery

  • - Vulnerability Management

  • - Endpoint Security

  • - IDS

  • - IPS

  • - Firewall

  • - WAF

  • - Load Balancer

  • - DNS logs

  • - Sysmon logs

  • - SIEM

  • - Business application logs

  • - OS event logs

  • - Sign in Activity

  • - Email flow (Phishing emails)

  • - Threat specific to industry

  • - Honeypot

  • - Honey accounts

  • - current incident response strategy

  • - Clients, applications, web and database servers, NIDSs, HIDSs, firewalls

  • Cyber Security Analytics Goals

  • - reduces the time for remediation

  • - automated threat discovery/detection

  • - discovering new suspicious patterns

  • Define a few use cases to implement cybersecurity analytics

  • - Malicious activity detection without using known IOCs

  • - Behavioral Analytics

  • - Network traffic patterns

  • - Phishing email detection (email header analysis, URL links)

  • - Botnet analysis

  • - Learning from historical SIEM/network data

  • Data

  • structured

  • semi-structured

  • unstructured

  • labeled and unlabeled data

  • What is our problem?

  • What are we expecting to solve?

  • What is the training-to-testing data ratio?

  • Where are we getting the data from?

  • Model overfitting issue

  • Model underfitting issue

  • negative biases

  • Types of ML

  • - supervised

  • phishing email detection

  • - Deep Learning

  • - Unsupervised

  • UEBA, malware behavior, ransomware attacks

  • - Reinforcement

  • Problem types:

  • Regression

  • linear regression, logistic regression, polynomial regression, lasso regression

  • clustering

  • classification

  • - Web Traffic classification

  • dimension reduction

  • density estimation

  • Famous algos

  • SVM

  • Bayesian networks

  • decision trees

  • random forests

  • Hierarchical algos

  • genetic algorithms

  • ANNs

  • Areas to focus (from AWS)

    • Threat, intrusion and anomaly detection for cloud security

    • ML for malware analysis and detection

    • Finding security vulnerabilities using ML

    • Protecting and preserving data privacy in the cloud

    • Learning with limited/noisy labels and weakly supervised learning

    • Causal inference for information security

    • Graph modeling and anomaly detection on graphs

    • Zero/One-shot learning for information security

  • The advent of the Internet of Things and the increasing dependence on digital technology have given rise to many security incidents in recent years. Data breaches, zero-day attacks, malware, ransomware, denial of service (DoS), phishing, and social engineering have grown at near-viral rates. Cyber threats are outpacing the ability of cybersecurity to detect and prevent them. That’s where data science comes in.

  • Cybersecurity is dedicated to protecting networks, computers, software programs, and data using various technologies and processes. Data science provides techniques used in machine learning, such as data modeling, statistical analysis, predictions, anomaly detection, forecasts, and pattern finding. Cybersecurity data science is a scientific, data-focused approach to identifying threats through machine learning. Leveraging the systems and processes of data science increases the ability to extract security incident insight and patterns from cybersecurity data for faster detection and a more robust, effective defense.

  • Data Science Courses

  • Where cybersecurity takes action, data science provides the analysis and vision. Machine learning, powered by data science techniques, helps with anomaly detection by quickly scanning huge amounts of code for differences that could indicate malicious code. Penetration testing is also improved by using automation and adaptive learning to test firewalls against intrusion. Data science gives cybersecurity information, speed, and accuracy that far surpass what it has been able to achieve to date.

  • Data science courses teach you the theory, tools, languages, and techniques widely used in the industry. Topics may include the following:

    • API Interactions

    • Computer Vision

    • Deep Learning (Neural Networks)

    • Ensemble Techniques & Model Tuning

    • Natural Language Processing (NLP)

    • PostgreSQL/pgAdmin

    • Python and R Programming for AI

    • Recommendation Systems

    • Statistics

    • Supervised and Unsupervised Learning

    • Tableau, JavaScript, HTML5/CSS, and Git/GitHub

  • Curriculum

    • Introduction to supervised learning models

      • Logistic regression, Naive Bayes, neural networks, deep learning models

    • Introduction to unsupervised learning models

      • PCA, K-means, Gaussian mixture models

    • Live Demonstration: Building a machine learning pipeline

    • Introduction to Internet architecture, measuring Internet traffic behavior and anomaly detection

    • Live Demonstration: Analyze internet network traffic using unsupervised learning techniques

    • Applications of machine learning to network security

      • Supervised learning examples: Spam filtering, phishing

      • Unsupervised learning examples: Anomaly detection

    • Introduction to adversarial machine learning, threat models

      • Example: Distorting personalization

    • Defending against adversaries

    • Example: Evading intrusion/attack detection

    • Fairness, Transparency, and Explainability in cybersecurity ML models

      • Privacy definitions and how to actualize privacy for cybersecurity applications in industry

      • Externalities and implications of errors in ML models for cybersecurity

    • Responsible data lifecycles

      • Hands-on lab focused on building a model to detect fraudulent accounts leveraging virtual case study

    • Students have the option to develop a real or hypothetical cybersecurity machine learning deployment case study, culminating in personalized UChicago faculty feedback and guidance on their strategy.

  • Data ingestion methods

  • - Apache Flume and Kafka

  • - Amazon Kinesis

  • - Apache Sqoop, Apache Storm, Gobblin, DataTorrent, Syncsort, and Cloudera Morphlines

  • Frameworks

  • threat hunting

  • MITRE

    • Installation of Anaconda

    • Introduction on Python

    • Introduction to Pandas

    • Introduction to Jupyter

    • Math fundamentals for ML

    • Introduction to linear algebra

    • Introduction to Statistics

    • Introduction to probability

    • Introduction to scikit-learn

    • Exploratory Data Analysis and visualization using Pandas

    • Data Engineering

      • PostgreSQL

      • MongoDB

      • CSV

      • API reading

    • Machine Learning Terminology

      • Supervised learning

      • Unsupervised learning

      • Hybrid

    • Classification

    • Clustering

    • KNN

    • Model evaluation

    • Linear Regression

    • Logistic Regression

    • NLTK

    • Naïve Bayes classification

    • Decision trees

    • Ensemble Techniques

    • Dimension Reduction

    • Stochastic Gradient Descent

    • Neural Networks

    • Deep Learning

    • Recommendation Engine

    • Classification of malicious URLs

  • Phishing:

  • Supervised learning algorithms such as Random Forest, SVM, MLP, KNN, and their stacked ensembles
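Before any of the supervised models named above can be trained on phishing or malicious-URL data, each URL has to be turned into numeric features. The sketch below shows one possible feature extractor using only the standard library. The feature set and the `SUSPICIOUS_WORDS` list are assumptions chosen for illustration, not a validated detection recipe.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account")


def url_features(url):
    """Turn a URL into a dict of simple lexical features for a classifier."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        # A dotted-quad host (raw IP address) instead of a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # "@" can hide the real host behind a fake user-info part.
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "num_suspicious_words": sum(w in lowered for w in SUSPICIOUS_WORDS),
    }


features = url_features("http://192.168.10.5/secure-login/verify.php")
```

The resulting dictionaries can be converted to rows of a feature matrix and passed to whichever classifier is being evaluated.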

Python code to compute a file hash and look it up on VirusTotal and AlienVault OTX, with the API keys exported as environment variables
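A minimal sketch of the hash lookup, using only the standard library. The environment variable names `VT_API_KEY` and `OTX_API_KEY` are assumptions made here. The endpoint paths and header names follow the public VirusTotal v3 and AlienVault OTX APIs as understood at the time of writing and should be checked against the current documentation before use.

```python
import hashlib
import json
import os
import urllib.request


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_lookup_requests(file_hash):
    """Build (but do not send) one lookup request per service."""
    # Assumed variable names; set with e.g. `export VT_API_KEY=...`.
    vt_key = os.environ.get("VT_API_KEY", "")
    otx_key = os.environ.get("OTX_API_KEY", "")
    return {
        "virustotal": urllib.request.Request(
            f"https://www.virustotal.com/api/v3/files/{file_hash}",
            headers={"x-apikey": vt_key},
        ),
        "otx": urllib.request.Request(
            f"https://otx.alienvault.com/api/v1/indicators/file/{file_hash}/general",
            headers={"X-OTX-API-KEY": otx_key},
        ),
    }


def lookup(request, timeout=15):
    """Send a prepared request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Typical use would be `requests = build_lookup_requests(sha256_of_file(path))` followed by `lookup(requests["virustotal"])`. An unknown hash normally comes back as an HTTP error, so a real script would wrap the call in a `try`/`except urllib.error.HTTPError` block.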

Python code to read a file header and report whether it is MZ, PK, RAR, ELF, Mach-O, %PDF, or MSCF
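One way to sketch the header check is a table of magic bytes compared against the first few bytes of the file. The signature list below covers only the formats named in the note, reading "mac" as Mach-O, which is an interpretation. The labels are illustrative, and a magic-byte match is a hint rather than proof of file type.

```python
# (magic bytes, human-readable label); order matters only for overlapping prefixes.
MAGIC_SIGNATURES = [
    (b"MZ", "MZ (DOS/Windows PE executable)"),
    (b"PK\x03\x04", "PK (ZIP archive, also used by DOCX/JAR/APK)"),
    (b"Rar!\x1a\x07", "RAR archive"),
    (b"\x7fELF", "ELF executable"),
    (b"\xfe\xed\xfa\xce", "Mach-O executable"),
    (b"\xfe\xed\xfa\xcf", "Mach-O executable"),
    (b"\xce\xfa\xed\xfe", "Mach-O executable"),
    (b"\xcf\xfa\xed\xfe", "Mach-O executable"),
    (b"%PDF", "PDF document"),
    (b"MSCF", "MSCF (Microsoft Cabinet archive)"),
]


def identify_header(data):
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path):
    """Read only the first bytes of a file and identify its header."""
    with open(path, "rb") as handle:
        return identify_header(handle.read(8))
```

Calling `identify_file("sample.bin")` on a hypothetical sample returns one of the labels above or `"unknown"`.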

PE header analysis
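As a starting point for PE header analysis, the sketch below parses the fixed-size COFF file header with `struct`. It relies on the documented PE layout: the offset of the `PE\0\0` signature is stored at 0x3C in the DOS header, and a 20-byte COFF header follows the signature. The function name, the returned dictionary keys, and the short `MACHINE_NAMES` table are choices made here for illustration. Dedicated libraries cover far more of the format.

```python
import struct
from datetime import datetime, timezone

# Small subset of machine types; unknown values are reported in hex.
MACHINE_NAMES = {0x014C: "I386", 0x8664: "AMD64", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag marking a DLL


def parse_pe_header(data):
    """Parse the DOS stub pointer and COFF file header from raw PE bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if len(data) < pe_offset + 24 or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    (machine, num_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "num_sections": num_sections,
        "compiled_at": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "optional_header_size": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

The compile timestamp is easy to forge, so it is better treated as a weak indicator than as evidence.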

Detect whether a binary is packed or unpacked

Detect and decode Base64-encoded content
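A common heuristic for spotting packed or encrypted binaries is Shannon entropy: compressed or encrypted data approaches 8 bits per byte, while ordinary code and text sit noticeably lower. The sketch below computes byte entropy. The 7.0 threshold is an assumed rule of thumb, not a standard, and high entropy alone does not prove packing.

```python
import math
from collections import Counter

PACKED_THRESHOLD = 7.0  # assumed cut-off in bits per byte; tune on real samples


def shannon_entropy(data):
    """Return the Shannon entropy of a byte string in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data, threshold=PACKED_THRESHOLD):
    """Flag data whose entropy meets or exceeds the threshold."""
    return shannon_entropy(data) >= threshold
```

In practice the check is often applied per PE section rather than to the whole file, since a single high-entropy section can be masked by padding elsewhere.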
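Base64 detection can be sketched as a regular expression that finds candidate runs, followed by a strict decode to discard false matches. The 16-character minimum length is an assumption made here to reduce noise. Long ordinary words made of letters and digits can still match and decode to meaningless bytes, so results need a sanity check.

```python
import base64
import binascii
import re

# At least four 4-character groups, optionally followed by a padded final group.
BASE64_CANDIDATE = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def find_base64(text):
    """Return (encoded, decoded_bytes) pairs for each valid Base64 run in text."""
    results = []
    for match in BASE64_CANDIDATE.finditer(text):
        candidate = match.group(0)
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # looked like Base64 but did not decode cleanly
        results.append((candidate, decoded))
    return results


# Illustrative input: an encoded command embedded in a command line.
encoded_cmd = base64.b64encode(b"Write-Host 'hello world'").decode("ascii")
hits = find_base64("powershell -enc " + encoded_cmd)
```

Each hit pairs the original encoded string with its decoded bytes, which can then be inspected for printable text or passed to further analysis.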
