Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, postprocessing of discovered structures, visualization, and online updating. Data mining is the analysis step of the “knowledge discovery in databases” (KDD) process.
WEKA TRAINING COURSE MODULE
Getting started with Weka
- Introduction to Weka
- How to install Weka
- How to prepare a dataset in ARFF and CSV format
- Exploring and visualizing datasets
- Data Pre-processing and filters
- Numeric Data Transformation
- Discretization and outlier detection
- Training and Testing Dataset
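
To make the ARFF layout from the list above concrete, here is a minimal Python sketch that renders a tiny dataset as ARFF text: a relation name, one @attribute line per column, then the @data rows. The helper to_arff, the relation name weather_toy, and the rows are all made up for illustration; the sketch does not use Weka and handles only numeric and nominal attributes.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal attribute: the allowed values are listed in braces.
            values = ",".join(kind)
            lines.append("@attribute " + name + " {" + values + "}")
        else:
            lines.append("@attribute " + name + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        # Each instance is one comma-separated line, in attribute order.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical toy dataset, invented for this example.
attributes = [
    ("temperature", "numeric"),
    ("humidity", "numeric"),
    ("play", ["yes", "no"]),
]
rows = [(85, 85, "no"), (70, 96, "yes"), (64, 65, "yes")]

arff_text = to_arff("weather_toy", attributes, rows)
print(arff_text)
```

The same rows saved as CSV would carry only a header line with the column names; the ARFF header additionally declares each attribute's type.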
Classifiers
- Exploring basics of classification
- Exploring various classification algorithms
- Naïve Bayes and KNN classification
- Understanding Classification results
- Evaluation of classification using cross-validation
- Evaluation of classification using Percentage split
- Build Decision Tree J48 classifier
- Systematic Oversampling (Class Imbalance Problem)
- Saving the results of a Weka classifier
Evaluation Methods
- K-fold cross-validation
- Percentage split
- Using the training set
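
The difference between a percentage split and k-fold cross-validation can be sketched in a few lines of Python. This is plain, unstratified index splitting, written only to illustrate the two ideas; it is not meant to reproduce Weka's internal procedure, and the function names and the seed are assumptions of this example.

```python
import random


def percentage_split(n_instances, train_percent, seed=1):
    """Shuffle instance indices and cut them once into train and test."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = round(n_instances * train_percent / 100)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_instances, k, seed=1):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train, test = percentage_split(10, 66)
print(len(train), "training instances,", len(test), "test instances")

splits = list(k_fold_splits(10, 5))
for fold_number, (fold_train, fold_test) in enumerate(splits, start=1):
    print("fold", fold_number, "tests on", sorted(fold_test))
```

Evaluating on the training set itself corresponds to using every index for both training and testing, which is why it tends to give an optimistic estimate.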
Understanding of Classification parameters
- Understanding Confusion matrix and accuracy
- Precision and Recall
- Weighted Averages of Scores (Model Evaluation)
- AUC and ROC Curves
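
As a small worked example of the measures listed above, the following Python sketch counts the four cells of a two-class confusion matrix and derives accuracy, precision, and recall from them, plus a support-weighted average across classes. The label lists are invented for illustration, and the function names are this example's own rather than anything from Weka.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn) treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def weighted_average(scores, supports):
    """Average per-class scores, weighting each class by its instance count."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


# Hypothetical labels, invented for this example.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="yes")
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall = precision_recall(tp, fp, fn)

# Per-class precision for "yes" and "no", weighted by class frequency.
classes = ["yes", "no"]
per_class = []
for label in classes:
    c_tp, c_fp, c_fn, _ = confusion_counts(actual, predicted, positive=label)
    per_class.append(precision_recall(c_tp, c_fp, c_fn)[0])
supports = [actual.count(label) for label in classes]
weighted_precision = weighted_average(per_class, supports)

print("confusion cells (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
print("weighted precision:", weighted_precision)
```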
More Classifiers
- Stacking multiple classifiers
- How to add a support vector machine to the Weka classpath
- Classification using Regression
- Supervised Learning
- Unsupervised Learning
- Semi-Supervised Learning
Clustering
- Introduction to Clustering
- Understanding of clustering using KMeans
- Interpreting clustering results
- Classification on clustered data
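
The k-means idea covered in this section alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below shows that loop in plain Python. It is a teaching sketch under simple assumptions (Euclidean distance, caller-supplied starting centroids, invented 2-D points), not Weka's implementation.

```python
def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(old))
                )
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print("centroids:", centroids)
print("cluster label per point:", labels)
```

The resulting label per instance is the kind of value that can be appended as an extra attribute before running a classifier on the clustered data.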
Rule Mining
- Introduction to Frequent Pattern Mining
- Understanding the concept of Rule mining
- Apriori Algorithm (Frequent Items Mining)
- Application of Apriori Algorithm
- Understanding the results of frequent pattern mining
Attribute/Feature Selection
- Introduction to Feature Selection
- Exploring various Feature selection techniques
- Correlation based Feature selection
- Exploring various search methods for attribute selection
- Selecting features using the wrapper method
More About Filters
- Add Cluster to each instance
- Feature Selection using Filter (Dimensionality Reduction)
- Adding Classification
Various Applications of Data Mining
- Market Basket Analysis
- Predictive data analysis
- Basics of Text Mining
Text Mining
- How to prepare text data in ARFF format
- How to Pre-process text data
- Extracting bag of words vector from the Text data
- Exploring various properties of the StringToWordVector filter
- Building a classifier from the bag-of-words vectors
- How to predict results for text data
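
The bag-of-words step in this section turns each text into a vector of word counts over a shared vocabulary. The following Python sketch shows the idea on three invented sentences. The tokenizer (lowercasing, then keeping runs of letters and digits) is an assumption of this example and is much simpler than the options a real text filter offers.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(documents):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # One column per vocabulary word, in vocabulary order.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical documents, invented for this example.
docs = ["Weka is free", "Weka mines data", "free data is good data"]
vocabulary, vectors = bag_of_words(docs)

print(vocabulary)
for doc, vector in zip(docs, vectors):
    print(vector, "<-", doc)
```

Each count vector, together with a class label, is one training instance for a classifier; a new text is predicted by converting it with the same vocabulary first.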