| S. No | Term | Explanation |
|---|---|---|
| 1 | A/B Testing | Experimentation method comparing two versions to determine which performs better. |
| 2 | Anomaly Detection | Identifying patterns in data that do not conform to expected behavior. |
| 3 | Artificial Intelligence | Machines performing tasks that typically require human intelligence. |
| 4 | AUC-ROC | Area under the ROC curve, indicating a model’s ability to distinguish between classes. |
| 5 | Autoregressive Integrated Moving Average (ARIMA) | Time series forecasting model combining autoregression, differencing, and moving averages. |
| 6 | Bagging | Bootstrap aggregating, an ensemble technique combining models trained on bootstrap samples. |
| 7 | Batch Gradient Descent | Gradient descent using the entire training dataset for each iteration. |
| 8 | Batch Normalization | Normalizing layer inputs to improve training stability and speed. |
| 9 | Batch Size | Number of training examples used in one iteration of gradient descent. |
| 10 | Bayesian Statistics | Statistical approach based on Bayes’ theorem, incorporating prior knowledge. |
| 11 | Bias | Systematic error in model predictions, typically from assumptions that fail to account for all factors. |
| 12 | Bias-Variance Tradeoff | Balancing underfitting (bias) against sensitivity to the training data (variance) so the model generalizes to new data. |
| 13 | Big Data | Large, complex datasets that challenge traditional data processing. |
| 14 | Bootstrap Sampling | Resampling technique drawing random samples with replacement. |
| 15 | Categorical Encoding | Representing categorical variables as numerical values. |
| 16 | Classification | Assigning categories to data based on its features. |
| 17 | Clustering | Grouping similar data points together. |
| 18 | Confusion Matrix | Table showing true/false positives/negatives, used to evaluate model performance (see the metrics sketch after the table). |
| 19 | Convolutional Neural Networks (CNN) | Neural networks designed for processing structured grid data, such as images. |
| 20 | Cost Function | Aggregate measure of the loss function across all training samples. |
| 21 | Cross-Entropy | Measure of the average number of bits needed to identify an event when using an estimated distribution instead of the true one; widely used as a classification loss. |
| 22 | Cross-Validation | Technique to assess model performance by splitting data into training and testing sets. |
| 23 | Data Cleaning | Process of identifying and correcting errors or inconsistencies in data. |
| 24 | Data Mining | Extracting patterns and knowledge from large datasets. |
| 25 | Data Normalization | Scaling numerical data to a standard range to improve model performance. |
| 26 | Data Science | Interdisciplinary field using scientific methods to extract insights from data. |
| 27 | Data Wrangling | Preprocessing step to transform raw data into a suitable format for analysis. |
| 28 | Decision Trees | Tree-like model of decisions, useful in classification and regression. |
| 29 | Deep Learning | Subset of ML using neural networks with many layers. |
| 30 | Dimensionality Reduction | Reducing the number of features while preserving essential information. |
| 31 | Dropout | Technique in neural networks where randomly selected neurons are ignored during training to prevent overfitting. |
| 32 | Early Stopping | Technique to stop training when a monitored metric stops improving. |
| 33 | Ensemble Learning | Combining multiple models to improve overall performance. |
| 34 | Ensemble Methods | Combining multiple models to achieve better predictive performance. |
| 35 | Exploratory Data Analysis (EDA) | Initial analysis of data to understand its structure, patterns, and relationships. |
| 36 | F1 Score | Harmonic mean of precision and recall, balancing both metrics. |
| 37 | Feature Engineering | Transforming raw data into features suitable for modeling. |
| 38 | Feature Importance | Assessing the impact of each feature on the model’s predictions. |
| 39 | Feature Scaling | Standardizing or normalizing features to a similar scale. |
| 40 | Feature Selection | Choosing relevant features for model training, discarding irrelevant ones. |
| 41 | Gradient Boosting | Ensemble technique sequentially combining weak learners to create a strong learner. |
| 42 | Gradient Descent | Optimization algorithm that iteratively updates parameters against the gradient to minimize the loss function (see the sketch after the table). |
| 43 | Grid Search | Exhaustive search over a specified hyperparameter space to find the optimal values (see the cross-validation sketch after the table). |
| 44 | Hierarchical Clustering | Unsupervised clustering algorithm creating a tree of clusters. |
| 45 | Homoscedasticity | Assumption in regression analysis that the variance of the errors is constant across all levels of the independent variables. |
| 46 | Hyperparameter | External configuration of a model, set before training. |
| 47 | Hyperparameter Tuning | Adjusting parameters outside the model to optimize its performance. |
| 48 | Hypothesis Testing | Statistical method to validate or reject assumptions about a population. |
| 49 | Imputation | Filling in missing data with estimated or predicted values. |
| 50 | K-Fold Cross-Validation | Cross-validation method dividing data into k subsets for training and testing. |
| 51 | K-Means Clustering | Unsupervised clustering algorithm aiming to partition data into k clusters. |
| 52 | K-Nearest Neighbors (KNN) | Classification algorithm that assigns the majority class of a point’s k nearest neighbors. |
| 53 | Lift Chart | Graphical representation showing the performance of a predictive model compared to a baseline model. |
| 54 | Log Transformation | Applying the natural logarithm to data, useful for handling skewed distributions. |
| 55 | Logistic Regression | Regression analysis for predicting the probability of a binary outcome. |
| 56 | Long Short-Term Memory (LSTM) | Type of recurrent neural network (RNN) suitable for sequential data. |
| 57 | Loss Function | Objective function quantifying the difference between predicted and actual values. |
| 58 | Machine Learning | Subset of AI in which algorithms enable systems to learn patterns from data. |
| 59 | Mean Squared Error (MSE) | Average of the squared differences between predicted and actual values. |
| 60 | Model Evaluation Metrics | Quantitative measures assessing the performance of a model. |
| 61 | Multicollinearity | High correlation between two or more independent variables. |
| 62 | Multivariate Analysis | Analyzing patterns and relationships among multiple variables simultaneously. |
| 63 | Mutual Information | Measure of the amount of information shared between two variables. |
| 64 | Naive Bayes | Probabilistic algorithm based on Bayes’ theorem, often used for classification. |
| 65 | Natural Language Processing (NLP) | Enabling machines to understand, interpret, and generate human language. |
| 66 | Neural Networks | Networks of interconnected nodes, inspired by the human brain, used in machine learning. |
| 67 | One-Hot Encoding | Technique to convert categorical variables into binary vectors. |
| 68 | Outlier Detection | Identifying data points significantly different from the majority. |
| 69 | Overfitting | Model fitting training data too closely, performing poorly on new, unseen data. |
| 70 | Pearson Correlation Coefficient | Measure of linear correlation between two variables. |
| 71 | Precision | Proportion of true positives among total predicted positives. |
| 72 | Precision-Recall Curve | Graph illustrating the trade-off between precision and recall. |
| 73 | Predictive Analytics | Using data, statistical algorithms, and machine learning to predict future outcomes. |
| 74 | Principal Component Analysis (PCA) | Technique for reducing dimensionality by identifying the most significant components. |
| 75 | p-value | Probability of observing a test statistic as extreme as the one obtained, assuming the null hypothesis is true. |
| 76 | Random Forest | Ensemble method using multiple decision trees. |
| 77 | Recall | Proportion of true positives among actual positives. |
| 78 | Recurrent Neural Networks (RNN) | Neural networks designed for sequential data processing. |
| 79 | Regression Analysis | Analyzing the relationship between dependent and independent variables. |
| 80 | Regularization | Technique to prevent overfitting by adding a penalty term to the loss function. |
| 81 | Reinforcement Learning | Learning by interacting with an environment and receiving feedback. |
| 82 | Resampling | Technique involving the creation of new samples from the original dataset. |
| 83 | Residuals | Differences between predicted and actual values in regression analysis. |
| 84 | ROC Curve | Receiver Operating Characteristic curve, plotting true positive rate against false positive rate. |
| 85 | ROC-AUC Score | Area under the ROC curve, a metric for binary classification models. |
| 86 | R-squared | Coefficient of determination, indicating the proportion of variance in the dependent variable explained by the independent variable(s). |
| 87 | Silhouette Score | Measure of how well-separated clusters are in clustering algorithms. |
| 88 | Stochastic Gradient Descent (SGD) | Variant of gradient descent using a single random sample (or small batch) per iteration. |
| 89 | Stratified Cross-Validation | Cross-validation ensuring each subset has a proportionate representation of classes. |
| 90 | Streaming Data | Continuous, real-time data that can be processed as it arrives. |
| 91 | Supervised Learning | Training a model using labeled data. |
| 92 | Support Vector Machines (SVM) | Algorithm for classification and regression analysis. |
| 93 | Support Vector Regression (SVR) | Regression algorithm using support vector machines. |
| 94 | Time Complexity | Measure of the computational time an algorithm takes with respect to its input size. |
| 95 | Time Series Analysis | Analyzing time-ordered data to identify patterns and trends. |
| 96 | Transfer Learning | Using knowledge gained from one task to improve performance on a related task. |
| 97 | Underfitting | Model too simple to capture the underlying patterns in data. |
| 98 | Unsupervised Learning | Training a model without labeled data, finding patterns on its own. |
| 99 | Variance | Model’s sensitivity to changes in the training data, capturing noise. |
| 100 | XGBoost | Implementation of gradient boosting, known for its speed and performance. |
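Several of the evaluation terms above (confusion matrix, precision, recall, F1 score, ROC-AUC) come together in a single workflow. Here is a minimal sketch, assuming scikit-learn is installed; the synthetic dataset and the choice of logistic regression are illustrative assumptions, not part of the glossary.

```python
# Minimal sketch: evaluating a binary classifier with the metrics above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score, roc_auc_score)

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # hold out 20% as a test set

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print(confusion_matrix(y_test, y_pred))               # true/false positives/negatives
print("precision:", precision_score(y_test, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_test, y_pred))     # TP / (TP + FN)
print("F1:       ", f1_score(y_test, y_pred))         # harmonic mean of the two
print("ROC-AUC:  ", roc_auc_score(y_test, y_prob))    # area under the ROC curve
```

Note that precision and recall pull in opposite directions as the decision threshold moves, which is exactly the trade-off the precision-recall curve (entry 72) visualizes.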
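Gradient descent itself (entries 42, 57, 59) is simple enough to write out by hand. The following is a minimal NumPy sketch of batch gradient descent minimizing the MSE loss of a linear model; the learning rate, iteration count, and synthetic data are illustrative assumptions.

```python
# Minimal sketch: batch gradient descent minimizing MSE for linear regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # 200 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])            # parameters the data is generated from
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(3)                                # initial parameter guess
lr = 0.1                                       # learning rate (step size)
for _ in range(500):
    residuals = X @ w - y                      # predicted minus actual values
    grad = 2 * X.T @ residuals / len(y)        # gradient of the MSE loss
    w -= lr * grad                             # step against the gradient

print(w)  # should land close to true_w
```

Batch gradient descent (entry 7) uses all 200 samples per step, as here; stochastic gradient descent (entry 88) would compute the same gradient on one sample or a small batch at a time.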
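Finally, grid search, k-fold cross-validation, and stratification (entries 43, 50, 89) are typically combined into one tuning loop. A minimal sketch, again assuming scikit-learn; the SVM estimator and its hyperparameter grid are arbitrary illustrative choices.

```python
# Minimal sketch: stratified k-fold cross-validation wrapped in a grid search.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Five folds, each preserving the overall class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

grid = GridSearchCV(
    SVC(),                                                   # support vector classifier
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=cv,
    scoring="f1",                                            # optimize the F1 score
)
grid.fit(X, y)                                               # exhaustively tries all 6 combinations
print(grid.best_params_, grid.best_score_)
```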