RDBMS Flash Cards: Key Terms and Definitions
RDBMS: Stands for Relational Database Management System, a software system used to manage relational databases.
Relational Database: A type of database that organizes data into tables with rows and columns, and establishes relationships between tables.
SQL: Stands for Structured Query Language, the standard language used to define, query, and manipulate data in relational databases.
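For example, a minimal query against a hypothetical employees table (all names here are illustrative):

    -- Return the name and salary of everyone in one department, highest paid first.
    SELECT name, salary
    FROM employees
    WHERE department = 'Engineering'
    ORDER BY salary DESC;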
Primary Key: A unique identifier for a record in a table, used to ensure data integrity and enable efficient data retrieval.
Foreign Key: A field in a table that refers to the primary key of another table, establishing a relationship between the two tables.
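A sketch of both kinds of key in standard SQL DDL, using hypothetical customers and orders tables:

    -- Each customer is uniquely identified by customer_id (primary key).
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- orders.customer_id is a foreign key into customers, so every order
    -- must belong to an existing customer.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        order_date  DATE
    );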
Normalization: The process of organizing data in a database to eliminate redundancy and improve data integrity.
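A sketch of what normalization removes, reusing the illustrative customers/orders design above:

    -- Before: one wide table repeats the customer's address on every order.
    CREATE TABLE orders_unnormalized (
        order_id         INTEGER PRIMARY KEY,
        customer_name    TEXT,
        customer_address TEXT,  -- duplicated on every order by the same customer
        order_date       DATE
    );
    -- After: store the address once per customer in a customers table and
    -- have orders reference it through customer_id, as in the tables above.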
Index: A data structure that improves the speed of data retrieval operations on a database table.
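A minimal sketch, reusing the hypothetical orders table above:

    -- Speeds up lookups such as SELECT * FROM orders WHERE customer_id = 42;
    -- at the cost of extra storage and slightly slower writes.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);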
Transaction: A sequence of database operations that are treated as a single unit, ensuring data consistency and integrity.
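The classic illustration is a funds transfer, where both updates must succeed or neither should (accounts is a hypothetical table; transaction syntax varies slightly by vendor):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;  -- or ROLLBACK; to undo both updates if anything failed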
ACID: Stands for Atomicity, Consistency, Isolation, and Durability, a set of properties that guarantee reliable processing of database transactions.
Database Security: Measures and techniques used to protect a database from unauthorized access, data breaches, and other security threats.
Backup: The process of creating copies of database files to protect against data loss in case of hardware failure, human error, or other disasters.
Recovery: The process of restoring a database to a consistent state after a failure or data loss.
Data Warehouse: A large, centralized repository of integrated data from various sources, used for reporting, analysis, and decision-making.
Data Mining: The process of discovering patterns, relationships, and insights from large datasets using statistical and machine learning techniques.
Big Data: Extremely large and complex datasets that cannot be easily managed, processed, or analyzed using traditional database systems.
NoSQL: Stands for 'Not Only SQL', a type of database management system that provides flexible data models and scalability for handling big data.
Entity: A distinct object, concept, or event that is represented in a database and can be uniquely identified.
Attribute: A characteristic or property of an entity, represented as a column in a database table.
Tuple: A single row or record in a database table, containing data values for each attribute.
Query: A request for data or information from a database, typically written in SQL.
Join: A database operation that combines rows from two or more tables based on a related column between them.
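A sketch using the hypothetical customers and orders tables above:

    -- Pair each order with the name of the customer who placed it.
    SELECT o.order_id, c.name, o.order_date
    FROM orders AS o
    INNER JOIN customers AS c
        ON o.customer_id = c.customer_id;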
View: A virtual table derived from the data in one or more tables, presenting a customized or filtered view of the data.
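For example, a view over the hypothetical orders table:

    -- A filtered, always-current slice of the data: recent orders only.
    CREATE VIEW recent_orders AS
    SELECT order_id, customer_id, order_date
    FROM orders
    WHERE order_date >= '2024-01-01';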
Schema: A logical structure that defines the organization and layout of a database, including tables, relationships, and constraints.
Data Integrity: The accuracy, consistency, and reliability of data stored in a database, ensured through constraints and validation rules.
Concurrency Control: Techniques and mechanisms used to manage simultaneous access to a database by multiple users or applications, ensuring data consistency.
Deadlock: A situation where two or more transactions are each waiting for the other to release resources, resulting in a standstill.
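A classic interleaving that produces a deadlock, reusing the hypothetical accounts table (time flows downward; most RDBMSs detect the cycle and abort one transaction):

    -- Session A: BEGIN;
    -- Session A: UPDATE accounts SET balance = balance - 10 WHERE account_id = 1;  -- locks row 1
    -- Session B: BEGIN;
    -- Session B: UPDATE accounts SET balance = balance - 10 WHERE account_id = 2;  -- locks row 2
    -- Session A: UPDATE accounts SET balance = balance + 10 WHERE account_id = 2;  -- blocks, waiting on B
    -- Session B: UPDATE accounts SET balance = balance + 10 WHERE account_id = 1;  -- blocks, waiting on A
    -- Neither session can proceed until the DBMS rolls one of them back.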
Normal Forms: A set of rules or guidelines for designing and organizing relational databases to minimize redundancy and improve efficiency.
Data Warehouse Architecture: The structure and components of a data warehouse system, including data sources, ETL processes, and analytical tools.
OLAP: Stands for Online Analytical Processing, a category of software tools used for analyzing multidimensional data from data warehouses.
Data Mart: A subset of a data warehouse that is focused on a specific business function or department, providing tailored data for analysis.
Data Mining Techniques: Methods and algorithms used to extract patterns, trends, and insights from large datasets, including clustering, classification, and regression.
Hadoop: An open-source framework for distributed storage and processing of big data, based on the MapReduce programming model.
MapReduce: A programming model and algorithm for processing large datasets in parallel across a distributed cluster of computers.
CAP Theorem: Stands for Consistency, Availability, and Partition Tolerance; the theorem states that a distributed system cannot guarantee all three simultaneously, so during a network partition it must trade consistency against availability.
Key-Value Store: A type of NoSQL database that stores data as a collection of key-value pairs, providing fast and scalable access to data.
Document Database: A type of NoSQL database that stores and retrieves data in the form of documents, typically using JSON or XML formats.
Columnar Database: A type of NoSQL database that stores and retrieves data in columns rather than rows, enabling efficient data compression and query performance.
Graph Database: A type of NoSQL database that represents data as nodes and edges, allowing for efficient traversal and analysis of complex relationships.
ACID vs. BASE: A comparison between traditional ACID properties and the BASE properties (Basically Available, Soft state, Eventually consistent) of NoSQL databases.
Replication: The process of creating and maintaining multiple copies of data across different nodes or servers in a distributed database system.
Sharding: A technique used in distributed databases to horizontally partition data across multiple servers or nodes for improved scalability and performance.
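One common scheme is hash-based sharding, where a row's home shard is derived from its key. A toy sketch in SQL, with the routing rule made explicit through CHECK constraints (real systems route rows in the application or middleware layer, and MOD syntax varies by vendor):

    -- Two shard tables hold disjoint halves of the users data,
    -- split by user_id modulo the shard count.
    CREATE TABLE users_shard_0 (
        user_id INTEGER PRIMARY KEY CHECK (MOD(user_id, 2) = 0),
        name    TEXT
    );
    CREATE TABLE users_shard_1 (
        user_id INTEGER PRIMARY KEY CHECK (MOD(user_id, 2) = 1),
        name    TEXT
    );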
Consistency Models: Different levels of data consistency guarantees provided by distributed database systems, such as strong consistency, eventual consistency, and causal consistency.
Data Consistency: The property of a database system that ensures all data in the database is accurate, valid, and up-to-date at all times.
Data Warehouse vs. Data Lake: A comparison between traditional data warehouses and data lakes, which store raw and unprocessed data for flexible analysis and exploration.
ETL: Stands for Extract, Transform, Load, a process used to extract data from various sources, transform it into a consistent format, and load it into a data warehouse.
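A tiny sketch of the transform-and-load step in SQL, assuming a hypothetical staging_sales table already extracted from a source system:

    -- Transform (normalize codes, cast types) while loading into the warehouse.
    INSERT INTO warehouse_sales (sale_id, product_code, amount, sale_date)
    SELECT id,
           UPPER(prod_code),                      -- transform: normalize product codes
           CAST(amount_text AS DECIMAL(10, 2)),   -- transform: text to numeric
           sale_date
    FROM staging_sales
    WHERE amount_text IS NOT NULL;                -- basic cleansing: skip incomplete rows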
Data Cleansing: The process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in data to improve its quality and reliability.
Data Visualization: The representation of data in visual formats, such as charts, graphs, and maps, to facilitate understanding, analysis, and decision-making.
Data Mining Models: Mathematical models and techniques used to discover patterns, trends, and relationships in large datasets, including decision trees, neural networks, and association rules.
Data Privacy: The protection of sensitive and personal data from unauthorized access, use, or disclosure, ensuring compliance with privacy regulations and laws.
Backup Strategies: Different approaches and techniques for creating backups of data, including full backups, incremental backups, and differential backups.
Data Recovery: Methods and procedures for restoring data from backups in case of data loss, system failure, or disaster.
Data Mining Applications: Real-world examples and use cases of data mining, such as customer segmentation, fraud detection, market basket analysis, and recommendation systems.
Data Warehousing Tools: Software applications and platforms used for designing, building, and managing data warehouses, including ETL tools, OLAP servers, and reporting tools.
Data Lake Architecture: The structure and components of a data lake system, including data ingestion, storage, and processing layers.
Data Lake vs. Data Mart: A comparison between data lakes and data marts, which are subsets of data warehouses focused on specific business functions or departments.
Data Mining Challenges: Obstacles and issues faced during the data mining process, such as data quality, scalability, interpretability, and privacy concerns.
Data Management: The overall management and control of data assets within an organization, including data policies, standards, and data stewardship.
Data Warehouse Design: The process of designing and structuring a data warehouse to meet the analytical needs of an organization, including dimensional modeling and star schemas.
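A minimal star schema sketch with one fact table and two dimension tables (all names illustrative):

    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT,
        category    TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,  -- e.g. 20240115
        year     INTEGER,
        month    INTEGER
    );
    -- The fact table holds the measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        date_key    INTEGER REFERENCES dim_date (date_key),
        quantity    INTEGER,
        revenue     DECIMAL(12, 2)
    );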
Data Mining Tools: Software applications and algorithms used for data mining tasks, such as classification, clustering, regression, and association analysis.
Data Security: Measures and practices used to protect data from unauthorized access, disclosure, alteration, or destruction, ensuring its confidentiality, integrity, and availability.
Backup and Recovery Strategies: Different approaches and techniques for creating backups of data and recovering it in case of data loss or system failure.
Data Mining Process: A systematic approach to discovering patterns, trends, and insights from data, including data preparation, model building, evaluation, and deployment.
Data Warehouse Implementation: The process of building and deploying a data warehouse system, including data extraction, transformation, loading, and schema design.
Model Evaluation: The assessment and validation of data mining models and results, measuring their accuracy, precision, recall, and other performance metrics.
Data Integration: The process of combining data from different sources and formats into a unified view, enabling comprehensive analysis and decision-making.
Data Warehouse Performance Tuning: Techniques and optimizations used to improve the speed and efficiency of data retrieval and analysis in a data warehouse system.
Classification Algorithms: Algorithms used to classify data into predefined categories or classes, such as decision trees, naive Bayes, support vector machines, and k-nearest neighbors.
Data Privacy Regulations: Laws and regulations that govern the collection, use, storage, and sharing of personal and sensitive data, such as GDPR and CCPA.
Backup and Recovery Tools: Software and hardware solutions for creating backups of data and recovering it in case of data loss or system failure.
Clustering Algorithms: Algorithms used to group similar data points together based on their characteristics or attributes, such as k-means, hierarchical clustering, and DBSCAN.
Data Quality: The accuracy, completeness, consistency, and reliability of data, ensuring it is fit for its intended purpose and meets the needs of users.
Data Warehouse vs. Data Mart vs. Data Lake: A comparison between data warehouses, data marts, and data lakes, highlighting their differences in terms of data structure, purpose, and usage.
Association Rule Mining: Algorithms used to discover relationships and associations between items or variables in large datasets, such as Apriori and FP-growth.
Data Governance: A set of policies, processes, and controls for managing and ensuring the quality, availability, and security of data within an organization.
Time Series Algorithms: Algorithms used to analyze and forecast data points over time, such as ARIMA, exponential smoothing, and recurrent neural networks.
Data Lineage: The complete record of the origin, movement, and transformation of data throughout its lifecycle, ensuring data traceability and accountability.
Regression Algorithms: Algorithms used to predict and model the relationship between variables, such as linear regression, logistic regression, and decision trees.
Data Masking: A technique used to protect sensitive data by replacing it with fictional or scrambled data, while preserving its format and characteristics.
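A sketch using standard SQL string functions, assuming a customers table that stores a 19-character dashed card_number and an email column (function syntax varies by vendor):

    -- Keep only the last four card digits and hide the local part of the email,
    -- preserving each value's overall format.
    SELECT customer_id,
           'XXXX-XXXX-XXXX-' || SUBSTRING(card_number FROM 16 FOR 4) AS card_masked,
           '***@' || SUBSTRING(email FROM POSITION('@' IN email) + 1) AS email_masked
    FROM customers;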
Text Mining Algorithms: Algorithms used to extract and analyze information from unstructured text data, such as natural language processing, sentiment analysis, and topic modeling.
Data Anonymization: The process of removing or modifying personally identifiable information from data to protect individual privacy and comply with data protection regulations.
Anomaly Detection Algorithms: Algorithms used to identify unusual or abnormal patterns in data, such as clustering, outlier detection, and support vector machines.
Data Archiving: The process of moving data from active storage to long-term storage for historical or compliance purposes, freeing up resources in the primary database.
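A common sketch: copy old rows into an archive table and then delete them from the active table, inside one transaction (assumes a hypothetical orders_archive table with the same columns as orders):

    BEGIN;
    INSERT INTO orders_archive
    SELECT * FROM orders
    WHERE order_date < '2020-01-01';   -- rows past the retention window

    DELETE FROM orders
    WHERE order_date < '2020-01-01';   -- free up the active table
    COMMIT;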
Decision Tree Algorithms: Algorithms used to build models that represent decisions as a tree of tests on input data, such as ID3, C4.5, and CART.
Data Masking Techniques: Methods and approaches used to anonymize or obfuscate sensitive data, such as tokenization, encryption, and data substitution.
Neural Networks: Algorithms inspired by the structure and function of the human brain, used for pattern recognition, classification, and prediction tasks.
Data Masking Best Practices: Guidelines and recommendations for implementing data masking techniques effectively and securely, ensuring data privacy and compliance.
Data Masking Challenges: Obstacles and issues faced during the data masking process, such as preserving data utility, maintaining referential integrity, and ensuring performance.
Data Masking Tools: Software applications and solutions used for implementing data masking techniques, providing features for data discovery, masking, and monitoring.
Data Masking in Databases: Approaches and methods for implementing data masking in databases, such as dynamic data masking, static data masking, and data scrambling.
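One lightweight way to approximate dynamic masking is a view that exposes masked columns while the base table is left untouched, with query access granted on the view instead of the table (names illustrative; native dynamic-masking features vary by DBMS):

    CREATE VIEW customers_masked AS
    SELECT customer_id,
           name,
           '***@' || SUBSTRING(email FROM POSITION('@' IN email) + 1) AS email
    FROM customers;
    -- GRANT SELECT ON customers_masked TO analyst_role;  -- role name illustrative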
Data Masking in Files: Methods and approaches for implementing data masking in files, such as data encryption, data shuffling, and data substitution.