Enhance Your Learning with NoSQL Flash Cards
NoSQL is a type of database management system that provides a flexible and scalable approach to storing and retrieving data, especially for large-scale applications and distributed environments.
NoSQL databases are schema-less, horizontally scalable, and designed for high availability and fault tolerance.
NoSQL databases support various data models, including document, key-value, column-family, graph, and object models.
The CAP theorem states that a distributed data store cannot simultaneously guarantee all three of consistency, availability, and partition tolerance.
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee reliable processing of database transactions, while BASE (Basically Available, Soft state, Eventually consistent) prioritizes availability and scalability over strict consistency.
Document databases store and retrieve data in the form of semi-structured documents, typically using JSON or XML formats.
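As a rough illustration (plain Python, no particular database product assumed), a document can be modeled as a nested, self-describing record and matched on a field:

```python
import json

# A hypothetical "users" collection: each document is semi-structured,
# and fields may differ from document to document.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com", "tags": ["admin"]},
    {"_id": 2, "name": "Grace", "address": {"city": "Arlington"}},
]

def find(collection, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(users, name="Ada"))           # [{'_id': 1, 'name': 'Ada', ...}]
print(json.dumps(users[1], indent=2))    # documents serialize naturally to JSON
```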
Key-value databases store and retrieve data as a collection of key-value pairs, providing fast access to values based on their keys.
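A minimal in-memory sketch of the key-value model (illustrative only; real stores such as Redis add persistence, expiry, and networking):

```python
class KeyValueStore:
    """Toy key-value store: O(1) average-case get/put via a hash map."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("session:42", {"user": "ada", "ttl": 3600})
print(store.get("session:42"))  # {'user': 'ada', 'ttl': 3600}
```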
Column-family databases store and retrieve data in column families, which are containers for related data columns.
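One way to picture the column-family layout (a sketch with made-up row keys and families, not any specific product's API):

```python
# Rows are addressed by a row key; each row groups its columns into families,
# and different rows may hold different columns within the same family.
column_family_store = {
    "user:1001": {
        "profile":  {"name": "Ada", "email": "ada@example.com"},
        "activity": {"last_login": "2024-05-01", "logins": 42},
    },
    "user:1002": {
        "profile":  {"name": "Grace"},            # sparse: no email column
        "activity": {"last_login": "2024-04-28"},
    },
}

# Read one column from one family of one row.
print(column_family_store["user:1001"]["profile"]["email"])
```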
Graph databases store and retrieve data in the form of nodes, edges, and properties, allowing efficient representation and traversal of complex relationships.
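A small sketch of the graph model using adjacency lists and a breadth-first traversal (hypothetical node names; real graph databases add indexes and query languages such as Cypher or Gremlin):

```python
from collections import deque

# Nodes carry properties; edges are (target, relationship) pairs.
nodes = {
    "alice": {"role": "engineer"},
    "bob":   {"role": "manager"},
    "carol": {"role": "designer"},
}
edges = {
    "alice": [("bob", "REPORTS_TO")],
    "bob":   [("carol", "WORKS_WITH")],
    "carol": [],
}

def reachable(start):
    """Breadth-first traversal: every node reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for target, _rel in edges.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

print(reachable("alice"))  # {'alice', 'bob', 'carol'}
```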
Object databases store and retrieve data in the form of objects, providing support for object-oriented programming concepts and relationships.
Distributed databases store and retrieve data across multiple nodes or servers, enabling scalability, fault tolerance, and high availability.
Data replication is the process of creating and maintaining multiple copies of data across different nodes or servers for improved availability and fault tolerance.
Sharding is the process of horizontally partitioning data across multiple nodes or servers to improve scalability and performance.
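A hedged sketch of hash-based sharding: route each key to one of N shards by hashing it (production systems often prefer consistent hashing so that adding a shard moves fewer keys):

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for this sketch

def shard_for(key: str) -> int:
    """Deterministically map a key to a shard ID."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for key in ("user:1", "user:2", "order:77"):
    print(key, "-> shard", shard_for(key))
```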
Consistency models define the level of consistency that a distributed database system guarantees, such as eventual consistency or strong consistency.
Eventual consistency is a consistency model in which, once updates stop arriving, all replicas of a distributed database eventually converge to the same state.
Strong consistency is a consistency model where all updates to a distributed database are immediately visible and consistent across all nodes.
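A toy illustration of eventual consistency, assuming last-write-wins conflict resolution by timestamp (one of several possible strategies): replicas accept writes independently and converge once an anti-entropy sync exchanges the newest versions.

```python
# Each replica maps key -> (value, logical_timestamp).
replica_a = {"cart:9": ("2 items", 1)}
replica_b = {"cart:9": ("3 items", 2)}   # a later write landed on replica B

def sync(r1, r2):
    """Anti-entropy pass: for every key, keep the version with the
    highest timestamp on both replicas (last-write-wins)."""
    for key in set(r1) | set(r2):
        newest = max(r1.get(key, (None, -1)), r2.get(key, (None, -1)),
                     key=lambda pair: pair[1])
        r1[key] = r2[key] = newest

sync(replica_a, replica_b)
print(replica_a == replica_b)  # True: the replicas have converged
print(replica_a["cart:9"])     # ('3 items', 2)
```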
Concurrency control ensures that multiple concurrent transactions can access and modify data in a consistent and isolated manner.
Indexing is the process of creating data structures, such as B-trees or hash tables, to improve the speed and efficiency of data retrieval.
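A simple sketch of a hash index over one field: build it once, then answer equality lookups without scanning every record (a B-tree would additionally support range queries):

```python
records = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Oslo"},
]

def build_hash_index(rows, field):
    """Map each field value to the positions of the rows holding it."""
    index = {}
    for position, row in enumerate(rows):
        index.setdefault(row[field], []).append(position)
    return index

city_index = build_hash_index(records, "city")
# Equality lookup via the index instead of a full scan.
print([records[i] for i in city_index.get("Oslo", [])])
```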
Querying is the process of retrieving specific data from a database using query languages, such as SQL or NoSQL-specific query languages.
Data modeling is the process of designing the structure and relationships of data in a database, ensuring efficient storage and retrieval.
Normalization is the process of organizing data in a database to eliminate redundancy and improve data integrity and consistency.
Denormalization is the process of intentionally introducing redundancy in a database to improve performance and simplify data retrieval.
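A small contrast of the two approaches, using hypothetical order and customer data: the normalized form stores each customer once and references it by ID; the denormalized form embeds the customer in every order, trading redundancy for single-read retrieval.

```python
# Normalized: customers stored once, orders reference them by ID.
customers = {"c1": {"name": "Ada", "email": "ada@example.com"}}
orders_normalized = [{"order_id": "o1", "customer_id": "c1", "total": 40}]

# Reading an order's customer requires a second lookup (a "join").
order = orders_normalized[0]
print(customers[order["customer_id"]]["name"])

# Denormalized: customer data embedded in each order; reads are one fetch,
# but updating the customer's email now touches every embedded copy.
orders_denormalized = [
    {"order_id": "o1", "total": 40,
     "customer": {"name": "Ada", "email": "ada@example.com"}},
]
print(orders_denormalized[0]["customer"]["name"])
```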
ACID transactions ensure that database operations are atomic, consistent, isolated, and durable, providing reliability and data integrity.
Revisiting the CAP theorem: a distributed system can fully provide only two of the three properties at once, and because network partitions cannot be ruled out, the practical trade-off during a partition is between consistency and availability.
Scalability is the ability of a system to handle increasing amounts of data, traffic, or workload without sacrificing performance or availability.
Fault tolerance is the ability of a system to continue operating properly in the event of failures or errors, ensuring high availability and reliability.
Data integrity ensures that data remains accurate, consistent, and reliable throughout its lifecycle, preventing unauthorized modifications or corruption.
Data security involves protecting data from unauthorized access, use, disclosure, disruption, modification, or destruction, ensuring confidentiality, integrity, and availability.
Backup and recovery strategies involve creating copies of data and implementing processes to restore data in the event of data loss, corruption, or system failures.
Performance optimization techniques aim to improve the speed, efficiency, and responsiveness of a database system, ensuring optimal resource utilization.
Data partitioning involves dividing a database into smaller, more manageable parts called partitions or shards, allowing parallel processing and improved performance.
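Complementing the hash-based sharding sketch above, a hedged example of range partitioning: keys are assigned to partitions by boundary values (assumed here), which keeps related keys together and makes range scans cheap.

```python
import bisect

# Assumed boundaries: keys starting before "g" -> partition 0,
# before "n" -> 1, before "t" -> 2, everything else -> 3.
BOUNDARIES = ["g", "n", "t"]

def partition_for(key: str) -> int:
    """Binary-search the boundaries to find the key's partition."""
    return bisect.bisect_right(BOUNDARIES, key[0].lower())

for key in ("alice", "maria", "zoe"):
    print(key, "-> partition", partition_for(key))
```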
Data distribution refers to the process of distributing data across multiple nodes or servers in a distributed database system, ensuring load balancing and fault tolerance.
Data consistency ensures that data remains accurate and valid across different replicas or copies in a distributed database system, preventing conflicts or inconsistencies.
Data replication strategies determine how data is replicated across different nodes or servers, such as master-slave replication or multi-master replication.
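A toy sketch of master-slave (primary-replica) replication: writes go to the primary, which forwards them to the replicas; reads can be served from any replica. Real systems replicate asynchronously over the network and handle failover, which this sketch omits.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:   # propagate the write to every replica
            replica.apply(key, value)

replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("config:theme", "dark")
print(replicas[0].read("config:theme"))  # 'dark' -- served from a replica copy
```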
Data compression techniques reduce the size of data to save storage space and improve data transfer efficiency, while maintaining data integrity and accessibility.
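A minimal example using Python's standard-library zlib: compression is lossless, so decompressing returns the exact original bytes.

```python
import zlib

original = b'{"event": "login", "user": "ada"}' * 100   # repetitive payload
compressed = zlib.compress(original, level=6)

print(len(original), "->", len(compressed), "bytes")     # much smaller
assert zlib.decompress(compressed) == original           # lossless round trip
```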
Data encryption involves transforming data into a secure and unreadable format using encryption algorithms, ensuring confidentiality and protection against unauthorized access.
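A short illustration of symmetric encryption, assuming the third-party cryptography package is installed; its Fernet recipe combines AES encryption with integrity checking, so tampered ciphertext fails to decrypt.

```python
from cryptography.fernet import Fernet   # pip install cryptography (assumed)

key = Fernet.generate_key()   # keep this secret; losing it loses the data
cipher = Fernet(key)

token = cipher.encrypt(b"card=4111-1111-1111-1111")
print(token)                  # unreadable without the key
print(cipher.decrypt(token))  # b'card=4111-1111-1111-1111'
```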
Data backup strategies involve creating regular backups of data to protect against data loss, corruption, or accidental deletion, ensuring data recovery and business continuity.
Data recovery strategies involve restoring data from backups or other sources in the event of data loss, corruption, or system failures, ensuring data integrity and availability.
Data migration is the process of transferring data from one system or storage device to another, ensuring data integrity, compatibility, and minimal downtime.
Data warehousing involves collecting, organizing, and analyzing large volumes of data from various sources to support business intelligence and decision-making processes.
A data lake is a centralized repository that stores raw and unprocessed data from various sources, enabling flexible data exploration, analysis, and processing.
Data governance refers to the overall management and control of data assets within an organization, ensuring data quality, compliance, and security.
Data quality refers to the accuracy, completeness, consistency, and reliability of data, ensuring that data meets the requirements and expectations of users and applications.
Data privacy involves protecting sensitive and personally identifiable information (PII) from unauthorized access, use, or disclosure, ensuring compliance with privacy regulations.
Data access control involves implementing security measures to control and restrict access to data based on user roles, permissions, and authentication mechanisms.
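A hedged sketch of role-based access control with hypothetical role and permission names: roles map to permissions, and every data access is checked against the caller's roles.

```python
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "admin":   {"read", "write", "delete"},
}

def is_allowed(user_roles, action):
    """Grant the action if any of the user's roles carries that permission."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

print(is_allowed(["analyst"], "read"))    # True
print(is_allowed(["analyst"], "delete"))  # False
```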
Data auditing involves monitoring and recording data access, modifications, and activities to ensure compliance, detect unauthorized actions, and investigate security incidents.
Data archiving involves moving infrequently accessed or historical data to long-term storage for compliance, regulatory, or historical purposes, freeing up primary storage resources.
Data integration involves combining data from multiple sources or systems into a unified view, enabling data analysis, reporting, and decision-making across the organization.