Module 1: Introduction & Entity Relationship (ER) Model
- Introduction to Database Management Systems (DBMS): A DBMS is software that interacts with users, applications, and the database itself to capture and analyze data. It allows users to create, read, update and delete data in a database.
- Characteristics of Database Systems: Some key characteristics include support for ACID properties (Atomicity, Consistency, Isolation, Durability), data abstraction, support for multiple views of data, and security mechanisms.
- Database Users: There are several types of database users including database administrators, database designers, end-users, and application programmers.
- Types of Data:
- Structured Data: Highly-organized and formatted in a way so it’s easily searchable in relational databases.
- Semi-structured Data: A type of data where both raw and organized data are present. It is not as raw as unstructured data and not as organized as structured data.
- Unstructured Data: Information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy but may contain data such as dates, numbers, and facts.
- Data Models and Schema: A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. A schema, in a relational database, is a collection of database objects, including tables.
- Three Schema Architecture: The three-schema architecture separates the conceptual, external, and internal schemas, promoting data abstraction.
- Database Languages and Architectures: Database languages are used to read, update and store data in a database. There are several types such as DDL, DML, DCL and TCL. Database architectures define how data is stored, processed and accessed.
- Entity Relationship (ER) Model: The ER model defines the conceptual view of a database. It works around real-world entities and the associations among them.
- Basic Concepts: Includes entities, attributes, relationships, constraints, etc.
- Entity Sets & Attributes: An entity set is a collection of similar entities. Attributes are the properties that define the entity.
- Relationships and Constraints: Relationships are the associations between entities. Constraints are the rules enforced on entities.
- Cardinality and Participation: Cardinality defines the number of entities in one entity set, which can be associated with the number of entities of other sets via a relationship set. Participation constraint determines the minimum number of relationship instances that an entity can participate in.
- Weak Entities: A weak entity is an entity that cannot exist in a database unless another type of entity also exists in that database. Weak entity types have a primary key that is partially or totally derived from the parent entity in the relationship.
- Relationships of Degree 3: Also known as ternary relationships, they involve three entities.
Module 2: Relational Model
- Structure of Relational Databases: A relational database is a type of database that stores and provides access to data points that are related to one another. It consists of a set of tables with data that fits into a predefined category.
- Integrity Constraints: Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. Thus, integrity constraints prevent the entry of inconsistent data into the database.
- Synthesizing ER Diagram to Relational Schema: This process involves converting entities into tables, converting relationships into foreign keys, etc.
- Relational Algebra: It is a procedural query language, which takes instances of relations as input and yields instances of relations as output. It uses operators to perform queries.
- Select, Project, Cartesian Product Operations: These are basic operations in relational algebra. Select operation selects tuples that satisfy a given predicate. Project operation returns its argument relation, removing duplicate tuples and sorting the result. Cartesian product operation combines information from any two relations into a single relation.
- Joins: Equi-join, Natural Join: Joins combine rows from two or more tables based on a related column between them. Equi-join is a join with a join predicate containing only equality comparisons. Natural join is a type of equi-join where the join predicate arises implicitly by comparing all columns in both tables that have the same column names.
- Structured Query Language (SQL): SQL is a standard language for managing and manipulating databases.
- Data Definition Language (DDL): DDL is a subset of SQL, used to define and manage all the objects in an SQL database.
- Table Definitions and Operations: In SQL, tables are used to store data. Table definitions include defining a table and its columns and data types. Table operations include the SQL commands like SELECT, INSERT, UPDATE, DELETE, etc.
Module 3: SQL DML (Data Manipulation Language), Physical Data Organization
- SQL DML: DML is a subset of SQL, used to add, update and delete data in a database.
- Queries on Single and Multiple Tables: SQL queries can be used to retrieve data from a single table or multiple tables.
- Nested Queries: A nested query is a query within another SQL query which retrieves data from one table based on data from another table.
- Aggregation and Grouping: SQL provides several functions, like COUNT, SUM, AVG, MAX, MIN, that can be used to perform an operation on a data set. The GROUP BY statement groups rows that have the same values in specified columns into aggregated data.
- Views, Assertions, Triggers: A view is a virtual table based on the result-set of an SQL statement. An assertion is a predicate expressing a condition that we wish the database to always satisfy. Triggers are stored programs, which are automatically executed or fired when some events occur.
- SQL Data Types: Each column in a database table is required to have a name and a data type. SQL offers several data types to choose from, depending on the need to store numbers, characters, or boolean values.
- Physical Data Organization: It refers to the technique of designing efficient structures for storing data files on the disks.
- Review of Terms: Includes terms like records, fields, tables, etc.
- Heap Files, Indexing: A heap file allows record retrieval by examining each record for matching criteria. Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done.
- B-Trees & B±Trees: B-Tree and B+ Tree : They are the tree data structures used for external storage systems. B-Tree is used when the data is stored in the disk it reduces the number of input/output operations. B+ Tree is used when the data is stored in the main memory of the computer system.
- Extendible Hashing: It is a type of hash system which treats a hash as a bit string, and uses a trie for bucket lookup.
- Indexing on Multiple Keys: It is a way of sorting data using multiple keys.
Module 4: Normalization
- Introduction to Normalization: Normalization is a method to remove all these anomalies and bring the database to a consistent state.
- Functional Dependency: Functional dependency is a constraint between two sets of attributes in a relation from a database.
- Armstrong’s Axioms: Armstrong’s axioms are a set of axioms (or, more precisely, inference rules) used to infer all the functional dependencies on a relational database.
- First, Second, Third, and Boyce Codd Normal Forms: These are rules to define the minimum requirements for a relational database design. They are progressive, meaning that the first normal form must be met before moving to the second, the second before the third, and so on.
- Lossless Join and Dependency Preserving Decomposition: Lossless join property is a fundamental property that guarantees that the spurious tuple generation does not occur with respect to relation instances. Dependency preserving decomposition is a decomposition of the database schema D such that for each functional dependency X -> Y specified on relation schema R in D, where X and Y are both subsets of R, there is a relation schema Q in the decomposition such that (X U Y) is a subset of Q.
Module 5: Transactions, Concurrency and Recovery, Recent Topics
- Transaction Processing Concepts: A transaction is a single logical unit of work that accesses and possibly modifies the contents of a database.
- Concurrency Control: Concurrency control is a database management systems (DBMS) concept that is used to address conflicts with the simultaneous accessing or altering of data that can occur with a multi-user system.
- Recovery: Database recovery is the process of restoring the database back to the correct state.
- NoSQL Databases: NoSQL databases are purpose-built for specific data models and have flexible schemas for building modern applications.
- Key-value DB: Key-value databases are a type of NoSQL database where each value is associated with a unique key.
- Document DB: Document databases make it easier for developers to store and query data in a database by using the same document-model format they use in their application code.
- Column-Family DB: A column-family database is a type of NoSQL database that uses a column-family data model. This model is a type of wide-column store, which uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.
- Graph DB: A graph database is a type of NoSQL database that uses graph theory to store, map and query relationships.
Each module provides a structured approach to understanding different aspects of database management systems, progressing from