## Models, Modelling, and Classification of DBMS --- ## Outline * Models * Modelleing * Classification of DBMS --- ## Models * define the logical structure of a database * includes relationships and constraints * determines how data can be stored and accessed --- ## Well-known Models * Hierarchical * Network * Relational * Object * Document --- ## Hierarchical Model * oldest formal database model * data organized into a tree-like structure * records are nodes in the tree * every record has at most 1 parent * every record may have multiple children --- ## Hierarchical Model  --- ## Hierarchichal Model - Advantages * fast and efficient retrieval * simple predictable structure * efficient storage * good performance --- ## Hierarchical Model - Disadvantages * * * * --- ## Network Model * records are nodes in a graph * records can have multiple parents * supports more types of relationships than the hierarchical model --- ## Network Model 
Mhkay
,
CC BY-SA 4.0
, via Wikimedia Commons
--- ## Relational Model * uses first-order predicate logic * data is represented in tuples * tuples grouped into relations * extremely popular model --- ## Relational Model * Relation (or Table, or File) * Tuple (or Row, or Record) * Attribute (or Column, or Field) --- ## Relational Model 
User:AutumnSnow
,
CC BY-SA 3.0
, via Wikimedia Commons
--- ## Object Model * unit of information is the object * mirrors the use of objects in object oriented programming * tightly coupled with object oriented programming language * persist data from a running object-oriented program --- ## Document Model * model for semi-structured data * documents consist of key/value pairs * documents don't have to have all the same keys but may share some * key for each document * retrieval is by document key or individual keys within a document --- ## Data Modelling * Data Abstratction * Schemas * Logical and Physical Data Independence --- ## Data Abstraction * there are multiple levels of data abstraction * view (external) * logical (or conceptual) * internal (or physical) --- ## Database Abstraction 
Database Abstraction
--- ## View or External Level * how the data looks to end users developers * typically implemented as a database view * does not necessarily correspond directly to logical design * e.g. end user wants a view combining columns from multiple tables * concern of --- ## Logical or Conceptual Level * how the data is designed to be stored * optimized for minimizing redundancy and increasing efficiency * not necessarily how the user wants it * concern of DBMS administrator or designer --- ## Internal or Physical Level * how data are actually stored in files in the file system * low-level definition * tuned for efficient access considering the constraints of the operating system * concern of DBMS implentor/vendor --- ## Schemas * formal description of database structure * External: can be multiple depending needs of individual users * Conceptual: only one created by database designer * Physical: only one created by DBMS vendor --- ## Logical and Physical Data Independence * Logical: End users and developers should be immune to a change in the conceptual schema * Physical: Conceptual schema should be immune to a change in the physical implementation of the DBMS --- ## Classification of DBMS --- ## Classification of DBMS * data model * user numbers * distribution * license * in-memory databases --- ## Data Model * Hierarchical: IMS * Network: IDS * Relational: MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, MySQL * Object-oriented: ObjectStore * Document: MongoDB --- ## User Numbers * single-user: Microsoft Access * multi-user: everything else --- ## Distribution * When do we need more than one database server? --- ## Distribution * replication * sharding --- ## Distribution - Replication * used for load balancing * entire database is replicated on another computer * one server is the "Master"; others are "Slaves" * works well when there are few writes and many reads --- ## Distribution - Sharding * used when there is more data than will fit on a single computer * there are no duplicates; entire database is spread accross several computers * can also be used for load balancing * works well for any access pattern (reads vs writes) --- ## License * propietary * open-source * * --- ## Proprietary * software is protected through: copyright, patent, trade secrets * can only run the amount you pay for * e.g. Oracle, Microsoft SQL Server --- ## Open-source * software is freely copyable * can run as many copies as you like * e.g. MariaDB, PostgreSQL, MongoDB --- ## Proprietary vs. Open-source * discussion --- ## Location * on-premises * cloud --- ## Location - On-premises * own hardware * own data center --- ## Location - Cloud * someone else's hardware * someone else's data center --- ## On-premises vs. Cloud * discussion --- ## In-memory databases * entire database is in main memory * not on secondary storage * access is orders of magnitude faster * e.g. Redis, memcached --- ## Computer memory hierarchy [Computer Memory Hierarchy](https://commons.wikimedia.org/wiki/File:ComputerMemoryHierarchy.svg) --- ## DRAM vs. SDD vs. HDD * discussion