Entity types in the database. Designing a database model in terms of "entity-relationship". Relationships between entities

Household affairs

An entity is a real or abstract entity that is essential to the domain. The entity must have a name expressed by a noun in the singular

An informal way to identify entities is to look for abstractions that describe objects, processes, roles, and other concepts. A formal way to identify entities is to analyze textual descriptions of the subject area, extract nouns and select them as abstractions.

An entity instance is a specific instance of that entity. For example, employee Ivanov can be an instance of the Employee entity.

Each entity must have the following properties:

have a unique name;

have one or more attributes that either belong to an entity or are inherited through a relationship;

have one or more attributes that uniquely identify each entity instance.

Attribute - a characteristic of an entity that is significant for the subject area under consideration and is intended to identify, classify, quantify or express the state of the entity.

There are the following types of attributes:

simple - consists of one data element;

composite - consists of several data elements;

unambiguous - contains one value for one entity;

multivalued - contains several values ​​for one entity;

optional - may have an empty (undefined) value;

derived - a value derived from the value of another attribute.

A unique identifier is a set of attributes whose values ​​in the aggregate are unique for each entity instance. Removing any attribute from an identifier breaks its uniqueness. Unique identifiers are underlined in the diagram.

Each entity can have any number of relationships with other entities.

Relationships between entities

A relationship is a named association between entities that is meaningful to the domain in question.

The degree of relationship is the number of entities participating in the relationship.

Link power - the number of entity instances participating in the link.

Depending on the power value, the connection can have one of three types:

one-to-one (denoted 1:1).

one-to-many (denoted 1:N).

many-to-many (denoted by M:N).

One to one. Means that in such a relationship, an entity with one role always corresponds to no more than one entity with another role. Since the degree of connection for each entity is 1, they are connected by one line.

One-to-many. An entity with one role can be matched by any number of entities with a different role.

Many-to-many. In this case, each of the associated entities can be represented by any number of instances.

3 . Data Model Components

Entity, entity definition, sources of information about entities

The data model - a conceptual description of the subject area - is the most abstract level of database design. The data model consists of entities, attributes, domains, and relationships. Further - about each of the elements in detail.

3.1 Entities

An entity is something about which information needs to be stored in a database.

When designing databases, it is enough to describe the current situation - and most nouns and some verbs will be candidates for entities. For example: "Customers buy goods. Employees sell goods to customers. Suppliers supply goods" - customers, goods, employees and suppliers are entities. The verbs "buy" and "sell" are also entities (although they can be the same entity, different from the point of view of the buyer and seller).

When designing a database, the main source of information about entities is a conversation with the customer in order to understand his business processes. In addition, standard documents used in business processes are analyzed: forms, reports, instructions, etc. After receiving such a list, it is necessary to check it for completeness and coherence, as well as to identify duplicates - identical entities that are called by different words, and entities that are actually different, but are described by the same term.

Entities can model specific concepts (customers, goods, calls) and abstract ones (the agent is responsible for the client, the student is enrolled in the course).

Attribute.

Subject area.

Database. Definition.

DBMS. Definition.

Database. Definition.

Third normal form. Definition. Example.

A relation variable R is in third normal form if and only if the following conditions are true:

R is in second normal form.

· no non-key attribute R is in a transitive functional dependence (i.e., the dependence is not expressed through another attribute) from the potential key R.

A non-key attribute of a relation R is an attribute that does not belong to any of the candidate keys of R.

Database- this is one or more data files designed to store, modify and process large amounts of interconnected information, systematized in such a way that these materials can be found and processed using an electronic computer (computer)

Database management system (DBMS) is software that enables users to define, create, and maintain a database, and that handles database calls from end-user applications.

Database- automated information system of centralized storage and collective use of data. The data bank includes one or more databases, a database directory, a DBMS, as well as libraries of queries and application programs.

Subject area is a part of the real world to be studied in order to create a database to automate the management process.

Attribute is the smallest unit of the data structure. Each element is assigned a unique name when the database is created. It is referred to by this name during processing.

Essence- any concrete or abstract object in the subject area under consideration. Entities are the basic types of information that are stored in the database (in a relational database, each entity is assigned a table).

List the functions of a DBMS

The main functions of the DBMS:

1) Determination of the structure of the created database, its initialization and initial loading.

2) Providing users with the ability to manipulate data (selecting the necessary data, performing calculations, developing an input / output interface, visualization).

3) Ensuring the logical and physical independence of data.

4) Protection of the logical integrity of the database - the reliability of data can be violated when they are entered into the database or when illegal actions of data processing procedures that receive and enter incorrect data into the database. To increase the reliability of data in the system, so-called integrity constraints are declared.



5) Protection of physical integrity - database recovery tools (transactions).

6) Management of user permissions to access the database.

7) Synchronization of work of several users.

8) Storage environment resource management - the DBMS allocates memory resources for new data, redistributes freed memory, organizes queuing of requests to external memory, and so on.

9) Support for the activities of system personnel

The term "relational" means "relationship-based". A relational database consists of entities (tables) that have some relationship with each other. The name comes from the English word relation.
Database design consists of two main phases: logical and physical modeling.
During logical modeling, you collect requirements and develop a database model that is independent of a particular DBMS (relational database management system). It's like creating blueprints for your house. You could think over and draw everything: where the kitchen, bedrooms, living room will be. But this is all on paper and in layouts.
During physical modeling, you create a model that is optimized for a specific application and DBMS. It is this model that is implemented in practice. If we return to the house from the previous paragraph, at this stage you will have to build a house somewhere - carry logs, bricks ...

The database design process consists of the following steps:

  • collection of information;
  • definition of entities;
  • defining attributes for each entity;
  • defining relationships between entities;
  • normalization;
  • transformation to a physical model;
  • database creation.

The first 5 stages form the logical design phase and the remaining two form the physical modeling phase.

Logic phase

The logical phase consists of several stages. They are all discussed below.

Gathering Requirements

At this stage, you need to determine exactly how the database will be used and what information will be stored in it. Gather as much information as possible about what the system should and shouldn't do.

Entity definition

At this stage, you need to define the entities that the database will consist of.

An entity is an object in a database that stores data. An entity can be something real (a house, a person, an object, a place) or an abstract thing (a banking operation, a department of a company, a bus route). In the physical model, an entity is called a table.

Entities are made up of attributes (columns in a table) and records (rows in a table).

Typically, databases consist of several primary entities that are associated with a large number of subordinate entities. Core entities are called independent: they do not depend on any other entity. Subordinate entities are called dependent: in order for one of them to exist, the main table associated with it must exist.
In diagrams, entities are usually represented as rectangles. The name of the entity is indicated inside the rectangle:

Any table has the following characteristics:

  • there are no identical lines in it;
  • all columns (attributes) in the table must have different names;
  • elements within the same column have the same type (string, number, date);
  • the order of the rows in the table can be arbitrary.

At this stage, you need to identify all categories of information (entities) that will be stored in the database.

Attribute Definition

An attribute represents a property that describes an entity. Attributes are often a number, date, or text. All data stored in an attribute must be of the same type and have the same properties.
In the physical model, attributes are called columns.
After defining the entities, it is necessary to define all the attributes of these entities.
In diagrams, attributes are usually listed within the entity rectangle. In the figure you will find an example of the "Houses" database, only now some attributes are defined for the entities from this database.


Each attribute defines the data type, size, allowed values, and any other rules. These include mandatory, mutable, and uniqueness rules.
The mandatory rule determines whether an attribute is a required part of an entity. If the attribute is an optional part of the entity, then it can be NULL, otherwise not.
You must also determine if the attribute is mutable. Some attribute values ​​cannot change after the entry is created.
And finally, you need to determine if the attribute is unique. If so, then the attribute values ​​cannot be repeated.

Keys

A key is a set of attributes that uniquely identifies an entry. Keys are divided into two classes: simple and compound.
A simple key consists of only one attribute. For example, in the "Passports of the country's citizens" database, the passport number will be a simple key: after all, there are no two passports with the same number.
A composite key consists of several attributes. In the same database "Passports of citizens of the country" there can be a composite key with the following attributes:
surname, name, patronymic, date of birth. This is just an example, since this composite key, in theory, does not provide guaranteed uniqueness of the record.
There are also several types of keys, which are described below.

Possible key

A candidate key is any set of attributes that uniquely identifies an entry in a table. The candidate key can be simple or compound.
Each entity must have at least one possible key, although there may be more than one possible key. None of the primary key attributes can have a NULL value.
A candidate key is also called a surrogate key.

Primary Keys

A primary key is a set of attributes that uniquely identify a record in a table (entity). One of the possible keys becomes the primary key. In diagrams, primary keys are often shown above the main list of attributes or are highlighted with special characters. The entity in the figure has both key and regular attributes.

Alternative Keys

Any possible key that is not the primary key is called an alternate key. An entity can have multiple alternate keys.

Foreign keys

A foreign key is a collection of attributes that refer to the primary or alternate key of another entity. If the foreign key is not associated with the primary entity, then it can only contain null values. If the key is also composite, then all attributes of the foreign key must be undefined.
In diagrams, attributes that are combined into foreign keys are denoted by special symbols. The figure shows two related entities (Houses and their Owners) and the foreign keys formed by them (after all, one person can own more than one house).

Keys are logical constructs, not physical objects. Relational databases have mechanisms to store keys.

Defining Relationships Between Entities

Relational databases allow you to combine information belonging to different entities.
A relationship is a situation in which one entity refers to the primary key of a second entity. Like, for example, the entities House and Master in the previous figure.
Relationships are defined during the base design process. To do this, you should analyze the entities and identify the logical relationships that exist between them.
The relationship type determines the number of entity records associated with another entity record. Relationships are divided into three main types, which are described below.

One to one

Each entry of the first entity corresponds to only one entry from the second entity. And each record of the second entity corresponds to only one record from the first entity. For example, there are two entities: People and Birth Certificates. And one person can only have one birth certificate.

One-to-many

Each record of the first entity can correspond to several records from the second entity. However, each entry of the second entity corresponds to only one entry from the first entity. For example, there are two entities: Order and Order Item. And there can be many items in one order.

many-to-many

Each record of the first entity can correspond to several records from the second entity. However, each record of the second entity can correspond to several records from the first entity. For example, there are two entities: Author and Book. One author can write many books. But a book can have multiple authors.
According to the criterion of mandatory relations are divided into mandatory and optional.

  • A mandatory relationship means that for each entry from the first entity, there must be related entries in the second entity.
  • An optional relationship means that a record from the first entity may not have a record in the second entity.

Normalization

Normalization is the process of removing redundant data from a database. Each data element must be stored in the database in one and only one instance. There are five common forms of normalization. As a rule, the database is reduced to the third normal form.
During the normalization process, certain actions are performed to remove redundant data. Normalization improves performance, speeds up sorting and index building, reduces the number of indexes per entity, and speeds up insert and update operations.
A normalized database is usually more flexible. When modifying queries or persisted data, a normalized database typically requires fewer changes, and changes have fewer consequences.

First normal form

To convert an entity to first normal form, you must eliminate duplicate groups of values ​​and ensure that each attribute contains only one value, lists of values ​​are not allowed.
In other words, each attribute in an entity should only be stored in one instance.
For example, in the figure, the House entity is not normalized. It contains several attributes for storing data about the owners of the house (the House entity does not correspond to the first normal form).

To bring the House entity to the first normal form, it is necessary to remove the repeating groups of values, that is, remove the Owner 1-3 attributes, placing them in a separate entity. Result (Entity House reduced to first normal form):

Second normal form

A table in second normal form contains only the data that applies to it. Values ​​of non-key entity attributes depend on the primary key. More precisely, attributes depend on the primary key, on the entire primary key, and only on the primary key.
Entities must be in first normal form to conform to second normal form.
For example, the entity House in the figure has an attribute Price per liter of gasoline, which has nothing to do with houses. This attribute is removed (or you can move it to another entity). And also we move the Mayor attribute to a separate entity - this attribute depends on the city where the house is located, and not on the house.
The figure shows the essence House in the second normal form (the Essence House reduced to the second normal form).

third normal form

Third normal form excludes attributes that do not depend on the entire key. Any entity that is in third normal form is also in second normal form. This is the most common form of a database.
In third normal form, every attribute depends on the key, on the whole key, and on nothing but the key.
For example, the House Owner entity in the figure has a Zodiac sign attribute that depends on the date of birth of the owner of the house, and not on his name (which is the key).
To cast the entity Owner of the house, you need to create the entity Signs of the Zodiac and transfer the attribute Sign of the Zodiac there (Entity Owner of the house, reduced to the third normal form):

Restrictions

Constraints are the rules enforced by the database management system. Constraints define the set of values ​​that can be entered in a column or columns.
For example, you do not want the order amount in your very cool store to be less than 500 rubles. You simply set a limit on the Order Amount column.

Stored procedures

Stored procedures are precompiled procedures stored in a database. Stored procedures can be used to define business rules and can be used to perform more complex calculations than constraints alone.
Stored procedures can contain program flow logic as well as database queries. They can take parameters and return results as tables or single values.
Stored procedures are just like regular procedures or functions in any program.

NOTE
Stored procedures reside in the database and run on the database server. They are generally faster than SQL statements because they are stored in compiled form.

Data integrity

By organizing the data into tables and defining the relationships between them, we can assume that a model has been created that correctly reflects the business environment. Now we need to ensure that the data entered into the database gives a correct idea of ​​the state of the matter. In other words, you need to enforce business rules and maintain the integrity of the database.
For example, your company is engaged in the delivery of books. You are unlikely to accept an order from an unknown client, because then you will not even be able to deliver the order. Hence the business rule: orders are accepted only from customers whose information is in the database.
The correctness of data in relational databases is ensured by a set of rules. Data integrity rules fall into four categories.

  • Entity Integrity- each entity record must have a unique identifier and contain data. After all, you need to somehow distinguish between all these records in the database.
  • Attribute Integrity- each attribute accepts only valid values. For example, the purchase amount can definitely not be less than zero.
  • Referential Integrity- a set of rules that ensure the logical consistency of primary and foreign keys when inserting, updating and deleting records. Referential integrity ensures that for every foreign key there is a corresponding primary key. Let's take the previous example with the entities Home Owner and House. Let's say you are Vasya Ivanov and own a house. You changed your last name to Sidorov and made the appropriate changes to the House owner entity. Definitely you would like your house to continue to be yours under your new name, and not belong to a certain Vasya Ivanov, who no longer exists.
  • Custom Integrity Rules- any integrity rules that do not belong to any of the listed categories.

triggers

Trigger is an analogue of a stored procedure, which is called automatically when the data in the table changes.
Triggers are a powerful mechanism for maintaining database integrity. Triggers are called before or after data changes in the table.
With the help of triggers, you can not only undo these changes, but also change the data in any other table.
For example, you are creating an Internet forum and you want to make sure that the forum list shows the latest forum post. Of course, you can take a message from the Forum Posts entity, but this will increase the complexity of your request and its execution time. It's easier to add a trigger to the Forum Posts entity that records the last post added to the Forums entity, in the Last Post attribute. This will greatly simplify the query.

Business rules

Business rules define the restrictions placed on the data according to the requirements of the business (those for whom you are creating the base). Business rules may consist of a set of steps required to complete a specific task, or they may simply be checks that verify that the data entered is correct. Business rules may include data integrity rules. Unlike other rules, their main purpose is to ensure that business transactions are conducted correctly.
For example, in the Very Tough Guys company, it may be customary that only white, blue, and black cars are purchased for official use.
The business rule for the Vehicle Color attribute of the Company Vehicles entity would then be that the vehicle can only be white, blue, or black.
Most DBMSs provide the means to:

  • to specify default values;
  • to check the data before entering it into the database;
  • to maintain relationships between tables;
  • to ensure the uniqueness of values;
  • for storing stored procedures directly in the database.

All of these features can be used to implement business rules in a database.

Physical model

The next step, after creating the logical model, is to build the physical model. The physical model is the practical implementation of the database. The physical model defines all the objects that you have to implement.
When moving from a logical model to a physical entity, they are converted to tables, and attributes to columns.
Relationships between entities can be converted to tables or left as foreign keys.
Primary keys are converted to primary key constraints. Possible keys are in uniqueness constraints.

Denormalization

Denormalization- this is a deliberate change in the structure of the base that violates the rules of normal forms. This is usually done to improve database performance.
Theoretically, one should always strive for a fully normalized base, but in practice, a fully normalized base almost always means a performance drop. Over-normalizing a database can result in multiple tables being accessed each time data is retrieved. Typically, four tables or fewer must participate in a query.
Standard denormalization techniques are: combining several tables into one, storing the same attributes in several tables, and storing summary or calculated data in a table.

A couple of years ago, among my other activities, there were online lessons on the basics of building a logical database structure and the SQL language. I am not doing lessons at the moment, but the recordings themselves remained, so I decided to post them, why should the good be wasted? 🙂

Today we will talk about the entity-relationship model, or entity-relationship model.

Theory

The Entity-Relationship model or ER model is a high-level conceptual data model that was developed to simplify the task of designing database structures.

This model is a set of concepts that describe the database structure as a set of entities, attributes and relationships. The main goal of developing such a data model is to create a user experience of data and to agree on a large number of technical aspects related to database design. It should be especially noted that the conceptual data model does not depend on the specific DBMS or hardware platform that is used to implement the database.

The purpose of the "entity-relationship" diagrams is to create an accurate and complete representation of the real subject area (SbD), which is used later as a source of information for building a database of automated information processing systems (DB ASOI).

This diagram or conceptual model of the ObD must meet the following requirements:

  • Ensure adequate display of the SbA;
  • Present in a language that is understandable to both future users of the ASOI and database developers;
  • Contain information about the ObD sufficient for further database design (development of logical and physical models);
  • Guarantee unambiguous interpretation or interpretation of the ObA model.

The main concepts of this model are the concepts entity, attribute, and relationship.

ESSENCE is a set of real world objects with the same properties. An entity is characterized by independent existence and can be an object with a physical (or real) existence or an object with a conceptual (or abstract) existence.

The entity is the main content of the phenomenon or process (transaction or request) about which it is necessary to collect information, and is the nodal point of information collection. An entity refers to a set of homogeneous objects or things. Each entity is identified by a name and a list of properties (attributes). An entity can be a person, a place, a thing, etc., information about which must be stored in the database.

Practice

EXAMPLE. Subject area " Booking tickets at the cinema". Movies are shown in the cinema, tickets for which can be bought on the day of the show or booked in advance. The database contains information about all film screenings in a given cinema, including old ones. Each film screening has its own cost, i.е. tickets for the same movie but at different times may differ in price. A film screening consists of a Film, information about which is also stored in the database.

For Pro " Booking tickets at the cinema” entities will be the following concepts:

Film screening

Movies

Viewer

Ticket

Booking

Price

Graphically, entities in entity-relationship diagrams are represented as rectangles:

ATTRIBUTE it is the means by which the properties of an entity or relation are defined. An attribute is a named characteristic of an entity. The attribute name must be unique for a particular entity, but may be the same for different entities.

The specific set of attributes for an entity is determined by the tasks in which they are used. For example, the entities of the ObA “Ordering tickets at the cinema” can be described using the following set of attributes:

Film screening(Film Screening Number, Film Number, Screening Date, Cost Number);

Movie(Movie number, Title, Duration, Brief description);

Viewer(Spectator number, full name, date of birth);

Ticket(Spectator number, Film screening number, Ticket price);

Booking(Auditor number, Screening number, booking date);

Price(Cost number, Screening number, cost).

Graphically, the attributes of an entity are represented as callouts that list a list of attribute names. For example:

Bold italics and underlining denote primary keys, an attribute of an entity that uniquely characterizes it. Underlining denotes foreign keys - attributes that uniquely characterize the entities to which they refer.

CONNECTION is a relationship between instances of two (or more) different entities. The relationship mechanism is used to define relationships between entities in an ObA. In addition, there are relationships between the attributes of a separate entity (to be considered when building logical models).

Each link is given a name that should describe its function. Relationships have such characteristics as the name of the connection, the cardinality index, the degree of participation, the degree of connection, the time of existence of the connection, and others.

The name of the relationship should carry a certain meaning in order to make it easier to understand how the entities are related. For example, the relationship between the entities Spectator and Ticket can be defined as "Buy".

A rhombus is used to graphically represent relationships in entity-relationship diagrams. Inside the rhombus, the name of the connection is defined, and the entities participating in this connection are connected using lines.

The cardinality indicator of the relationship (characteristic of uniqueness) indicates the degree of relationship between entities and describes the number of possible relationships for each of the participating entities:

  • one-to-one (1:1);
  • one-to-many (1:N);
  • many-to-many (N:M).