Minggu, 26 April 2009

Database Normalization

Database Normalization

Database Design Process (review)
* Gather user needs / business
* Develop an ER Model based on user / business
* Convert E-R model to the set of relations (tables)
* Normalisasikan relations, to remove anomalies
* Implemented to create a database with a table for each relationship that have to normalized

Database Normalization

1. Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies—that could lead to a loss of data integrity.(from http://en.wikipedia.org/wiki/Database_normalization)

2. Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. (from http://databases.about.com/od/specificproducts/a/normalization.htm)

* Normalization process is the establishment of the database structure so that most of the ambiguity can be removed.
* Normalization stage, starting from the most mild (1NF) to most stringent (5NF)
* Usually only up to the level of 3NF or BCNF because already sufficient to generate the table-table good quality

Why normalization is done?
* Optimizing table structures
* Increase speed
* The income data is the same
* More efficient in the use of storage media
* Reduce redundancy
* Avoid anomalies (insertion anomalies, deletion anomalies, update anomalies).
* Improved data integrity

-> A table saying good (efficient) or if the normal 3 to meet the following criteria:
- If there is decomposition (decomposition) table, it must be guaranteed safe dekomposisinya (Lossless-Join Decomposition). That is, after the table is described / in the decomposition into a new tables, the tables can generate a new table with the same exact.
- Maintain dependence on the functional changes in data (Dependency preservation).
- Does not violate Boyce-Code Normal Form (BCNF)
-> If the three criteria (BCNF) can not be met, then at least the table does not violate the Normal Form of the third stage (3rd Normal Form / 3NF).

Functional Dependency ~ 1
- Functional Dependency describes a relationship attributes in relation
- An attribute said functionally dependant on the other, if we use the value attribute to determine the value of the other attributes.
- Symbols ( -> ) is used to represent the functional dependency.This symbol is read : determine the functional

Functional Depedency ~ 2
- Notation : A -> B
A and B are attributes of a table. A means of determining the functional B or B depends on A, if and only if there are 2 rows of data with the same value of A, then B is also the same value
- Notation : A != B or A X-> B
It is the opposite of the previous notation

EXAMPLE:

Functional Dependency:
- NRP -> Nama
- Mata_Kuliah, NRP -> Nilai
Non Functional Dependency:
- Mata_Kuliah -> NRP
- NRP -> Nilai

Functional Depedency ~ 4
Functional Dependency of the table value :
- Nrp -> Name
Because for each value Nrp the same, then the value of the same name
- (Mata_kuliah, NRP) -> Value
Because the value of attributes depending on the NRP and Mata_kuliah together. In another sense Mata_kuliah for the NRP and the same, they also rated the same, because Mata_kuliah and the NRP is a key (is unique).
- Mata_kuliah -//-> NRP
- NRP -//-> Value

Normal Form
1. The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. (from http://en.wikipedia.org/wiki/Database_normalization)

* First Normal Form (1NF)
A table on the form said to be normal I if it's did not reside in the unnormalized form of a table, where there is a kind of field multiplication and field that allows a null (empty)

Is not allowed there:
- Many attributes of value (Multivalued attributes).
- Attributes composite or a combination of both
So:
Price is the domain attribute must be atomic rates

Eg Student Data as follows:

or

the tables above does not meet the requirements 1NF

both tables are decomposition into:
Student Table:

Table Hobbies:


* Second Normal Form (2NF) ~ 1 - Normal form 2NF met in a table if it meets the form of 1NF, and all the attributes than the primary key, have a full Functional Dependency on primary key
- A table does not meet 2NF, if there are attributes that it's Functional Dependency are only partial (only depending on the part of the primary key)
- If there are attributes that have no dependence on the primary key, then the attributes must be moved or removed


* Second Normal Form (2NF) ~ 2 - Functional dependency X -> Y is full if it is said to delete an attribute A from X means that Y is no longer dependent functional.
- Functional dependency X -> Y said if deleting a partial attribute A from X means that Y is functionally dependent.
- Relation scheme R in the form 2NF if every non-primary key attribute A e R depend on the full functional primary key R.

EXAMPLE:

The following table meet 1NF, but not include 2NF



the tables above does not meet 2NF, because (NIM, KodeMk) is regarded as the primary key:
{ NIM, KodeMk } -> NamaMhs
{ NIM, KodeMk } -> Address
{ NIM, KodeMk } -> Matakuliah
{ NIM, KodeMk } -> SKS
{ NIM, KodeMk } -> NilaiHuruf
Table in the decomposition needs to be some of the table is eligible 2NF

their functional dependency as follows:
- {NIM, KodeMk} -> NilaiHuruf (fd1)
- NIM -> {NamaMhs, Address} (fd2)
- KodeMk -> {Matakuliah, SKS} (fd3)
So that:
- fd1 (NIM, KodeMk, NilaiHuruf) -> Value Table
- fd2 (NIM, NamaMhs, Address) -> Table Student
- fd3 (KodeMk, Matakuliah, SKS) -> Table MataKuliah

Third Normal Form (3NF) ~ 1 Normal form 3NF fulfilled if the form meets 2NF, and if there are no non-primary key attribute that has a dependence on non-primary key attributes of the other (transitive dependencies)

EXAMPLE:
The table following students eligible 2NF, but does not meet 3NF


Because there are non-primary key attribute (ie, City and Provincial), which has a dependence on non-primary key attributes of the other (ie KodePos):
KodePos -> { City, Province }

So that the table in the decomposition needs to be:
- Student (NIM, NamaMhs, Road, KodePos)
- KodePos (KodePos, Province, City)

Boyce-Codd Normal Form (BNCF) Boyce-Codd Normal Form constraint has a stronger form of the Normal third. To be BNCF, relations must be in the form of First Normal form and forced each of the attributes depends on the function in the super key attributes.

In the example below there is a relationship of seminar, is the Primary Key NPM + Seminar.
Students may take one or two seminars. Each seminar requires 2 each of the students and led by one of the 2 seminar. Each leader can only take one seminar course. NPM in this example and show a Seminar Pembimbing


Relations Seminar is a form of Third Normal, but not BCNF, because Seminar Code still depends the function on the Pembimbing, if any Pembimbing can only teach a seminar. Depending on the seminar is not a super key attributes such as required by BCNF. So Seminar relations must be parsed into two tables:


Fourth Normal Form (4NF) and Fifth Normal Form (5NF)
- Relations in the fourth normal form (4NF) if the relation in BCNF and does not contain a lot of dependence values. To remove the dependency of many values from a relation, we divide the relationship into two new relations. Each relation contains two attributes that have a lot of relationship value.
- Relations in fifth normal form (5NF) deal with the property called the join without any loss of information (lossless join). Fifth normal form (also called the 5 NF PJNF (projection join normal form). The case is very rare and appear difficult to detect in practice.

Refrences
1. Agus Sanjaya ER, S.Kom, M.Kom ,presentation slide : Normalization
2. http://en.wikipedia.org/wiki/Database_normalization

Jumat, 17 April 2009

Database and Entity Relationship Diagram

Database and Entity Relationship Diagram


The Definition of database
collection of data which stored in magnetic disk, optical disk or other secondary storage. Or in other word, collection of data which is linked one to other of an enterprise.example:

* Manufactur company -> production planning data, actual production data, data ordering material
* Hospital -> data of patient, doctor, nurse, etc
Database Management System
Database Management System is computer software that manages databases. DBMSes may use any of a variety of database models, such as the network model or relational model. Collection a database with software application in based of database and this software application is use for accessing and maintaining the database. The first aim of DBMS is to preparing an easy and efficient environment for using, collecting, and storing data and information.
Bit, Byte, Field
* Bit : is the smaller part of data, which contain of value 1 or 0
* Byte : collection of same kind of bit
* Filed : collection of same kind of byte, which in database is called "attribute"

** GAMBAR 1.1**

Attribute or field
Attribute or field is a characteristic from entity, which preparing detailed explanation about that's entity and a relation could have atribute too.
Example :
- car : number plate, colour, type, etc
- university student : name, ID, Faculty, etc

Type of Attribute
* single vs multivalue
single :is only can fill by one value
multivalue :is can fill with more one value with the same of kind

* atomic vs composition
atomic : can not divided into smaller attribute
composition : boundle from some smaller attribute

* derived attribute
derived attribute is an attribute which it's value can be produced from other attribute

* null value attribute
null value attribute is an attribute with no value for a record

* mandatory value attribute
mandatory value attribute is an attribute which must have value

Record or Tuple
Record or Tuple is a line of data in a relationship and consist of collection attribut, where the attribute is related one to other to give detailed information of the entity or relation

Entity or File
Entity is a collection of same kind of record and having same element, same attribute however different in their data value
type of Entity:
in processing application, file can be categorized likes:
- mains file
- transaction file
- report file
- history file
- protector file
- work file

Domain
Domain is collection of values which enabled to stay in one or more attribute. Every attribute in a relational database is defined like a domain

Key of element data
Key is the element of record which used to find the record when accessing or can also used to identify every entity / record / row

Type of key :
* Super key
Super keyis one or more attribute from a table which can be used for identifying entity / record from table uniquely

* Candidate key
Candidate key is super key with minimum attribute. Candidate key may not contain attribute of other table so that the candidate is super key, but not the contrary

* Primary key
Primary key is one of the attribute of candidate key. Primary key can be selected / determined with 3 rule:
- the key is more natural for use as reference
- the key is more simple
- the key is unique

* Alternate key
Alternate key is attribute of candidate key which is not chosen become primary key

* Foreign key
Foreign key is any attribute subjecting to primary key at other tables. foreign key will be happend in a relation which have one to many cardinality or many to many cardinality. Foreign key usually put at the table which direct to many cardinality.

* External key
External key is lexical attribute or collection of lexical attribute which values always identify one object instance
ERD
ERD is a model of a network that uses word order is stored in the system of abstract. And the different between DFD and ERD is:
- DFD is a model of network functions that will be implemented by the system
- ERD is a model that emphasizes the network data on the structure and relationship data

Elements in ERD
* Entity or file
In the ER Diagram, Entity is described with the form of a rectangle. Entity is something that exists in the real system and the abstract where the data stored or where there are data.

* Relationship
In the ER diagram, the relationship can be described with a rhomb. Relationship is a natural relationship that occurs between entities. In general, the name given to the verb base making it easier to do the reading the relation. Relationship degree is the number of entities which participating in a relationship. Degree is unit which is often used in the ERD.

Degree of relationship
- Unary relationship
Unary relationship is the relationship model between the entity originating from the same entity set.
- Binary relationship
Binary relationship is the relationship model between 2 entities
- Ternary relationship
Ternary relationship is a relationship between the instance of 3 types of entities are unilateral

* Attribute
Attribute is the nature or characteristics of each entity and relationship

* Cardinality
Cardinality indicates the maximum number of tables that can be relation with the entity on the other entity
Type of Cardinality :

* one to one
Level one to one relationship with the one stated in the entity's first event, only had one relationship with one incident in which the two entities and vice versa.

Gambar 1.2

* one to many or many to one
Level one to many relationship is the same as the one to many depending on the direction from which the relationship have to seen.For an incident on the first entity can have many relationships with the incident on the second entity, if the one incident in which two entities can only have a relationship with one incident on the first entity.

Gambar 1.3

* many to many
if any incident occurs in an entity has many relationships with other entities in the incident.

Gambar 1.4

Notation in ER-Diagram
The notation symbol of ER-Diagram is:
- Rectangle represent the collective entity
- Circle represent the attributes
- Rhomb is represent collective relationships
- Line as the link of relationships between the entity and the collective entity with collection of attribute

Sabtu, 04 April 2009

Data Flow Diagram

Data flow diagram

A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. It differs from the flowchart as it shows the data flow instead of the control flow of the program.

DFD usually use to figured a system division into smaller module and can make user to easy that insufficiently understand computer area for understands system who will be worked. A data-flow diagram can be used for the visualization of data processing (structured design). Expresed system into smaller modul and help user, which less comprehending in computer science to understand with a system, which to work it

Context Diagram
Context Diagram is :
-> include one process and expressed scope from a system
-> it's the highest level from DFD which expresed all input into the system and output from system
-> sytem is bounded by the boundary (it's shown by the dashed line)
-> there isn't any storage

Zero Diagram
-> showing the process of DFD
-> giving totally opinion about the system which is handle it, showing the prime funtion or process, data flow and eksternal entity
-> in this level, are possible to data storage
-> for process which isn't eksplained in next level, enhanced symbol "*" or "P" at the end of the process number
-> balancing input and output between zero diagram and context diagram have to maintained

Detailed Diagram
-> is the diagram which explain what process are in the zero diagram or higer level
-> nubering level on DFD:
-> in the same level, is better if there aren't more than 7 process and maximum 9 process, if more then it must be dekomposize

Process Specification
-> every process in DFD must have specification
-> in top level, used method to express process may use descriptive sentence
-> in the detailed process that is the most under process (functional primitive) need more structural specification
-> the process specification will become guidance to programmer to coding
-> used method to process specification:
-> description process in story form
-> decision table
-> decision tree

Outer Unity
-> something is outside system, but it giving input into system or giving data form system
-> symbolized with box notation
-> external entity isn't part of system
-> naming :
-> terminal's name is noun form
-> terminal's name must be diferrent, except the object is same

Data Flow
Data flow is a Place for information stream. Data flow symbol with straight line to connective component of system. Data flow direction is indicated with arrows and lines give the name on the flow of data that flows. Flow data flow between processes, data storage and data flow indicates that the form of data input for the system.
Guide of Data Flow name:
->Name of data flow which consist of some word flow attributed with continued line.
->There may not be any data flow which its same name and gift of name have to express its contents

->Data stream which consist of some element can be expressed with element grup

->Avoid usage of word ‘ data’ and ‘ information’ to give the name at data flow

->As possible the name of data flow written is complete

Other rule:

->Data flow name which turn in at a process may not same by the name of secretary data flow of that process.
->Data flow into or out of data storage doesn't need to give a name if:
->The flow of data simple and easy to understand
->Describes the data flow of all data items
->There may not any data flow of terminal to data storage or on the contrary because terminal isn't part of system, terminal relation with data storage have to pass process.

Process
-> process is what is done by system
-> process could processing data or flowing data input to flowing data output
-> function of process is transforming one or several data input into one or several data output which according to spcification is needed
-> every process has one or several input and produce one or several output
-> process is also called bubble

guidance for naming process :
-> name of process include verb and noun which expressing the function of that process
-> don't use word "process" to a part of naming bubble
-> there aren't some process which same name
-> process have to give numbered. order of number must possible following the flow or order process, but an ordered number of process is not absolut expressing the ordered process chronologically

Data Storage
-> data storage is a place to saving data which existing in system
-> symbolized with a set of a row lines or two lines whith one side opened
-> process can taking data from or giving data to database
-> guidance for giving name :
-> name must expressed data storage
-> if the name include more than one word, then must given continued sign

DFD Symbol


Dictionary Data
-> function to help perpetrator of system to interpreting application detailly and organized all data element which used by system until user and system analsyt have same concept about input, output, storage and process.
-> in analysis phase, dictionary data is used as communication tolls between system analyst and user
-> in planning phase of the system, dictionary data is use for planning input, report and database
-> data flow in DAD is global

Dictionary data is substance of case bellow :
-> name of data flow : have to be noted in order that reader which need next explanation about a flown data can be search easily
-> alias : other name of data, could writen if it has
-> data form : used to clasified dictionary data to it's uses when planning the system
-> data flow : showing from where data is coming and where data is going
-> explanation :giving explanation about the meaning of data flow

Balancing in DFD
-> data flow which come into and out from a process must same with data flow which come into and out from detailed process in it's lower level
-> name of the data flow which come into and out from process must have same name with name of data flow which come into and out from it's detailed process
-> number and name of entity outside the process must same with name and outside entity from it's detailed process
-> The issues that must be considered in the DFD which have more than one level
-> Must have balance input and output between one level and next level.
-> Balance between level 0 and level 1 seen in input or output from data flow go or from terminal in level 0, while balance between level 1 and level 2 seen in input or output from data flow go or from process concerned.
-> Data flow name, data storage and terminal each level must same if the object same.

Prohibition order in DFD
-> Data flow may not from external entity direct to external entity without through a process.
-> Data flow may not from data storage direct to external entity without through a process.
-> Data flow may not from data storage direct to other data storage without through a process.
-> Data flow from one process direct to other process without through a saving data should or be avoided as much as possible.

Reference:

Agus Sanjaya ER, S.Kom, M.Kom ,presentation slide : Diagram Aliran Data