The first step in creating a database is to design it. You design a database before you ever put finger to keyboard to create that database. Planning is perhaps the most important step. It’s very painful to discover after you build the database and put it in service that it doesn’t contain all the data or provide the relationships between data that you need, so in this chapter we give you some tips for designing a database that will work well for you.
After completing your database design, you’re ready to build that database, and we tell you how to do that too, later in the chapter. You create the database and its tables according to the design you developed. When it’s built, you have a useful, empty database, waiting for you to fill it with data. You can then read about adding and retrieving data in Chapter 4 of this minibook.
Designing the database includes identifying the data that you need and organizing the data in the way that the database software requires. As you plan your database design, you’ll also need to decide on a primary key for each table and how tables relate to one another. You should also consider what types of data you will store in your database.
To design a database, you first must identify what information belongs in it. The database must contain the data needed for the website to perform its purpose.
In many cases, your application might include a task that collects information from the user. For instance, customers who buy products from a website must provide their address, phone number, credit card information, and other data in order to complete the order. The information must be saved at least until the order is filled. Often, the website retains the customer information to facilitate future orders so the customer doesn’t need to retype the information when placing the next order. The information also provides marketing opportunities to the business operating the website, such as sending marketing offers or newsletters to customers.
You have to balance your urge to collect all the potentially useful information you can think of against your users’ reluctance to give out personal information — as well as their avoidance of forms that look too timeconsuming.
One compromise is to ask for some optional information. Users who don’t mind can enter that information, but users who object can leave that portion of the form blank. You can also offer an incentive: The longer the form, the stronger the incentive you need to motivate the user to fill out the form. Here’s an example: A user might be willing to fill out a short form to enter a sweepstakes that offers two sneak-preview movie tickets as a prize, but if the form is long and complicated, the prize needs to be more valuable, such as a chance to win a trip to Hollywood.
Take the time to develop a comprehensive list of the information you need to store in your database. Although you can change and add information to your database after you develop it, including the information from the beginning is easier, and you might be able to avoid the extra work of changing the database later. Also, if you add information to the database later — after that database is in use — the first users in the database have incomplete information. For example, if you change your form so that it now asks for the user’s age, you don’t have the age for the people who already filled out the form and are already in the database.
MySQL is a Relational Database Management System (RDBMS), which means the data is organized into tables. (See Chapter 1 in this minibook for more on how MySQL works.)
|✦ Customers||✦ Books|
|✦ Products||✦ Computers|
|✦ Companies||✦ Shapes|
|✦ Animals||✦ Documents|
|✦ Cities||✦ Projects|
|✦ Rooms||✦ Weeks|
In database talk, an object is an entity, and an entity has attributes. In the table, each row represents an entity, and the columns contain the attributes of each entity. For example, in a table of customers, each row contains information for a single customer. Some of the attributes contained in the columns might include first name, last name, phone number, and age.
Assign a name to the database for your application. For instance, you might name a database containing information about households in a neighborhood HouseholdDirectory.
Look at the list of information that you want to store in the database (as discussed in the preceding section). Analyze your list and identify the objects. For instance, the HouseholdDirectory database might need to store the following:
When you analyze this list carefully, you realize that you’re storing information about two objects: the household and the household members. The address and phone number are for the household, in general, but the name, age, and favorite cereal are for each particular household member.
For instance, the HouseholdDirectory database needs a table called Household and a table called HouseholdMember
Analyze your information list and identify the attributes you need to store for each object. Break the information to be stored into its smallest reasonable pieces. For example, when storing the name of a person in a table, you can break the name into first name and last name. Doing this enables you to sort by the last name, which would be more difficult if you stored the first and last name together. You can even break down the name into first name, middle name, and last name, although not many applications need to use the middle name separately.
Give each column a name that clearly identifies the information in that column. The column names should be one word, with no spaces. For example, you might have columns named firstName and lastName or first_name and last_name.
MySQL and SQL reserve some words for their own use, and you can’t use those words as column names. The words are currently used in SQL statements or are reserved for future use. You can’t use ADD, ALL, AND, CREATE, DROP, GROUP, ORDER, RETURN, SELECT, SET, TABLE, USE, WHERE, and many, many more as column names. For a complete list of reserved words, see the online MySQL manual at http://dev.mysql. com/doc/refman/5.5/en/reserved-words.html.
Each row in a table needs a unique identifier. No two rows in a table should be exactly the same. When you design your table, you decide which column holds the unique identifier, called the primary key.
The primary key can be more than one column combined. In many cases, your object attributes don’t have a unique identifier. For example, a customer table might not have a unique identifier because two customers can have the same name. When you don’t have a unique identifier column, you need to add a column specifically to be the primary key. Frequently, a column with a sequence number is used for this purpose. For example, in Table 3-1, the primary key is the cust_id field because each customer has a unique ID number.
You can define a default that MySQL assigns to a field when no data is entered into the field. You don’t need a default, but one can often be useful. For example, if your application stores an address that includes a country, you can specify U.S. as the default. If the user doesn’t type a country, MySQL enters U.S.
You can specify that certain columns aren’t allowed to be empty (also called NULL). For instance, the column containing your primary key can’t be empty. If no value is stored in the primary key column, MySQL doesn’t create the row and returns an error message. The value can be a blank space or an empty string (for example, “”), but some value must be stored in the column. You can set other columns, in addition to the primary key, to require data.
Well-designed databases store each piece of information in only one place. Storing it in more than one place is inefficient and creates problems if you need to change information. If you change information in one place but forget to change it in another place, your database can have serious problems.
If you find that you’re storing the same data in several rows, you probably need to reorganize your tables. For example, suppose you’re storing data about books, including the publisher’s address. When you enter the data, you realize that you’re entering the same publisher’s address in many rows. A more efficient way to store this data would be to store the book information in one table and the book publisher information in another table. You can define two tables: Book and BookPublisher. In the Book table, you would have the columns title, author, pub_date, and price. In the BookPublisher table, you would have columns such as name, streetAddress, and city.
Some tables in a database are related. Most often, a row in one table is related to several rows in another table. You need a column to connect the related rows in different tables. In many cases, you include a column in one table to hold data that matches data in the primary key column of another table
A common application that needs a database with two related tables is a customer order application. For example, one table contains the customer information, such as name, address, and phone number. Each customer can have from zero to many orders. You could store the order information in the table with the customer information, but a new row would be created each time the customer placed an order, and each new row would contain all the customer’s information. You can much more efficiently store the orders in a separate table, named perhaps CustomerOrder. (You can’t name the table just Order because that’s a reserved word.) In the CustomerOrder table, you include a column that contains the primary key from a row in the Customer table so the order is related to the correct row of the Customer table. The relationship is shown in Table 3-1 (earlier in the chapter) and Table 3-2.
The Customer table in this example looks like Table 3-1. Each customer has a unique cust_id. The related CustomerOrder table is shown in Table 3-2. It has the same cust_id column that appears in the Customer table. Through this column, the order information in the CustomerOrder table is connected to the related customer’s name and phone number in the Customer table.
In this example, the columns that relate the Customer table and the CustomerOrder table have the same name. They could have different names, as long as the columns contain the same data.
MySQL stores information in different formats, based on the type of information that you tell MySQL to expect. MySQL allows different types of data to be used in different ways. The main types of data are character, numerical, and date and time data. We describe those and other data types and then tell you how to indicate which data type you’re using in each column.
The most common type of data is character data (data that’s stored as strings of characters), and it can be manipulated only in strings. Most of the information that you store is character data — for example, customer name, address, phone number, and pet description. You can move and print character data. Two character strings can be put together (concatenated), a substring can be selected from a longer string, and one string can be substituted for another.
In this format, MySQL reserves a fixed space for the data. If the data is longer than the fixed length, only the characters that fit are stored — the remaining characters on the end aren’t stored. If the string is shorter than the fixed length, the extra spaces are left empty and wasted.
In this format, MySQL stores the string in a field that’s the same length as the string. You specify a string length, but if the string itself is shorter than the specified length, MySQL uses only the space required, instead of leaving the extra space empty. If the string is longer than the space specified, the extra characters aren’t stored.
If a character string length varies only a little, use the fixed-length format. For example, a length of ten works for all ZIP codes, including those with the ZIP+4 number. If the ZIP code doesn’t include the ZIP+4 number, only five spaces are left empty. However, if your character string can vary more than a few characters, use a variable-length format to save space. For example, your pet description might be small bat, or it might run to several lines of description. By storing this description in a variable-length format, you only use the necessary space.
Another common type of data is numerical data — data that’s stored as a number. You can store decimal numbers (for example, 10.5, 2.34567, 23456.7) as well as integers (for example, 1, 2, 248). When you store data as a number, you can use that data in numerical operations, such as adding, subtracting, and squaring. If you don’t plan to use data for numerical operations, however, you should store it as a character string because the programmer will be using it as a character string. No conversion is required.
MySQL stores positive and negative numbers, but you can tell MySQL to store only positive numbers. If your data is never negative, store the data as unsigned (without a + or – sign before the number). For example, a city population or the number of pages in a document can never be negative.
MySQL provides a specific type of numeric column called an auto-increment column. This type of column is automatically filled with a sequential number if no specific number is provided. For example, when a table row is added with 5 in the auto-increment column, the next row is automatically assigned 6 in that column unless a different number is specified. You might find autoincrement columns useful when you need unique numbers, such as a product number or an order number.
A third common type of data is date and time data. Data stored as a date can be displayed in a variety of date formats. You can use that data to determine the length of time between two dates or two times — or between a specific date or time and some arbitrary date or time.
Sometimes, data can have only a limited number of values. For example, the only possible values for a column might be yes or no. MySQL provides a data type called enumeration for use with this type of data. You tell MySQL what values can be stored in the column (for example, yes and no), and MySQL doesn’t store any other values in that column.
When you create a database, you tell MySQL what kind of data to expect in a particular column by using the MySQL names for data types. Table 3-3 shows the MySQL data types used most often in web database applications.
|MySQL Data Type||Description|
|CHAR(length)||Fixed-length character string.|
|VARCHAR(length)||Variable-length character string. The longest string that can be stored is length, which must be between 1 and 255|
|TEXT||Variable-length character string with a maximum length of 64K of text.|
|INT(length)||Integer with a range from –2147483648 to +2147483647. The number that can be displayed is limited by length. For example, if length is 4, only numbers from –999 to 9999 can be displayed, even though higher numbers are stored.|
|INT(length) UNSIGNED||Integer with a range from 0 to 4294967295. length is the size of the number that can be displayed. For example, if length is 4, only numbers from 0 to 9999 can be displayed, even though higher numbers are stored.|
|BIGINT||A large integer. The signed range is –9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615|
|DECIMAL (length,dec)||Decimal number in which length is the number of characters that can be used to display the number, including decimal points, signs, and exponents, and dec is the maximum number of decimal places allowed. For example, 12.34 has a length of 5 and a dec of 2.|
|DATE||Date value with year, month, and date. Displays the value as YYYY-MM-DD (for example, 2013-04- 03 for April 3, 2013)|
|TIME||Time value with hour, minute, and second. Displays as HH:MM:SS.|
|DATETIME||Date and time are stored together. Displays as YYYY-MM-DD HH:MM:SS.|
|ENUM (“val1”,“val2”...)||Only the values listed can be stored. A maximum of 65,535 values can be listed.|
|SERIAL||A shortcut name for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT.|
MySQL allows many data types other than those listed in Table 3-3, but you probably need those other data types less frequently. For a description of all the available data types, see the MySQL online manual at http://dev. mysql.com/doc/refman/5.6/en/data-types.html
In this section, we design a sample database to contain customer order information. We use this database later in this chapter and in Chapter 4 of this minibook to show how to build and use a database.
The database for the order information is named CustomerOrderInformation.
The first five information items pertain to customers, so one object is Customer. The order date information pertains to the total order, so another object is CustomerOrder. The remaining four pieces of information pertain to each individual item in the order, so the remaining object is OrderItem.
The primary key for the Customer table is customerID. Therefore, customerID must be unique. The primary key for the CustomerOrder table is orderID. The primary key for the OrderItem table is orderID and catalogID together.
No defaults are defined for any table.
These columns are the primary-key columns. Never allow a row without these values in the tables.
You probably spent substantial time making the design decisions for your database. At this point, the decisions are firmly fixed in your mind. You probably don’t think that you can forget them. But suppose that a crisis intervenes; you don’t get back to this project for two months. You have to analyze your data and make all the design decisions again if you didn’t write down the decisions you originally made.
Document the organization of the tables, the column names, and all other design decisions. Your document should describe each table in table format, with a row for each column and a column for each design decision. For example, your columns would be column name, data type, and description. The three tables in the sample design for the database named CustomerOrder Information are documented in Table 3-4, Table 3-5, and Table 3-6.
|Column Name||Data Type||Description|
|customerID||SERIAL||Unique ID for customer (primary key)|
|lastName||VARCHAR(50)||Customer’s last name|
|firstName||VARCHAR(40)||Customer’s first name|
|street||VARCHAR(50)||Customer’s street address|
|zip||CHAR(10)||Customer’s ZIP code|
|VARCHAR(50)||Customer’s e-mail address|
|fax||CHAR(15)||Customer’s fax number|
|phone||CHAR(15)||Customer’s phone number|
|orderID||SERIAL||Login name specified by user (primary key|
|customerID||BIGINT||Customer ID of the customer who placed the order|
|order Date||DATETIME||Date and time that order was placed|
|catalogID||VARCHAR(15)||Catalog number of the item (primary key 1)|
|orderID||BIGINT||Order ID of the order that includes this item (primary key 2)|
|color||VARCHAR(10)||Color of the item|
|size||VARCHAR(10)||Size of the item|
|price||DECIMAL(9,2)||Price of the item|
After you’ve carefully planned your database as described earlier in the chapter, you can then get to work building the database. A database has two parts: a structure to hold the data and the data itself. In the following sections, we explain how to create the database structure. First, you create an empty database with no structure at all, and then you add tables to it.
When you create a database, you create a new subdirectory in your data directory with the database name that you assign. Files are then added to this subdirectory later, when you add tables to the database. The data directory is usually a subdirectory in the directory where MySQL is installed. You can set up a different directory as the data directory by adding a statement in the MySQL configuration file, my.cnf, in the following format:
You can add this statement to the configuration file or change the statement that’s already there.
You can create the database by using SQL statements, as described in Chapter 1 of this minibook. To create a database, you must use a MySQL account that has permission to create, alter, and drop databases and tables, and we tell you how to do that here. See Chapter 2 in this minibook for more on MySQL accounts.
Your first step in creating a new database is to create an empty database, giving it a name. Your database name can be up to 64 characters long. You can use most letter, numbers, and punctuation, with a few exceptions. In general, you can’t use characters that are illegal in directory names for your operating system (see your operating system documentation to find out what those characters are). Don’t use a space at the end of the name. Don’t use a forward slash (/) or a backward slash (\) in the database name (or in table names, either). You can use quotes in the database name, but it isn’t wise to do so.
In this statement, replace databasename with the name that you give your database. For instance, to create the sample database designed in this chapter, use the following SQL statement:
Some web hosting companies don’t allow you to create a new database. The host gives you a specified number of databases to use with MySQL, and you can create tables in only the specified database(s). You can try requesting an additional database, but you need a good reason. MySQL and PHP don’t care that all your tables are in one database, rather than organized into databases with meaningful names. Humans can just keep track of projects more easily when those projects are organized.
If a database with the name you specify already exists, an error message is returned. You can avoid this error message by using an IF phrase in your statement, as follows:
With this statement, the database is created if it doesn’t exist, but the statement doesn’t fail if the database already exists. It just doesn’t create the new database.
To see for yourself that a database was in fact created, use the SHOW DATABASES SQL query.
After you create an empty database, you can add tables to it. (Check out the section “Adding tables and specifying a primary key,” later in this chapter.)
You can delete any database, as long as you’re using a MySQL account with the DROP privilege. When you drop a database, all the tables and data in the database are dropped, as well.
Use DROP carefully because it’s irreversible. After you drop a database, that database is gone forever. And any data that was in it is gone, as well.
If the database doesn’t exist, an error message is returned. You can prevent an error message with the following statement:
This statement drops the database if that database exists. If it doesn’t exist, no error occurs. The statement just ends quietly.
You can add tables to any database, whether it’s a new, empty database that you just created or an existing database that already has tables and data in it. The rules for allowable table names are explained in the “Organizing the data” section, earlier in this chapter. When you create a table in a database, a file named tablename.frm is added to the database directory.
When you create a table, you include the table definition. You define each column — giving it a name, assigning it a data type, and specifying any other definitions required. Here are some definitions often specified for columns:
You also specify the unique identifier for each row — the primary key. A table must have a field or a combination of fields that’s different for each row. No two rows can have the same primary key. If you attempt to add a row with the same primary key as a row already in the table, you get an error message, and the row isn’t added.
Occasionally, you might want to create a table that has the same structure as an existing table. You can create a table that’s an empty copy.
You can use the CREATE statement to add tables to a database. The statement begins with the CREATE TABLE statement, as follows:
Then, you add a list of column names with definitions. Separate the information for each column from the information for the following column by a comma. Enclose the entire list in parentheses. Follow each column name by its data type and any other definitions required.
The last item in a CREATE TABLE statement indicates which column or combination of columns is the primary key. You specify the primary key by using the following format:
Enclose the columnname in parentheses. If you’re using a combination of columns as the primary key, include all the column names in the parentheses, separated by commas. For instance, you could designate the primary key as PRIMARY KEY (columnname1,columnname2)
CREATE TABLE tablename ( columnname datatype definition1 definition2 ...,
columnname datatype definition1 definition2 ..., ...,
PRIMARY KEY(columnname) )
Listing 3-1 shows the CREATE TABLE statement used to create the Customer table of the CustomerOrderInformation database. You could enter this statement on a single line if you wanted to. MySQL doesn’t care how many lines you use. The format shown in Listing 3-1 simply makes the statement easier for you to read. This human-friendly format also helps you spot typos.
CREATE TABLE Customer (
PRIMARY KEY(customerID) );
Note that the list of column names in Listing 3-1 is enclosed in parentheses (one on the first line and one on the last line), and a comma follows each column definition.
Remember not to use any MySQL reserved words for column names, as we discuss in the “Organizing the data” section, earlier in this chapter. If you use a reserved word for a column name, MySQL gives you an error message that looks like this:You have an error in your SQL syntax near ‘order var(20))’ at line 1
This error message shows the column definition that it didn’t like and the line where it found the offending definition. However, the message doesn’t tell you much about what the problem actually is. The error in your SQL syntax that it refers to is the use of the MySQL reserved word order as a column name.
If you attempt to create a table that already exists, you receive an error message. You can prevent this error message appearing by using the following CREATE statement:
If the table doesn’t exist, the statement creates it. If the table already exists, the statement doesn’t create it but also doesn’t return an error message.
You can create a new table that’s an exact copy, with the same structure, of an existing table, as follows:
The new table, tablename, is created with the same fields and definitions as oldtablename. Even if the old table contains data, the new table doesn’t include that data, just the structure.
You can remove a table, whether it’s empty or contains data. Be sure you want to remove a table before you do it.
Removing a table is irreversible. After you drop a table, that table is gone forever. And any data that was in it is gone, as well.
To remove any table, use this statement:
DROP TABLE tablename
Your database isn’t written in stone. You can change the name of any table; add, drop, or rename a column in any table; or change the data type or other attributes of any column.
Changing a database is not a rare occurrence. You might want to change your database for many reasons. For example, suppose that you defined the column lastName with VARCHAR(20) in a database that contains the names of all the employees in your company. At the time, 20 characters seemed sufficient for a last name. But you just received a memo announcing the new CEO, John Schwartzheimer-Losertman. Oops. MySQL will truncate his name to the first 20 letters, Schwartzheimer-Loser — a less-than-desirable new name for the boss. So you need to make the column wider — pronto.
You can change the database structure with an ALTER statement. The basic format for this statement is ALTER TABLE tablename, followed by the specified changes
|ADD columnname definition||Adds a column; definition includes the data type and optional definitions.|
|ALTER columnname SET DEFAULT value||Changes the default value for a column.|
|ALTER columnname DROP DEFAULT||Removes the default value for a column.|
|CHANGE columnname newcolumnname definition||Changes the definition of a column and renames the column; definition includes the data type and optional definitions.|
|DROP columnname||Deletes a column, including all the data in that column. The data can’t be recovered.|
|MODIFY columnname definition||Changes the definition of a column; definition includes the data type and optional definitions.|
|RENAME newtablename||Renames a table.|
For example, the following statement renames the Customer table to NewCustomer:
ALTER TABLE Customer RENAME NewCustomer
For another example, the following statement changes the specified column (lastName) to the specified data type (VARCHAR) and width (50):
ALTER TABLE Customer MODIFY lastName VARCHAR(50)