This tutorial will cover constants and identifiers in C. Constants, as the name implies, are values that never change. In the previous tutorial on data types you have seen how a variable can be declared constant by making use of the const keyword. You can also declare a constant by directly entering its value in the source code. For example, in this code:
double pi = 3.14159;
char c = 'A';
char hello[] = "Hello World!";
The number 3.14159, the letter A and the string Hello World! are all constants. You have already seen these types of constants used in the examples of previous tutorials. In this tutorial we will go into more detail and cover how to declare different types of constants and the rules that apply to those constants.
In addition we will look at the rules that are applicable for identifier names. An identifier is a name of a variable, function, structure, union, typedef, etc. – basically anything that can be given a name. Now, let us start with characters.
Character Constants
A character is a single letter from an alphabet. C defines two basic alphabets; one in which the source code is written and the one which is used when the program is run. They are usually the same, but they do not have to be. If they are different, it is the compiler’s job to translate character constants from the source code alphabet to the runtime alphabet.
The basic C alphabet contains the following characters:
- upper and lower case A-Z,
- decimal digits 0-9,
- the space,
- horizontal tab,
- vertical tab,
- newline,
- backspace,
- carriage return,
- alert and form feed characters
and the following symbol characters:
! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
Each character can be represented by a number. The numbers representing the digits 0-9 are 10 consecutive integers. If for example the digit ‘0’ was represented by the number 100, then ‘1’ would be 101, ‘2’ would be 102 and so on.
Note that the C standard itself does not give the actual mappings from the characters to the numbers representing them. In practice most implementations will use the ASCII standard to map characters to numbers and back; some systems from the IBM mainframe and minicomputer world use the EBCDIC standard.
The characters discussed so far are the basic characters of the alphabets. The basic characters are guaranteed to fit inside a char type. In addition to the basic characters there are extended characters that are locale specific (they are language, culture and nationality specific).
Each implementation will define what extended characters are supported. These extended characters may not fit inside a single char. There is a "wide character" type called wchar_t defined in stddef.h that is used to hold these characters which may potentially be more than one byte.
Using Character Constants
There are numerous ways to indicate a character constant in your source code. The simplest way is the way you have seen before in previous examples – to put single quotes around the character, like this:
const char upper_h = 'H';
if ( c == 'Y' || c == 'y' )
    return;
A character constant inside single quotes actually has the type int, but if assigned or cast to a char the value will not overflow the char.
Another way to enter a character constant is to specify the number that represents the character you want. The way to do this involves using an "escape sequence" to specify the character. Escape sequences allow you to designate a character in character constants and string constants using a backslash (the ‘\’ character) followed by some letters or numbers, all inside single quotes (or double quotes for string constants).
C allows you to specify a character’s number using octal or hexadecimal numbers. For octal, you use a backslash followed by one, two or three digits between 0 and 7 inside single quotes. For hexadecimal use a backslash followed by a lowercase ‘x’ and then one or more digits between 0 and 9 and letters between ‘a’ and ‘f’ inside single quotes.
For example, in the ASCII character set a capital M is assigned the number 77 which is 115 in octal or 4D in hexadecimal. You can enter a capital M using this syntax:
char a1 = '\115'; // octal
char a2 = '\x4d'; // hexadecimal
char a3 = '\x4D'; // hexadecimal
The last two show that when specifying hexadecimal digits, both upper and lower case are allowed.
If you want to use one of the extended characters you need to prefix the single quote with the letter ‘L’ to indicate it is a wide character. An extended character may not fit inside a regular char but you can hold it in a wchar_t. Of course a wchar_t can hold all the basic characters too.
You can enter extended characters in the same way as the basic characters. Let us see an example with the character Ă which is represented by the two-byte hexadecimal number 0x0102 (which is 258 in decimal and 402 in octal):
The output of this program is:
a1 = 'Ă'
a2 = 'Ă'
a3 = 'Ă'
As you can see, all the characters are the same, just specified in different ways. The call to setlocale() is needed to initialize C’s locale information. Without it, a C program can only handle the basic characters. Also note that the printf() format specifier for a wide character is %lc instead of %c.
Special Characters
You have already seen the escape sequences for specifying a character through its octal or hexadecimal value. Here are some other escape sequences you can use.
If you want a character to represent the single quote you must use the escape sequence backslash followed by the single quote:
char single_quote = '\'';
To represent the backslash character itself use two backslashes:
char backslash = '\\';
The question mark character can be specified as an escape sequence, but it does not need to be; it can be printed as just a plain character:
char question1 = '?';
char question2 = '\?';
These lines will set both question1 and question2 to the question mark character.
There is a special escape sequence backslash followed by lowercase a that is used for alerts. This is used to get the user’s attention. How exactly this is done depends on the environment where the program is run. It could produce a sound or some visual notification.
char alert = '\a';
printf( "%cThere is a problem with the input file.\n", alert );
On this system the above code produces a flash of the output window and the line is printed in the window. On a different system the alert character may be handled differently.
There are other escape sequences that have to do with moving the cursor on the screen:
Escape sequence | Function
\n | New line. Moves the cursor one line down and to the beginning of the line.
\t | Horizontal tab. Moves the cursor to the next tab position of the output device.
\b | Backspace. Moves the cursor one position back on the same line.
\f | Form feed. Moves the cursor to the next "page" of the output device.
\r | Carriage return. Moves the cursor to the beginning of the current line.
\v | Vertical tab. Moves the cursor to the next vertical tab position of the output device.
As you can see, these escape sequences have functions that seem to apply more to old typewriters than modern displays. That shows the age of the C language. When it was created, many display systems had more in common with typewriters than with today’s graphical interfaces. From the list above, you will probably use the new line and horizontal tab sequences the most. The others are rarely used, and exactly how they work depends on the display device.
Another way to specify a character is to use its universal character code defined in the ISO/IEC 10646 standard. This standard tries to give a unique number to every character of every language in the world. In the example above, the letter Ă is represented by the hexadecimal number 0x0102 (which is 258 in decimal and 402 in octal). That number is from the ISO standard.
You can specify the universal character code by using the backslash lowercase u followed by 4 hexadecimal digits, or backslash uppercase U followed by 8 hexadecimal digits. This is similar to the hexadecimal escape sequence except that the hexadecimal sequence can be arbitrarily long; the \u or \U escape sequences are strictly 4 or 8 digits, respectively. We can specify the Ă character these ways:
wchar_t a4 = L'\u0102';
wchar_t a5 = L'\U00000102';
These lines define a4 and a5 as the Ă character just as in the example above.
Trigraphs and Digraphs
This section will talk briefly about trigraphs and digraphs. You will probably never see these in real C code anymore, but you should be aware of what they are.
Trigraphs are sequences of 3 characters that stand for another character. Digraphs are sequences of two characters that stand for another character. They were employed when the system used to enter the program source code did not have a way to enter certain special characters like [, | or }. When the system lacked a | character, you could use ??! instead, and the compiler would treat that sequence of characters as a single | character. Here is a table showing the trigraphs and digraphs that C allows:
Sequence | Meaning
??= | #
??( | [
??/ | \
??) | ]
??' | ^
??< | {
??! | |
??> | }
??- | ~
<: | [
:> | ]
<% | {
%> | }
%: | #
%:%: | ##
Here is a simple program using trigraphs and digraphs:
%:include <stdio.h>
%:include <string.h>

int main(void)
<%
    char hello??(??) = "Hello World!";
    size_t i;

    for ( i = 0; i < strlen(hello); ++i )
    ??<
        printf("??! %c ", hello??( i ??));
    ??>
    printf("??!\n");
    return 0;
%>
Output of program:
| H | e | l | l | o | | W | o | r | l | d | ! |
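When this is compiled, the compiler recognizes the trigraph and digraph sequences and treats them as if the characters they stand for had been written in the source. Note that trigraphs are replaced everywhere in the source file, even inside string literals such as the ones passed to printf() above. Digraphs are different: they are recognized only as tokens, so a digraph inside a string literal is left unchanged. Some compilers do not process trigraphs unless asked to; GCC, for example, needs the -trigraphs option or a strict standard mode such as -std=c99.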
String Constants
A string is a sequence of zero or more characters. String constants (also called literals) use a double quote instead of a single quote to delimit the start and end of the string. Each character in the string constant can be any character allowed as a character constant. You can use all the escape sequences for character constants with strings too. Let us see some examples:
char * menu_name = "Main Menu";
int edit_count = 0;
if (strcmp(menu_name, "Edit Menu") == 0)
{
    printf("Edit menu counter = %d\n", ++edit_count);
    edit();
}
In this code "Main Menu", "Edit Menu" and "Edit menu counter = %d\n" are all string literals.
If you want to include a double quote character inside a string use the escape sequence backslash double quote like this:
const char * s = "This string has \" double quotes\" ";
If you have a very long string literal there are a couple of ways to break it up. Two or more string constants next to each other will be treated as one string constant, so you could break up the long string literal into smaller ones and put them next to each other.
Another way is to use a backslash at the end of a line inside a string literal; then the string can be continued on the next line. Here is an example:
#include <stdio.h>

int main(void)
{
    printf( "Once upon a midnight "
            "dreary,\nwhile I "
            "pondered weak and weary,\n" );
    printf( "Over many a quaint and curious\nvolume of \
forgotten lore,\n" );
    return 0;
}
Here is the output of program:
Once upon a midnight dreary,
while I pondered weak and weary,
Over many a quaint and curious
volume of forgotten lore,
In the first printf() call, the three string constants are joined together since they are right next to each other without any other syntax between them. You can see in the output that the line is only broken for the newline character between dreary and while. As far as printf() is concerned it is passed one long string, not 3 separate strings.
In the second printf() call, the last character of the first source line is a backslash, and the string continues on the next source line. Because of the backslash at the end, the text on the following line is appended directly to the text before the backslash (neither the backslash nor the line break becomes part of the string). With the backslash method you do not use a double quote to start the string again on the continuation line, and any leading whitespace on that line would become part of the string. In the output, the line is only broken for the newline between curious and volume.
Just as there are wide characters to hold extended characters, there are wide string literals to hold strings containing extended characters. You indicate a wide string literal by prefixing it with an L just as for wide characters:
wchar_t widestring[] = L"This is a wide string literal.";
The string above does not contain any extended characters but wide strings can hold basic characters as well as extended characters.
Strings in C are terminated by the null character, which has a numeric value of 0. String literals are also automatically terminated by the null character. This program shows that the null character is added automatically:
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale( LC_ALL, "" );

    char hello[] = "Hello";
    wchar_t whello[] = L"H\x113ll\x151";

    printf( "hello = \"%s\"\n", hello);
    printf( "whello = \"%ls\"\n", whello);

    /* strlen(), wcslen() and sizeof all yield a size_t, printed with %zu */
    printf( "length of hello = %zu characters\n", strlen( hello ));
    printf( "length of whello = %zu characters\n", wcslen( whello ));

    printf( "size of hello = %zu bytes\n", sizeof( hello ));
    printf( "size of whello = %zu bytes\n", sizeof( whello ));
    return 0;
}
The output of this program follows:
hello = "Hello"
whello = "Hēllő"
length of hello = 5 characters
length of whello = 5 characters
size of hello = 6 bytes
size of whello = 24 bytes
The program declares the character array hello and initializes it with the string literal "Hello". It does the same for the wide character array whello but uses extended characters for the e and the o. The strlen() function is called to get the length of the string, which is 5 characters. Functions like strlen() that work on plain character strings generally will not work on wide character strings. wcslen() is the wide character counterpart to strlen(); it also returns 5. Also note that the printf() format specifier for a wide character string is %ls instead of %s.
However, sizeof(hello) gives 6 bytes, not 5. This is because the compiler automatically added an extra null byte at the end of the string. sizeof(whello) gives 24 bytes. That is because on this system a wide character takes 4 bytes. The 5 wide characters plus the wide null character make 6, and 6 times 4 bytes for each character gives 24 bytes.
Integer Constants
You have seen many examples of integer constants in other tutorials. Integer constants are usually entered as decimal numbers, but you can enter them in octal or hexadecimal too. A decimal constant must not start with a 0. Octal constants start with a 0 and hexadecimal constants start with either 0x or 0X. Integer constants cannot have a decimal point or an ‘e’ or ‘E’ followed by an exponent. An integer constant in decimal, octal or hexadecimal may be preceded by a ‘+’ or ‘-’ to give a positive or negative number; strictly speaking the sign is a unary operator applied to the constant rather than part of the constant itself.
Some examples:
int n = -309; // decimal -309
int m = 037; // decimal 31
int l = -0x1fc; // decimal -508
The m variable is initialized with an octal integer constant, and l is initialized with a hexadecimal constant. (The // and text following them are comments.) For values 10-15 in hexadecimal you can use either upper or lower case letters A-F.
The type of a decimal integer constant is either an int, long int or long long int. The compiler will go through that list of types in that order and use the first type which can hold the value. For octal or hexadecimal constants it will add unsigned types to the list, so that the list is int, unsigned int, long, unsigned long, etc.
You can add suffixes to integer constants to narrow down the choices the compiler will consider. The u or U suffix tells the compiler to only consider unsigned types. The l or L suffix limits the choices to long or long long integers. The ll or LL suffix further limits them to long long integers only. The unsigned suffix may be combined with the long or long long suffixes in any order. For example:
unsigned num1 = 1234u;
unsigned long long bignum = 12345678901234LLU;
long long ln = 5000000000;
The 1234u constant creates an unsigned int: the u suffix limits the choices to unsigned types, and an unsigned int is big enough to hold the value 1234. The 12345678901234LLU constant creates an unsigned long long number. The suffix for that number could also have been written uLL, ULL, llu, llU, etc. The 5000000000 on the last line has no suffix, so the compiler picks the first of int, long and long long that can hold the value. Where neither a plain integer nor a long integer is big enough, that is a long long (this depends on the implementation’s sizes of integer types as you saw in the data types tutorial).
Floating Point Constants
You have seen floating point constants earlier in previous tutorials as well. A floating point constant can be entered in decimal or hexadecimal, though in practice hexadecimal is rarely used. A decimal floating point constant must have either a decimal point or an ‘e’ or ‘E’ plus an exponent or both. Here are some examples:
double a = .1;
double b = 49392293.;
double c = 773.2992E-2;
double d = 3e+9;
double e = 1.7349E17;
On the second line, without the decimal point at the end that number would be an integer constant, which would be converted to a double and assigned to b.
Hexadecimal floating point constants have a format like this:
0xH.HpD
where H means zero or more hexadecimal digits and D means one or more decimal digits. Described in words, a hexadecimal floating point constant starts with the 0x or 0X prefix followed by hexadecimal digits, optionally containing a period (.); there must be at least one hexadecimal digit before or after the period. Then comes the letter ‘p’ or ‘P’. The ‘p’ or ‘P’ is like the ‘e’ in decimal floating point constants: it starts the exponent part of the number, except that for hexadecimal floating point numbers the base is 2 instead of 10. The ‘p’ or ‘P’ is followed by an optional ‘+’ or ‘-’ and then by one or more decimal digits (0-9). Let us look at some examples:
double a = 0x1.999999999999aP-4; // decimal .1
double b = 0x17.8D5528p+21; // decimal 49392293
double c = 0X1.eee957470eb25p+2; // decimal 773.2992E-2
The values for a, b, and c here are the same as the ones in the previous example, just specified using hexadecimal notation. Remember that for hexadecimal floating point constants you must specify an exponent part.
A plain floating point constant will have type double. As with integer constants you can put suffixes on floating point constants to tell the compiler what type to use. A suffix of ‘f’ or ‘F’ will treat the number as a float, and a suffix of ‘l’ or ‘L’ will treat the number as a long double.
double a = 3.14159;
float b = 3.14159f;
long double c = 3.14159L;
You cannot mix the ‘f’ and ‘l’ suffixes; you must choose one.
Some implementations define useful constants in the math.h header file, such as M_PI for the value of pi. These are provided by the implementation; they are not part of the C standard. Look at your implementation’s documentation for information about any constants it provides.
Identifiers
An identifier is a name. It can be the name of a variable, a function, a structure or union, a member of a struct, union or enum, a typedef name, a macro name or a macro parameter. Some examples:
double a = 93.2;
typedef enum { Line, Triangle, Square } ShapeKind;
struct shape {
    int num_verticies;
    double *vx, *vy;
    ShapeKind kind;
};
int draw_shape( struct shape * );
float b3x;
int \u015cocket;
In the code above, a, Line, Triangle, Square, ShapeKind, num_verticies, vx, vy, kind, draw_shape, b3x and \u015cocket are all identifiers. That last one is an example of using universal character codes in identifiers.
There are some rules you must follow when creating an identifier. An identifier is a sequence of one or more characters. It must start with a non-digit character; it can use upper and lower case a-z characters or universal character codes (discussed in the character constants section) or an underscore character (_). After the first character, digits 0-9 may also be used.
Upper and lower case alphabetic characters are distinct: an identifier Ac is different from ac. Even though it is technically possible to start an identifier with an underscore (_) character, you should avoid doing that. Many identifiers starting with an underscore are reserved for the implementation (the compiler and its standard library).
Different identifiers within the same scope must be unique. For each identifier, the compiler must determine if it is the same as one seen before or if it is a new identifier. This is usually done by looking at a certain number of characters at the beginning of the name. Exactly how many characters are compared is up to the implementation; the C standard only specifies the minimum number to compare. In addition, the number of characters to compare may be different for internal and external identifiers.
Recall that an external identifier is one that is (maybe implicitly) declared with extern storage class. The C standard says that at least 31 initial characters of the identifier must be compared. In addition, if you use universal character codes in these identifiers, each of those characters may count as either 6 or 10 characters (depending on the code).
For non-external identifiers the C standard says at least 63 initial characters must be compared. For these identifiers a universal character code counts as 1 character.
An identifier also cannot be one of the C keywords. These are words like if, else, while, for, etc. that are part of the C language. Here is a list of C keywords (as of the C99 standard):
auto enum restrict unsigned
break extern return void
case float short volatile
char for signed while
const goto sizeof _Bool
continue if static _Complex
default inline struct _Imaginary
do int switch
double long typedef
else register union
Choosing an identifier name should be done carefully. In these examples we have been using names like a1, b, and so on, which are short and not very descriptive. In actual code, identifiers should strike a balance between being easy to type and being descriptive.
There are various conventions programmers follow when creating identifiers.
For example, if an identifier represents two (or more) words such as the draw_shape() function above, some people will prefer to call it drawshape(), while others will use drawShape(), DrawShape() or Draw_Shape(). Also, some programmers will use all uppercase identifiers for constants. These are just conventions some people follow; the C language itself does not force any rules on identifiers except for the ones discussed above.
There is one predefined identifier that the compiler defines in functions: __func__. This is a character array containing the name of the function. It is useful to know the name of the function sometimes, especially when printing diagnostic messages:
#include <stdio.h>

double sum( double a, double b )
{
    printf( "%s called with %g and %g\n", __func__, a, b );
    return a + b;
}

double mult( double a, double b )
{
    printf( "%s called with %g and %g\n", __func__, a, b );
    return a * b;
}

int main(void)
{
    printf("(1+2)*(3+4) = %g\n", mult( sum(1,2), sum(3,4) ));
    return 0;
}
This program outputs:
sum called with 3 and 4
sum called with 1 and 2
mult called with 3 and 7
(1+2)*(3+4) = 21
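You can see that __func__ holds the name of the enclosing function in each printf() call inside sum() and mult(). Note that the two sum() calls may run in either order, because C does not specify the order in which function arguments are evaluated; on this system sum(3,4) ran first.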