Variables, Data Types, Expressions, Assignment, Formatting¶
Estimated time to read: 51 minutes
Variables¶
Variables are named, typed containers that store information and make data manipulation possible. Detailed Reference
Container sizes are measured in bytes. A byte is the smallest addressable unit of memory, and on virtually every modern computer each byte is composed of 8 bits. Each bit can be 1 or 0 (true or false). If one byte has 8 bits and each bit can hold 2 different values, the number of possible combinations is 2^8, which is 256, so one byte can hold up to 256 different states.
Data Types¶
There are several types of variables in C++, including:
- Primitive data types: These are the most basic data types in C++ and include integer, floating-point, character, and boolean types.
- Derived data types: These data types are derived from the primitive data types and include arrays, pointers, and references.
- User-defined data types: These data types are defined by the programmer and include structures, classes, and enumerations.
Numeric types¶
There are some basic integer container types with different sizes, and type modifiers can change their default behavior or size. The common sizes of the integer containers are 1 (char), 2 (short int), 4 (int) or 8 (long long) bytes. For a more detailed coverage read this.
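Note
But the only guarantee C++ imposes is 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long), together with minimum widths: char has at least 8 bits, short and int at least 16, long at least 32 and long long at least 64. Everything else is implementation-defined; for example, long is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux and macOS.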
Note
If you need cross-platform consistency, always specify the sign modifier or use a type with an explicit width, such as the ones listed here.
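For floating-point numbers, the container size is usually 4 (float) or 8 (double) bytes. The size of long double depends on the platform: with MSVC it is 8 bytes (the same as double), while GCC and Clang on x86 store the 10-byte x87 extended-precision format padded to 12 or 16 bytes.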
The sign modifiers are signed and unsigned, and they apply only to integer types.
Integer types such as int are signed by default, and the first (most significant) bit of the container is the sign bit. If the first bit is 0, the number is zero or positive. If the first bit is 1, the number is negative. More details.
This means that if a 1-byte (8-bit) container follows two's complement, the first bit tells the sign and the remaining 7 bits hold the value, so the number goes from -128 up to 127; this container is typically a signed char. The positive side has one value fewer than the negative side because 0 uses one of the non-negative bit patterns. There are 256 numbers between -128 and 127 inclusive.
Char¶
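A plain char type uses 1 byte to store data. Whether it is signed or unsigned is implementation-defined: it behaves like signed char with most x86 compilers and like unsigned char on ARM Linux. Before C++20, the standard only guaranteed the range -127 to 127 (255 numbers) for a signed char, to allow machines that use ones' complement; since C++20, two's complement is mandatory.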
A signed char follows two's complement and can represent 2^8 or 256 different numbers, from -2^7 or -128 up to 2^7 - 1 or 127. With x86 compilers, a plain char is signed by default and has this same range.
An unsigned char uses all 8 bits for the value, so it represents the numbers from 0 up to 2^8 - 1 or 255.
Chars can be used to represent letters following the ASCII table, where each value means a specific letter, digit or symbol.
Note
A single char (1 byte) cannot represent every character of every language. If you are using Hebrew, Chinese, or others, you will probably need more than 1 byte per character. Use char8_t (UTF-8, C++20), char16_t (UTF-16) or char32_t (UTF-32) to cover the character encoding of the language you are using.
ASCII table¶
ASCII (American Standard Code for Information Interchange) maps a number to a character. It is used to represent letters, digits and symbols, and it is a standard used by most computers in the world.
It is a 7-bit table, so it can represent 2^7 or 128 different characters. Codes 0 to 31 (and 127, DEL) are control characters; the printable characters start at 32 (space) and go up to 126 (~). Reference.
As you can imagine, this table is not enough to represent all the characters in the world (accented Latin, Chinese, Japanese, etc.), so there are other tables that extend ASCII to 8 bits, or even 16 bits and more.
Integer¶
Note
Most of the information I am covering here might not be precise, but the overall idea is correct. If you want a deep dive, read this.
A standard int type typically uses 4 bytes to store data. It is signed by default.
It can represent 2^32 or 4294967296 different numbers. As a signed type, it can represent numbers from -2^31 or -2147483648 up to 2^31 - 1 or 2147483647.
The type int accepts the sign modifiers signed or unsigned, which decide whether the first bit acts as a sign or not.
The type int also accepts the size modifiers short (usually 2 bytes) or long long (at least 8 bytes), which change the size and representation capacity of the container. The declarations short and short int name the same type, of 2 bytes. In the same way, long long and long long int are the same type and reserve 8 bytes for the container.
The type long (or long int) has the same 4-byte size as int on Windows and on 32-bit systems, but it is 8 bytes on 64-bit Linux and macOS. Historical fact or myth: this abnormality comes from the evolution of the definition of int. In the past, 2 bytes were enough for the majority of scenarios on 16-bit processors, but programs frequently reached the limits of the container and overflowed. So the definition of an integer changed from 2 bytes to 4 bytes, and the short modifier was created. In this scenario, long int lost its reason to exist.
Here goes a list of valid integer types and their probable sizes (it depends on the implementation, CPU architecture and operating system):
- Size of 2 bytes: short int, short, signed short int, signed short, unsigned short int, unsigned short;
- Size of 4 bytes: signed, unsigned, int, signed int, unsigned int;
- Size of 4 bytes (8 bytes on 64-bit Linux and macOS): long int, long, signed long int, signed long, unsigned long int, unsigned long;
- Size of 8 bytes: long long int, long long, signed long long int, signed long long, unsigned long long int, unsigned long long.
OPINION: I highly recommend using these types instead, to ensure determinism and consistency between compilers, operating systems and CPU architectures.
Floating point¶
There are 3 basic floating-point containers to represent fractional numbers: float (4 bytes), double (8 bytes) and long double (8 to 16 bytes, depending on the platform).
The IEEE 754 standard specifies how a floating-point number is stored as bits inside the container. The container holds 3 pieces of information to simulate a fractional type inside a binary type: sign, exponent and fraction (also called mantissa).
Note
This standard left a lot of room for implementation-defined behavior in the past, and this is one of the root causes of non-deterministic physics simulations. The main problem is that you cannot guarantee that the same operation on the same pair of numbers will consistently give the same result across different processors and compilers, which makes consistent physics in a multiplayer game hard to achieve. Many deterministic physics engines avoid this standard altogether and implement those behaviors in software on top of integers instead. There are 2 approaches to floating-point determinism: softfloat, which implements the whole IEEE 754 specification in software, or some kind of fixed-point arithmetic on top of integers.
Booleans¶
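bool is a special type that usually has a container size of 1 byte, even though it only needs 1 bit. The compiler does not pack separate bool variables together, because every variable must have its own address; to store up to 8 flags in one byte, you have to pack them yourself, for example with bit operations, bit-fields or std::bitset.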
Enums¶
An enumeration is a type that consists of a set of named integral constants. It can be defined using the enum keyword, as in enum Color { Red, Green, Blue };
This defines a new type called Color, which has three possible values: Red, Green, and Blue. By default, the values of these constants are 0, 1, and 2, respectively. However, you can specify your own value for each constant with =, for example Red = 10.
You can then use the enumeration type just like any other type, for example Color favorite = Green;
Enumerations can also have their underlying type explicitly specified, as in enum class Color : char { Red, Green, Blue };
Here, the underlying type of the enumeration is char, so the constants Red, Green, and Blue will be stored as characters (1 byte). The enum class syntax is known as a "scoped" enumeration, and it is recommended over the traditional enum syntax because it helps prevent naming conflicts. See the CppCoreGuidelines to understand better why you should prefer using this.
// You can make the value of the constants
// explicit to make your debugging easier:
enum class Color : char {
Red = 'r',
Green = 'g',
Blue = 'b'
};
Special derived type: string¶
string is a derived type; in order to use it, the <string> header should be included at the beginning of the file or in the header. chars are the basic unit of a string, which stores words as a sequence of chars.
In C++, a string is a sequence of characters stored in an object of the std::string class. The std::string class is part of the C++ Standard Library and provides a variety of functions and operators for manipulating strings.
void type¶
When the void type specifier is used as the return type of a function, it indicates that the function does not return a value.
It can also be used in a pointer, void*, to indicate that the pointer is "universal" and can point to data of any type, but this is arguably a bad pattern and should be used only exceptionally, for example when interchanging types with a C-style API.
We are going to cover this again when covering pointers and functions.
Variable Naming¶
Variable names are called identifiers. In C++, you can use any combination of letters, digits, and underscores to name a variable, as long as it follows these rules:
- Digits can appear in any position except the first, so a name never begins with a digit. E.g. point2 and vector2d are allowed, but 9life isn't;
- Names are case-sensitive, so "myVar" and "myvar" are considered different variables;
- Names can have _ in any position. E.g. _myname and user_name are allowed; however, names containing a double underscore or starting with an underscore followed by an uppercase letter are reserved for the compiler and the standard library, and names starting with _ are also reserved at global scope;
- A name cannot be a reserved keyword, such as int or return.
Keep in mind that it is a good practice to choose descriptive and meaningful names for your variables, as this can make your code easier to read and understand. Avoid using abbreviations or acronyms that may not be familiar to others who may read your code.
It is also important to note that C++ has some naming conventions that are commonly followed by programmers. For example, it is common to use camelCase or snake_case to separate words in a variable name, to start variables that are local to a function with a lowercase letter, and to use all uppercase letters for constants.
Variable declaration¶
Variable declaration in C++ follows the pattern TYPENAME VARIABLENAME; (a type, a name and a semicolon).
TYPENAME can be the name of any predefined type. See Variable Types for the types. VARIABLENAME can be anything, as long as it follows the naming rules. See Variable Naming for the naming rules.
Note
A given variable name can only be declared once in the same context / scope. If you try to redeclare the same variable, the compiler will report an error.
Note
You can redeclare the same variable name in different scopes. If one scope is nested inside the other, the inner declaration is used and shadows the variable from the outer scope. We are going to cover this more when we are covering multi-file projects and functions.
Examples:
int a; // integer variable
float pi; // floating-point variable
char c; // character variable
bool d; // boolean variable
string name; // string variable
Note
We are going to cover later in this course other complex types in other modules such as arrays, pointers and references.
Variable assignment¶
The = operator means that whatever the container holds will be overwritten by the result of the expression on the right side. You should read it not as equal but as receives, to avoid misunderstanding. Reference
int a = 10; // integer variable
float pi = 3.14f; // floating-point variable (the f suffix makes the literal a float)
char c = 'A'; // character variable
bool d = true; // boolean variable
string name = "John Doe"; // string variable
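Local variables of fundamental types (int, float, char, bool, ...) are not initialized by default. It means that you have to set their content after declaring them. If such a variable is read before the assignment, the behavior is undefined; in practice you usually read garbage, whatever was left in the memory stack at that location. So the best approach is to always set a value when a variable is declared, or to be sure that no variable is read before an assignment. (Global variables are zero-initialized, and class types such as string run their default constructor, so an unassigned string is simply empty.)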
A char variable can be assigned integer numbers or any character between single quotes.
char c;
c = 'A'; // the content is 65 and the representation is A. see ascii table.
c = 98; // the content is 98 and the representation is b. see ascii table.
A bool holds either true or false, but it can also be assigned a numeric value following this rule:
- if the value is 0, the value stored by the variable is false (0);
- if the value is anything different than 0, the value stored is true (1).
To convert a string to an int, you have to use a function such as stoi (for int), stol (for long) or stoll (for long long), because the two types are not implicitly convertible.
To convert a string to a floating-point number, you have to use stof (for float), stod (for double) or stold (for long double), for the same reason.
Literals¶
Literals are values written directly in the code. Numeric literals can be given suffixes that state their type explicitly, to avoid surprises from implicit casts or from implementation-defined container sizes.
Integer literals¶
There are 4 kinds of integer literals:
- decimal-literal: starts with a non-zero decimal digit, followed by any decimal digits;
- octal-literal: starts with the digit 0, followed by any octal digits (0 to 7);
- hex-literal: starts with 0x or 0X, followed by any hexadecimal digits;
- binary-literal: starts with 0b or 0B, followed by any binary digits (C++14 and later).
// all of these variables hold the same value, 42, but written in different bases.
// the right side of the = are literals
int deci = 42;
int octa = 052;
int hexa = 0x2a;
int bina = 0b101010;
Suffixes:
- no suffix: the smallest signed integer container that can hold the value, starting from int (octal and hex literals may also become unsigned);
- u or U: the smallest unsigned integer container that can hold the value, starting from unsigned int;
- l or L: the smallest signed integer container that can hold the value, starting from long;
- lu or LU (also ul or UL): the smallest unsigned integer container that can hold the value, starting from unsigned long;
- ll or LL: the long long signed integer container, long long;
- llu or LLU (also ull or ULL): the long long unsigned integer container, unsigned long long.
Floating point literals¶
The suffix of a floating point literal selects its container:
- no suffix means the container is a double;
- the f or F suffix means it is a float;
- the l or L suffix means it is a long double.
A floating point literal can be written in 3 ways:
- digit-sequence decimal-exponent suffix(optional): 1e2 means it is a double with the value of 1*10^2 or 100; 1e-2f means it is a float with the value of 1*10^-2 or 0.01;
- digit-sequence . decimal-exponent(optional) suffix(optional): 2. means it is a double with the value of 2; 2.f means it is a float with the value of 2;
- digit-sequence(optional) . digit-sequence decimal-exponent(optional) suffix(optional): 3.1415f means it is a float with the value of 3.1415; 2.1l means it is a long double with the value of 2.1; .1 means it is a double with the value of 0.1; 0.1e1L means it is a long double with the value of 1.
Arithmetic Operations¶
In C++, you can perform common arithmetic operations in statements using the following operators (Reference):
- Addition: +
- Subtraction: -
- Multiplication: *
- Division: /
- Modulus (remainder): %
There are two special unary operators, increment and decrement, that can appear before (prefix) or after (postfix) the variable name. If prefixed, the operation is executed first and the updated value is returned; if postfixed, the current value is returned and then the operation is executed:
- Increment: ++
- Decrement: --
There are also compound assignment operators (reference) that execute the arithmetic operation between the old value of the variable and the right side of the operator, and then store the result back in the variable:
- Addition: +=
- Subtraction: -=
- Multiplication: *=
- Division: /=
- Modulus (remainder): %=
Here is an example of how to use these operators in a C++ program:
#include <iostream>
int main() {
int a = 5;
int b = 2;
std::cout << a + b << std::endl; // Outputs 7
std::cout << a - b << std::endl; // Outputs 3
std::cout << a * b << std::endl; // Outputs 10
std::cout << a / b << std::endl; // Outputs 2
std::cout << a % b << std::endl; // Outputs 1
a++;
std::cout << a << std::endl; // Outputs 6
a--;
std::cout << a << std::endl; // Outputs 5
std::cout << a++ << std::endl; // Outputs 5 because it first returns the current value and then increments.
std::cout << a << std::endl; // Outputs 6
std::cout << --a << std::endl; // Outputs 5 because it first decrements the value and then return it already changed;
std::cout << a << std::endl; // Outputs 5
b *= 2; // it is a short version of b = b * 2;
std::cout << b << std::endl; // Outputs 4
b /= 2; // it is a short version of b = b / 2;
std::cout << b << std::endl; // Outputs 2
return 0;
}
Note that the division operator (/) performs integer division if both operands are integers. If either operand is a floating-point type, the division will be performed as floating-point division. So 5/2 is 2 because both are integers and integer division discards the fractional part, but 5/2. is 2.5 because the second operand is a double literal.
Also, the modulus operator (%) returns the remainder of an integer division. For example, 7 % 3 is equal to 1, because 3 goes into 7 two times with a remainder of 1.
Implicit cast¶
Implicit casting, also known as type coercion, is the process of converting a value of one data type to another data type without the need for an explicit cast operator. In C++, this can occur when an expression involves operands of different data types and the compiler automatically converts one of the operands to the data type of the other in order to perform the operation.
For example:
int a = 1;
double b = 1.5;
int c = a + b; // a is converted to double before the addition; the double result is then converted back to int for c
b is a double, while a is an int. When the addition operator is used, the compiler automatically converts a to a double before performing the addition. The result of the expression is the double 2.5, but c is an int, so the result is implicitly converted back to int before being assigned, and the fractional part is discarded: c holds 2. Implicit casting can also occur when assigning a value to a variable of a different data type. For example, in double d = a; the value of a is an int, but it is being assigned to a double variable, so the compiler automatically converts the value of a to a double before making the assignment.
It's important to be aware of implicit casting, because it can sometimes lead to unexpected results or loss of precision if not handled properly. In some cases, it may be necessary to use an explicit cast operator to explicitly convert a value to a specific data type.
Explicit cast¶
In C++, you can use an explicit cast operator to convert a value of one data type to another. The general forms of explicit casts are:
// ref: https://en.wikibooks.org/wiki/C%2B%2B_Programming/Programming_Languages/C%2B%2B/Code/Statements/Variables/Type_Casting
(TYPENAME) value; // c-style cast: silently tries several kinds of c++ casts in turn. do not use this extensively
static_cast<TYPENAME>(value); // c++ style conversion, arguably the preferred style: explicit and easy to search for
TYPENAME(value); // functional-style cast: for a single value it does exactly the same as the c-style cast; does not work with multi-word types such as unsigned int
TYPENAME{value}; // brace initialization: refuses narrowing conversions (e.g. double to int) at compile time, so it is the safest for the general case
For example, in double b = static_cast<double>(a); the value of a is an int, but it is being explicitly converted to a double using the explicit cast operator. The result of the cast is then assigned to the double variable b.
Explicit casts can be useful in situations where you want to ensure that a value is converted to a specific data type, regardless of the data types of the operands in an expression. However, explicit casts can also lead to unexpected results or loss of precision if not used carefully. Converting to a type that cannot represent all the values of the original type is called narrowing.
C-style example: given int a = 20001; the cast char b = (char) a; explicitly converts the int a to a char. However, the range of values that can be represented by a char is much smaller than the range of an int, and the value of a is outside the range a char can represent. As a result, only the lowest 8 bits survive: b holds 33, which is the character ! in the ASCII table, not the original value of a. 33 is the result of 20001 % 256, where 256 is the number of different values a char can represent. This silent loss of data, often loosely called an int overflow, is a bug that is hard to track.
auto keyword¶
The auto keyword is mostly syntax sugar to automatically infer the data type. It is used to avoid writing the full declaration of complex types when the type is easily inferred. auto is not a dynamic type: once it is inferred, it cannot be changed later, unlike in dynamically typed languages such as JavaScript.
auto i = 0; // automatically inferred as an integer type;
auto f = 0.0f; // automatically inferred as a float type;
i = "word"; // this won't work, because it was already inferred as an integer and integer container cannot hold string
Formatting¶
There are many functions to help you format the output the way it is expected; here is a selection of the most useful ones I can think of. You can find more functions and manipulators here and here.
To set a fixed precision for floating point numbers in C++, you can use the std::setprecision
manipulator from the iomanip
header, along with the std::fixed
manipulator.
Here's an example of how to use these manipulators to output a floating point number with a fixed precision of 3 decimal places:
#include <iostream>
#include <iomanip>
int main() {
double num = 3.14159265;
std::cout << std::fixed << std::setprecision(3) << num << std::endl;
// Output: 3.142
return 0;
}
You can also use the std::setw
manipulator to set the minimum field width for the output, which can be useful for aligning the decimal points in a table of numbers.
For example:
#include <iostream>
#include <iomanip>
int main() {
double num1 = 3.14159265;
double num2 = 123.456789;
std::cout << std::fixed << std::setprecision(3) << std::setw(8) << num1 << std::endl;
std::cout << std::fixed << std::setprecision(3) << std::setw(8) << num2 << std::endl;
// Output (each number is right-aligned in a field 8 characters wide):
//    3.142
//  123.457
return 0;
}
Note that these manipulators only affect the output stream, and do not modify the values of the floating point variables themselves. If you want to store the numbers with a fixed precision, you will need to use a different method such as rounding or truncating the numbers.
To align text to the right or left in C++, you can use the setw
manipulator in the iomanip
header and the right
or left
flag. More details here
Here is an example:
#include <iostream>
#include <iomanip>
int main() {
std::cout << std::right << std::setw(10) << "Apple" << std::endl;
std::cout << std::left << std::setw(10) << "Banana" << std::endl;
return 0;
}
Both will print inside a virtual column 10 characters wide, so the output is Apple preceded by 5 spaces of padding, and then Banana followed by 4 spaces of padding.
Optional Exercises¶
Do all exercises up to this topic here.
In order to get into coding, the easiest way to learn is by solving coding challenges. It is like learning any new language: you have to be exposed to it and involved with it. Do not do only the homework, otherwise you are going to fail. Another metaphor: the homework is like a competition that you have to run to prove that you are trained, but in order to train, you have to do small runs and take small steps first, so train yourself at least 2x per week.
The best way to train yourself in coding and solving problems in my opinion is this:
- Sort Beecrowd questions from the most solved to the least solved; here is the link to the list already filtered.
- Start solving the questions from the top to the bottom. Choose one from the beginning; it will be one of the easiest.
- If you feel comfortable and are able to solve more than 3 per hour, you are allowed to skip some of the questions. It is just like in a gym: when you get used to the load, you increase it. Otherwise, continue training slowly.
Homework¶
banknotes and coins - Here you will use formatting, modulus, casting, arithmetic operations, compound assignment. You don't need to use if-else.
Hint: follow this only if you don't find your own way of solving it. You can read the number as a double, multiply it by 100 and then do a sequence of modulus and division operations.
double input; // declare the container to store the input
cin >> input; // read the input
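long long cents = static_cast<long long>(input * 100 + 0.5); // number of cents. Adding 0.5 rounds to the nearest cent, because input * 100 can land slightly below the exact value (e.g. 28.999... instead of 29). With float, the error is even larger.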
long long notes100 = cents/10000; // get the number of notes of 100 dollar (100 units of 100 cents)
cents %= 10000; // remove the amount of 100 dollars
Another good way of solving it while avoiding casting is reading the number as a string and removing the point. Never use float for money.
string input; // declare the container to store the input
cin >> input; // read the input
// given every input will have the dot, we should remove it. remove the dot `.`
input.erase(input.find('.'), 1); // erase 1 char at the position of the dot
// now it is safe to convert: no bit is lost because no floating-point value is involved.
long long cents = stoll(input); // number of cents.
long long notes100 = cents/10000; // get the number of notes of 100 dollar (100 units of 100 cents)
cents %= 10000; // update the remaining cents by removing the amount of 100 dollars in cents units
Troubleshooting¶
If you have problems here, start a discussion. This is publicly visible and not FERPA compliant. Use discussions in Canvas if you are enrolled in a class with me, or visit the tutoring service.