Team 2550
Technical Documentation
Build Process

A computer does not understand C++ or any other programming language directly. The code has to be turned into machine (binary) code first. Writing machine code by hand is extremely difficult and error-prone, and it was soon realized that there needed to be something that humans could write more easily. This is where Assembly comes in. Assembly is written as text, and each Assembly instruction is a 1:1 equivalent of a machine code instruction. Assembly code is fed into a program called an assembler, which converts the keywords into their binary counterparts. However, you still need to manipulate each individual byte yourself, and Assembly is still difficult to read. You can't tell what an Assembly program is going to do just by looking at it, and once you start writing large programs, Assembly quickly becomes confusing. Machine code and Assembly are also tied to one kind of processor: a program written for one kind of CPU will not run on a different kind. Both Assembly and machine code are considered low-level languages because you control everything in the program directly.
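
To make the 1:1 idea concrete, here is a minimal sketch of what an assembler does, written in C++. The instruction names and opcode bytes are hypothetical, invented for this example; they do not belong to any real CPU, and a real assembler also has to handle operands, labels, and addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical instruction set: each keyword (mnemonic) maps to exactly one byte.
// Real processors define their own, much larger tables.
const std::map<std::string, std::uint8_t> kOpcodes = {
    {"LOAD", 0x01},
    {"ADD", 0x02},
    {"STORE", 0x03},
    {"HALT", 0xFF},
};

// Translate a list of mnemonics into the matching machine-code bytes.
std::vector<std::uint8_t> assemble(const std::vector<std::string>& program) {
    std::vector<std::uint8_t> bytes;
    for (const std::string& mnemonic : program) {
        auto entry = kOpcodes.find(mnemonic);
        if (entry == kOpcodes.end()) {
            throw std::invalid_argument("unknown instruction: " + mnemonic);
        }
        bytes.push_back(entry->second);  // one keyword in, one byte out
    }
    assert(bytes.size() == program.size());  // the translation is 1:1
    return bytes;
}
```

Calling `assemble` with the four keywords LOAD, ADD, STORE, HALT returns the four bytes 0x01, 0x02, 0x03, 0xFF in the same order.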

To bridge the readability gap, numerous high-level languages were created. Early ones included Fortran (for scientific applications), COBOL (for business applications), and LISP (still used for artificial intelligence). Now, there are hundreds. Programs written in high-level languages often run slower than carefully hand-written Assembly, but they are far more readable and much easier and faster to write. High-level languages used today include Java, Python, Perl, C# (C-sharp), F#, JavaScript, PHP, Visual Basic .NET, and Ruby.

C and C++ are considered high-level, although they allow for a much finer degree of control than most of the languages noted above. High-level programs are typically fed into a piece of software called a compiler, which converts the code that you write into machine code.

Note
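A compiler is much more complex than an assembler. Remember that every Assembly instruction has a counterpart in machine code. Statements in a high-level language do not have exact equivalents in machine code: one statement may become many machine instructions, and some parts of the code (comments, for example) produce no machine code at all.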
Warning
Just because C++ can be efficient doesn't mean that all C++ code will be. A significant amount of thought goes into writing good C++ code.
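
As a small illustration of the warning above, the two functions below return the same result, but the first one receives its own copy of the data while the second only reads the caller's data in place. This is just one common example of an avoidable cost; the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass by value: when an existing vector is passed in, all of its elements
// are copied before the function body starts.
double averageByValue(std::vector<double> values) {
    assert(!values.empty());  // an average of nothing is undefined
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += values[i];
    }
    return sum / values.size();
}

// Pass by const reference: no copy is made; the function reads the caller's vector.
double averageByReference(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double value : values) {
        sum += value;
    }
    return sum / values.size();
}
```

For a vector of four numbers the difference is negligible. For a vector of millions of sensor readings that is averaged many times per second, the extra copy can matter.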

The code you write has to go through multiple steps before it can be run:

  1. Build command run from editor: the editor is where you write the code. Many editors that programmers use can send the build command to the compiler automatically.
  2. Preprocessor: before the code is compiled, the preprocessor looks for instructions addressed to it (called directives) and carries them out. In most cases, these instructions copy the contents of another file into your code (the #include directive does this in C and C++).
    Note
    The files that you include are often required to make your code run. They provide information for libraries (a library adds extra functionality and commands to your code). Later, you will learn to create your own multi-file projects.
  3. Parser: checks that the code follows the rules of the language and reports errors that prevent it from being compiled, for example a mistyped word, an unclosed parenthesis, or a missing semicolon. This step does not ensure that your code is entirely error-free. Most code that compiles correctly does not do what you want the first time.
    Note
    The parser has two kinds of output: errors and warnings. Errors keep your code from being compiled; you have to fix the problem. Warnings do not stop the build. They usually point out code that is legal but suspicious, such as a variable you don't use.
  4. Compiler: your code is turned into machine code and put into object files
    Note
    Object files contain the machine code for your program, but they cannot yet be run because they may still require information and binary code from other sources.
  5. Linker: combines your object files with the machine code for the libraries you used and puts everything together into a single executable file (this is a .exe file on Windows, and has no extension on most other systems)
    Note
    Libraries are often precompiled. They are programs that your program can reference but that are not used by the end user directly. If you have ever installed a program on Windows, you have probably seen .dll files before. DLL stands for Dynamic Link Library, and a .dll is simply a file that contains machine code the program needs to access while it runs.
  6. Loader: part of the operating system. When you start the program, the loader copies the executable file (and any dynamic libraries it needs) into memory so that it can run
  7. Execution: running the program
  8. Debugger: runs the program as a subprocess so that you can see exactly what the code is doing.
    Note
    By default, most editors compile the code in debug mode, which means that the file will contain extra information for the debugger. This includes a record of which line of source code each piece of machine code came from, so that you can walk through your code line by line.
    In addition to the parser, many editors also have built-in error checking. This is similar to the spell checker in Microsoft Word: it finds issues as you type, before you try to compile, and marks them in the editor. The error checker is not associated with the compiler, and it does not catch everything, but it will catch many syntax errors (such as missing semicolons).
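
The steps above can be traced through a single small source file. This is a sketch rather than a description of any particular compiler: the macro `TEAM_NUMBER` and the functions `teamLabel` and `doubledTeamNumber` are names invented for this example, and the exact wording of errors and warnings differs between compilers.

```cpp
// Step 2 (preprocessor): each #include line is replaced by the contents of that header.
#include <cassert>
#include <string>

// Step 2 (preprocessor): every later use of TEAM_NUMBER is replaced by the text 2550.
#define TEAM_NUMBER 2550

// A declaration tells the compiler that teamLabel exists before its body
// has been seen. The body can come later, or live in another file.
std::string teamLabel(int number);

// Step 3 (parser): deleting the semicolon after "2" below would be a syntax
// error, and the build would stop with an error message.
int doubledTeamNumber() {
    // int unused = 0;  // uncommenting this would typically produce an
    //                  // "unused variable" warning, but the build would continue
    return TEAM_NUMBER * 2;
}

// Step 4 (compiler): each function body is turned into machine code and
// placed in an object file.
// Step 5 (linker): calls to functions defined in other object files or in
// precompiled libraries are connected to their machine code.
std::string teamLabel(int number) {
    assert(number > 0);  // checked while the program runs, not while it is built
    return "Team " + std::to_string(number);
}
```

A complete program also needs a `main` function, for example one that prints the result of `teamLabel(TEAM_NUMBER)`. Assuming GCC is installed and the file is saved under the hypothetical name `example.cpp`, a command such as `g++ example.cpp -o example` runs the preprocessor, parser, compiler, and linker in one go.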
Note
Many of the languages I listed earlier are actually interpreted. An interpreter reads the code and carries it out during execution, roughly one statement at a time. This is generally much slower and less efficient than running compiled code, but it is great for prototyping, debugging (to find logic issues in your code), and small projects where speed does not matter. Python, Perl, PHP, and Ruby are usually interpreted. Java, C#, and F# use a related tool, called a virtual machine. Instead of being compiled all the way to machine code, they are first converted into bytecode, a much smaller (but not human-readable) version of your code that the virtual machine then interprets or compiles to machine code while the program runs. This is faster than a normal interpreter because your code has already been checked, reorganized, and simplified, but it is often slower than code compiled directly to machine code. The benefit of a virtual machine is portability: the same bytecode can be run on any system with the appropriate virtual machine. Compiled code has to be re-compiled for each kind of system (processor and operating system) on which it will run.
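
To give a feel for what interpreting bytecode means, here is a minimal sketch of a virtual machine loop in C++. The instruction set (`Op::Push`, `Op::Add`, `Op::Multiply`) is hypothetical and far simpler than real Java or .NET bytecode. The point is only that the interpreter reads one instruction at a time and acts on it while the program runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bytecode: a tiny stack-based instruction set invented for this sketch.
enum class Op { Push, Add, Multiply };

struct Instruction {
    Op op;
    int operand;  // only used by Push
};

// Execute the instructions one at a time and return the value left on the stack.
int run(const std::vector<Instruction>& program) {
    std::vector<int> stack;
    for (const Instruction& instruction : program) {
        switch (instruction.op) {
            case Op::Push:
                stack.push_back(instruction.operand);
                break;
            case Op::Add:
            case Op::Multiply: {
                assert(stack.size() >= 2);  // both operations need two values
                int right = stack.back();
                stack.pop_back();
                int left = stack.back();
                stack.pop_back();
                stack.push_back(instruction.op == Op::Add ? left + right
                                                          : left * right);
                break;
            }
        }
    }
    assert(stack.size() == 1);  // a well-formed program leaves exactly one result
    return stack.back();
}
```

For example, the program Push 2, Push 3, Add, Push 4, Multiply leaves 20 on the stack, the same result as the C++ expression (2 + 3) * 4. A compiler would have worked that out, or produced machine code for it, before the program ever ran; the interpreter does the work every time the program runs.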