asoplata edited this page Jul 24, 2013 · 7 revisions

The [C](https://en.wikipedia.org/wiki/C_(programming_language)) and C++ languages are two of the most popular general-purpose programming languages in the world and among the oldest still in heavy use. Much systems-level programming, embedded life-critical programming, and programming where efficiency and speed are desired (as in scientific programming for simulation) is done in C and C++. For example, the Linux kernel is written almost entirely in C, and the Windows operating system is written in C, C++, and C#. However, both can also be used for much simpler, sub-enterprise-level programs for the desktop, the command line, network connections, you name it. They are "compiled" languages (as opposed to "interpreted" languages like Python), so each time you change the source code, you must compile it into a binary executable file; executing that file is what "running the program" means. C++ is very nearly a superset of C, but the usage and best practices of the two have diverged so much that they are considered practically different languages; for example, some things that are best practices in C are worst practices in C++ due to the differences in their standard libraries.

Advantages of C/C++ for scientific programming

  • The languages and all the major compilers used today are free.
  • If you have a machine today that can be called "a computer" in the colloquial way, then you can develop C or C++ on it. This includes desktops, laptops, servers, netbooks, etc. no matter the operating system (Windows, Mac, Linux).
  • There are plenty of tools and Integrated Development Environments (IDEs) for developing C/C++ programs, both old and reliable and new and fancy. Today, almost everyone uses free tools - you will never have to pay to code and compile in C/C++, or even get an institutional license like you would for MATLAB.
  • They are some of the oldest languages still in heavy use. You may think "If it's old, isn't it deprecated?" but on the contrary, their legacy is a benefit:
    • They both have colossal library support; chances are, if you're trying to do something simple like implement multidimensional arrays, that problem has not only been solved (and the solution is sitting on Stack Overflow) but maybe even optimized. This includes libraries for scientific programming, like Fast Fourier Transforms, signal analysis, etc.
    • They are among the few languages that are still around because of the usefulness of their core DESIGN, rather than because everyone got locked in before realizing how bad the language was. New projects are started in C/C++ almost every day, still.
    • Many people "speak" these languages. While they're not as human readable as Python, so many people know and write C/C++ that others can often understand your code.
    • Their best practices are well-defined after many years of use by many people, and the books that teach them the best have, in general, stood the test of time. Additionally, there are TONS of free online resources for learning the languages going back decades.
  • They are still used as benchmarks for speed tests with new languages, and offer some of the best computer efficiency and speed per development time around (FORTRAN is still used for extreme/supercomputer-requiring efficiency and speed requirements, but lacks much of both the ease of use in C and abstraction in C++).
    • Their efficiency per development time comes from the mixture of low-level memory management (powerful pointers to arrays of data objects that are contiguous in computer memory), compile-time static typing of data (C++ templates are likewise resolved at compile time), and high-level abstraction, such as custom data types in both and very powerful Object-Oriented Programming (OOP) in C++.
  • In fact, learning to manage memory in C/C++ forces you to learn more about how computers handle memory! This low-level understanding then applies to what is "really going on" in other languages, and enables you to write faster, more efficient programs across the board.
  • They work well with other languages: the reference implementation of Python (CPython), for example, is written in C, and Python can directly call compiled C functions and code.
  • They are very widely used outside of scientific programming, and so non-scientist programmers can help you solve coding problems...or even offer you a job coding C/C++ if you decide science isn't for you!

Disadvantages of C/C++ for scientific programming

  • Neither is very easily human readable, as opposed to Python. It can sometimes be very difficult to figure out what someone else's code (or even your own old code!) is doing if it is insufficiently clear or explained. Thus, formatting and good style are even more important than usual.
  • Pointers are notorious for occasionally baffling even the most seasoned programmer, and careless use of pointers can crash your program or silently corrupt your data!
  • There is no automated garbage collection: anything you create on the heap must be explicitly destroyed by you (this is where Valgrind comes in handy for finding leaks).
  • C is not difficult to learn, as there are many tutorials for people from all levels of computer understanding (including those who have NO IDEA where to begin, like here), but greater than average understanding of how computers work (which is almost always covered in anything that teaches C) is needed to really use C to its full extent. It is harder to "get going" in C than MATLAB or Python, and possibly even C++.
  • C++ is a VERY large language. This means C++ is great for large-scale, many-programmer, enterprise-level code, but also that there is most likely a large difference between what a software engineer would learn about C++ and what someone who just wants to do scientific programming would learn. In fact, the best scientific programming usage of C++ for certain applications may verge on "C-style C++", wherein you only use the C++ you need but organize code or do calculations in a simple C-style way; this is considered bad practice in software engineering but may be acceptable in scientific programming, so the way you write code can diverge from the skills a programming employer would look for.
    • Regardless of how much C++ you want to actually implement, C++ is difficult to learn to the point where doing simple tutorials like "learn C++ in 10 days" will not teach you to use C++ as it was designed; you really need to invest in one of the good tomes listed at the Stack Overflow thread on the bottom like Accelerated C++ or The C++ Primer. Unless you really really don't care about best practices (and shame on you) and just want enough C++ to get by, you should make a concerted, months-long effort (a little bit at a time) to learn C++ using a well-known good book.
    • Relatedly, because C++ is so large and usage is so spread out, there are actually BAD manuals and tutorials that teach faulty practices. It has been claimed that the C++ Primer Plus series, not to be confused with the C++ Primer series, is negligent in these aspects; this is hearsay, though, so try to look for crowdsourced reviews of different texts, like this best-thread-on-the-internet: The Definitive C++ Book Guide and List. Look at the "Beginner" section of that. See also the list at the end of this page.
  • While C/C++ are great for efficient calculation, for scientific plotting and visualization of data you might as well not even try. Even if C/C++ offerings were adequate in that regard, they would still be greatly inferior to the capability and ease of use of MATLAB's plotting or the "matplotlib" module of Python. Thankfully though, it is not difficult to use C/C++ for simulation, and then have Python/whathaveyou plot the results later in the toolchain; this combination of different languages is very common in science.

C vs. C++

C:

  • Pros:
    • Much easier to learn in its entirety (for instance, one of the bibles of the language, The C Programming Language, clocks in at 274 pages and is still used), and takes far less time to learn.
    • Since it is less complicated, there is less to get wrong, and thus tutorials are less likely to be incorrect.
    • It is the best at interfacing with Unix/Linux system calls.
    • It works very well with Python (though C++ can do this to a certain extent now), being the language in which the reference Python interpreter is written.
    • It is very lightweight, and is usually what's chosen for embedded systems or small devices, like Arduinos, which are increasingly popular in science.
  • Cons:
    • Does NOT include OOP, i.e. classes, class methods, inheritance, polymorphism, etc. This enforces simplicity in code organization, but there are many instances in scientific programming where just using OOP simplifies things greatly. C does, however, include structs and custom data types. Thus, in scientific programming it is almost always used alongside a higher-level language.

C++:

  • Pros:
    • Has complete capabilities for OOP, template meta-programming, exception handling, etc. It is capable of far greater abstraction than C.
    • It is easier to make much larger, collaborative programs with C++, e.g. one person designs a class and many other programmers use it. That's not to say C can't do large programs- the Linux kernel is a counterexample.
    • There are some parts of C++ and its standard library, e.g. I/O streams, that are superior to the equivalent in C.
  • Cons:
    • There is a TON to learn about the language, and you won't know whether the language already has the specific functionality you need unless you're familiar with much of the language landscape. A good introductory C++ book can easily run to 800 pages or more (that's not to say there aren't good shorter books).
    • In spite of that, unless you have been a professional programmer for 20 years, you will likely not use a lot of what C++ has to offer.
    • C++ is not quite as ubiquitous in terms of being able to develop on different machines as C, but only to a small extent.
    • Many people think C++ was designed to do too many things, is too complicated, and is not well-designed (e.g. Linus Torvalds, creator of Linux, has a rant (NSFW language) on why he thinks C is far better than C++), whereas no one makes the same complaints about C.
    • There are so many books written on how to use it that a substantial fraction of them either use poor practices or sometimes are outright incorrect.

Choosing a development environment

  • On Windows, Microsoft's Visual Studio Express (whatever the latest edition is) is an IDE that is free for everyone, NOT just students. Students can get the full Visual Studio for free. There is also the popular Free and Open Source Software (FOSS) IDE Bloodshed Dev-C++, which is commonly used by professionals, despite its name.
  • On Mac, Apple's free Xcode IDE is analogous to MS's Visual Studio as the operating system's "main IDE" for developing C, C++, Objective-C, etc. code for desktop/server/mobile/iPhone/iPad applications or whatever applications you want.
  • On Linux, there are many free IDEs like Code::Blocks, Eclipse CDT, NetBeans, the list goes on...
  • Alternatively, Unix, which Mac's OS X is built on and which Linux closely imitates, was originally made as a C programming environment! So you don't need an IDE, and can instead use tools that Unix/Linux systems typically ALREADY have installed, like the GCC compiler, [Make](https://en.wikipedia.org/wiki/Make_(software)) to organize compilation, GDB for debugging, Valgrind for memory debugging and leaks, etc. C/C++ work with the Unix-y command line so well that learning C/C++ is a good excuse for learning just how powerful the command line is!
    • For the Unix/Linux command line, Windows people can download Cygwin, which emulates a Unix-y terminal. Using Cygwin alone on Windows, one can install all the command-line-utilities mentioned above and compile using GCC.
  • No matter what, enable syntax highlighting in whatever text editor you use (IDEs can do this easily in Preferences, if it's not already enabled by default).

Choosing a compiler

  • C++:
    • In general, use the G++ part of the GNU Compiler Collection (GCC), as this is one of the most popular and ubiquitous compilers around. Its error messages can be esoteric, and it doesn't integrate well with IDE plugins that try to note warnings and errors in your code, but it is not wacky, and code written for it will compile under most compilers. If you are unsure, use this compiler. On Windows, use this via Cygwin.
    • If you are addicted to Visual Studio (VS) on Windows and don't want to use Bloodshed Dev-C++, which uses MinGW's GCC implementation, note that Visual C++ uses its own compiler, made for the VS IDE. As a result, some parts of your code (even how basic types such as some integer type categories are defined) may have to be modified before compiling with a different compiler. If you use VS' compiler, it may be difficult, if not impossible, for anyone who doesn't use it (say, people using Mac or Linux) to compile your code.
    • Apple's Xcode uses Clang++/llvm-clang++, which is a fairly new one that is gaining traction. In the coming years it may be a serious contender with GCC, but for right now it is not very popular in the scientific programming community.
  • C:
    • Use the GCC part of GCC as mentioned above.
    • Do not use Microsoft's Visual Studio for C compilation, as it is not standards-compliant going back to 1999's C99 and possibly before.
    • Clang (without the ++) is a growing contender for a C compiler.

Best Practices

  • Both C and C++:

    • Of course, follow all the best practices of general programming. Especially important is to plan beforehand for twice as long as you plan to spend implementing.
    • In addition to naming your variables well, both languages support custom data structs, so use them to organize your variables! E.g., if you have different neuron classes with many different parameters, use a single struct to organize all that celltype's parameters, like PY.g_Na for struct PY.
      • If you want to add a new data member to a certain type, keep track either in your head, through searching multiple files with regular expressions, or an IDE, of where to declare, what declarations/assignments require the member name, what require it to be initialized, and where it needs to be destroyed if necessary.
    • Take the time to learn pointers, and how to use them with arrays; this is especially important if coming from MATLAB or Python. Think of C/C++'s preference for allocating memory at compile-time (literally, the time at which you compile your program) in arrays as similar to pre-allocating large chunks of memory prior to using them for MATLAB.
    • Always initialize any data that is declared before using it. Undefined behavior (that's programming-language speak for Very Bad Things Possibly) can happen when accessing certain types when they're uninitialized.
    • If you expect to only change a small number of values over the course of many simulations/program runs, make your program accept inputs from the command line by learning to use main(int argc, char *argv[]) as the signature of your main function. Do not be intimidated by this standard; argc is merely the number of arguments you're supplying when you RUN the program (including the name of the program itself), aka the size of argv, and argv is an array of strings, each of which is one of those arguments (again including the name of the program itself).
    • Use #include <ctime> (C++) or #include <time.h> (C) and the clock() function to measure, in processor clock ticks, how fast portions of your code run. This is also useful in debugging.
    • Use std::cout (C++) or printf (C) copiously for easy debugging and tracking of values if you haven't already invested time in learning GDB or other debuggers.
      • This is especially useful when doing complicated pointer assignments, since printing a pointer without dereferencing it shows the memory address it holds (with the exception of char pointers, which std::cout prints as strings).
    • If you are not going to use something later in the C/C++ program, but it is some result of a simulation/calculation you want to keep, write it to file as new/later values of that data are gradually created rather than storing it all in memory and then saving all the data at the end. This minimizes memory usage, and you will be writing it to file eventually anyways.
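    • Always double check that you release any memory you have allocated on the heap: every new needs a matching delete, every new[] a matching delete[], and (in C) every malloc a matching free!!!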
    • If you use control flow in determining how you compile, learn to use Make, as this works with most compiled languages.
    • In your headers, organize your function prototypes separately from your data structure declarations.
    • Though this is rarely seen in scientific programming, use const to mark as constant nearly all data members that should never change. Many parameters will of course be modified, and this won't apply to them, but const prevents you from changing a parameter somewhere and then forgetting about it. Real software engineering is much more demanding on this point.
    • Whatever compiler you use, tell the compiler you want it to tell you whenever there's a warning.
      • Treat warnings (which do not stop your program from compiling) as errors (which do), in that you should fix them immediately. Some compilers give you the option to tell it to treat warnings as errors, but enabling that is only for the hardest of core.
      • Despite C++'s allowance of exception handling, do your absolute best to avoid using them. They should be treated similarly to MATLAB's evil eval: they are almost never the best solution, and if you really think using them is the best solution, then consider reorganizing the relevant code. Still, sometimes they are the best solution.
  • C++ specifically:

    • If you don't know the size of a data vector/list until runtime (literally, the time at which you run the program, after it's been compiled), and
      • if you're not going to access it very often, use an std::vector type from the standard library. These are containers that are made to be dynamically (meaning during runtime) expanded/changed in size, and in a lot of cases are optimized to be almost as fast as arrays if they're not used that often in the code.
      • if you're going to be accessing it very often, and especially if you're going to be accessing it contiguously (i.e. going down the list), construct a dynamic array on the heap using (this is an example of a MULTIDIMENSIONAL dynamic array) something like

    // Heap-allocate a dim_size_one x dim_size_two matrix (sin needs <cmath>).
    double **matrix = new double*[dim_size_one];
    for (int i = 0; i < dim_size_one; i++) {
        matrix[i] = new double[dim_size_two];
        for (int j = 0; j < dim_size_two; j++) {
            matrix[i][j] = sin(i * j);
        }
    }

      - In general software engineering, this is NOT best practice, as it is trying to solve a problem (containers of variable size at runtime) that is already meant to be solved by vectors. However, if you are going for speed, and in spite of the multidimensionality, if you are going down the elements of that array of pointers to arrays consecutively in memory, then this should offer better speed.
    
    • If you don't plan on your program becoming large, use using namespace std outside of the main() call for ease of use of things like cout. This is considered bad practice in general software engineering, as it brings in possible namespace conflicts, but for scientific programming where most of your variables are scientific-paradigm-specific, it shouldn't be a problem.
    • In addition to the general programming Rule(s) of Three, there is an [additional one for C++ classes](https://en.wikipedia.org/wiki/Rule_of_three_(C%2B%2B_programming)), where if you must define any one of a destructor, a copy constructor, or a copy assignment operator for a class, you should probably define all three.
    • Resource Acquisition Is Initialization (RAII) is a C++ specific idiom which says, if you're using memory resources, you should obtain them when you initialize an object (in the OOP sense) so that they are automatically destroyed when the object's destructor is run. This is because object destructors are the only things definitely run when an exception is thrown.

External Resources

C

Books for learning C, and as references for C

The end-all, be-all list for C: http://stackoverflow.com/questions/562303/the-definitive-c-book-guide-and-list

One of the best: The C Programming Language

Tutorials on C

http://www.computerscienceforeveryone.com/ (This is the best if you have NO programming experience)

http://c.learncodethehardway.org/book/

C++

Books for learning C++, and as references for C++

It is highly recommended that C++ be learned through reliable books and not brief tutorials for reasons stated above.

The end-all, be-all list for C++: http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list

One of the best: C++ Primer, 5th edition

C++ for scientific programming

http://wbell.web.cern.ch/wbell/HepCppIntro/HepCppIntroGuide-2009-06-03.pdf

http://www.cs.indiana.edu/pub/techreports/TR542.pdf

The writer of this in no way claims to be a C/C++ expert.