Is C a vitamin?

… or Problems of the C Programming Language

At my university, students are taught C as their first programming language - not only computer engineering students, but also students in some of the other engineering programs. Although I have always liked writing code in C, there are reasons why I think it is not a suitable language for teaching the concept of programming, or even for writing most programs.
According to Wikipedia, C is a general-purpose, procedural, imperative programming language developed in 1972 by Dennis Ritchie at Bell Telephone Laboratories for use with the Unix operating system [5]. It certainly is procedural and imperative. As for its purpose, although we must classify C as a general-purpose language, who would argue that C is the right choice for implementing every problem? C was originally designed as a portable assembly language to make implementing UNIX easier [2]. It is therefore a low-level programming language, to be used when hardware-oriented or performance-critical software is needed.
In this article I will go over some problems, or pseudo-problems, of the C programming language. A much broader criticism of C can be found at [6].

Readability

We do not need to say much about C and readability: C is not readable at all - don’t forget it is a kind of enhancement of assembly language. Let us look at an example (taken from [1]). Have a look at code listings 1 and 2. The first is an implementation of a list of integers in C and the second is the Python equivalent of the same code. We are not interested in how readable the Python code is, but in how hard the C code is to read.
When writing code in C, you have to help the compiler a lot. For the compiler’s benefit you put braces ({ and }) around blocks. Stars here and stars there confuse the programmer (and later the reader of the code). When you want to use some memory you have to allocate it and make sure all pointers are initially set to NULL. When you want to update the list, you must take care of all the pointers again. The code becomes a mess full of statements that are not related to the real problem, and anyone who wants to read and understand it ends up in the middle of this mess, fighting with all these stars, braces and allocation statements.
Have a look at the table under the Expressiveness section of [4]. According to this table, you have to write 6 lines of C code to do the job of a single line of Python or Perl code. Likewise, implementing a problem in C takes 2.5 times as many lines as implementing the same problem in Java or Fortran. Thus we can say that C is not an efficient language for either writing or reading code.
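To give a feeling for the bookkeeping described above, here is a minimal sketch of an integer list in C. It is not code listing 1 itself; the names struct node, list_append and list_free are made up for this illustration.

```c
#include <stdlib.h>

/* One node of a singly linked list of integers (hypothetical names). */
struct node {
    int value;
    struct node *next;
};

/* Append value to the list; returns 0 on success, -1 if malloc fails. */
int list_append(struct node **head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = NULL;               /* pointers must be initialised by hand */

    /* Walk to the last "next" pointer and hook the new node in. */
    while (*head != NULL)
        head = &(*head)->next;
    *head = n;
    return 0;
}

/* Free every node; the caller's head pointer ends up NULL. */
void list_free(struct node **head)
{
    while (*head != NULL) {
        struct node *next = (*head)->next;
        free(*head);
        *head = next;
    }
}
```

A caller starts with struct node *head = NULL; and calls list_append(&head, 3);. In Python the same step is just numbers.append(3) on a built-in list, with no allocation or pointer handling in sight.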

Orthogonality

Have a look at the two function definitions in code listing 3 (example taken from [1]). The first function (double_int) takes an integer parameter, doubles its value and saves the result in a local variable. The value of the local variable is then returned. The second function (double_str) takes a character pointer as a parameter and doubles its value into a local variable using some library functions. As in the first function, the local variable is returned at the end. But the second function is not valid, because it tries to return a pointer to a local variable, whose storage no longer exists once the function returns (compilers typically reject or warn about this).
In C, the programmer often does not get the chance to write similar code for two similar instances of the same problem. For this reason, C is said to be not orthogonal.
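The asymmetry can be sketched as follows. This is my own reconstruction of the idea, not the contents of code listing 3: the function bodies, the buffer size in the comment and the malloc-based fix are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Doubling an integer: a local variable can simply be returned by value. */
int double_int(int x)
{
    int result = 2 * x;
    return result;
}

/*
 * The "same" shape for strings would be
 *
 *     char result[64];
 *     strcpy(result, s); strcat(result, s);
 *     return result;        // pointer to storage that dies on return
 *
 * so a working version has to allocate memory and hand ownership to the
 * caller. Returns NULL if allocation fails; the caller must free() it.
 */
char *double_str(const char *s)
{
    size_t len = strlen(s);
    char *result = malloc(2 * len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, s, len);
    memcpy(result + len, s, len + 1);   /* second copy includes the '\0' */
    return result;
}
```

The integer version needs no cleanup, while every caller of this double_str has to remember to free the result. That is the kind of difference between two similar problems that the section is talking about.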

Safety

C code tends to need more maintenance because of its unsafe properties. Some of these properties are:

  • Although the compiler performs type checking, the programmer is free to cast between types.
  • C never performs index range checking. [2] claims that this property encourages buffer overflows, and gives a list of functions that may cause a buffer overflow by accident.
  • The programmer owns the memory! No safety checks are made on memory allocation.
  • You must not free the same pointer twice - if you do so, even by accident, you are in trouble. There is no built-in mechanism to prevent this. The programmer must always check the pointer when allocating and freeing memory.
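Since the language does none of these checks, they have to be written by hand. The following is only a sketch of one common defensive style; the IntBuffer type and the buffer_* functions are hypothetical names, not part of any standard library.

```c
#include <stdlib.h>

/* A heap array that remembers its own length (hypothetical helper type). */
typedef struct {
    int *data;
    size_t len;
} IntBuffer;

/* Allocate len zeroed ints; returns 0 on success, -1 on failure. */
int buffer_init(IntBuffer *b, size_t len)
{
    b->data = calloc(len, sizeof *b->data);
    b->len = (b->data != NULL) ? len : 0;
    return (b->data != NULL) ? 0 : -1;
}

/* Bounds-checked read: returns 0 and stores the value, or -1 if out of range. */
int buffer_get(const IntBuffer *b, size_t index, int *out)
{
    if (index >= b->len)
        return -1;
    *out = b->data[index];
    return 0;
}

/* Release the storage; calling it twice is harmless because free(NULL) is a no-op. */
void buffer_release(IntBuffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

Nothing forces anyone to go through these functions, of course: b.data[100] still compiles, which is exactly the point of the list above.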

Redundancy

[3] says: “In many ways, the C language evolved into a collection of overlapping features, providing too many ways to say the same thing, while in many cases not providing needed features.” [2] has many examples of this and I will not repeat them all here. Just consider the gets and fgets functions. They both do the same job (OK, they don’t, but fgets can do everything gets can), and the use of gets is discouraged because of safety issues: it cannot limit how much it writes into the buffer. But like a lot of similar things, it cannot be thrown out of the language completely, because backward compatibility is needed.
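As a rough illustration of fgets covering the job of gets, here is a small wrapper. The name read_line is my own; only fgets and strcspn come from the standard library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line from `in` into buf (at most size - 1 characters) and strip
 * the trailing newline. Returns buf, or NULL on end of file or read error.
 */
char *read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if there is one */
    return buf;
}
```

Called as read_line(stdin, buf, sizeof buf), it behaves much like gets(buf) for short lines, except that a line longer than the buffer is split across calls instead of overflowing it.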

String type

In [2], James A. C. Joyce wrote about strings in C:

Most sane programming languages have a string type which allows one to just say “this is a string” and let the compiler take care of the rest. Not so with C. It’s so stubborn and dumb that it only has three types of variable; everything is either a number, a bigger number, a pointer or a combination of those three. Thus, we don’t have proper strings but “arrays of unsigned integers”. “char” is basically only a really small number. And now we have to start using unsigned ints to represent multibyte characters.

Since C has no string type, you cannot copy or concatenate strings with the language’s own syntax; you have to use functions. String operations are therefore not part of the language but library functions, and you must include the appropriate header file to use them (string.h in ANSI C). Assigning one array variable to another is not allowed either, because in most expressions an array is converted to a pointer to its first element. Number-to-string and string-to-number conversions can also be done only via functions. Look at these two syntaxes - no comment.
C style:

strncpy(target, source + 2, 5);  /* destination comes first */
target[5] = '\0';                /* strncpy may not add the terminator */

Python style:

target = source[2:7]

All that said, C is a low-level programming language, not a string manipulation language. Expecting high-level string manipulation operations from it is not reasonable.
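For completeness, the C side of the slice comparison above can be wrapped in a helper that also checks the range and the size of the destination. This is a sketch under my own assumptions: the function name slice and its error convention are invented here, and unlike Python (which silently clamps out-of-range slices) it reports an error.

```c
#include <string.h>

/*
 * Copy src[start:end] (Python slice notation) into dst.
 * Returns 0 on success, -1 if the range is invalid or dst is too small.
 */
int slice(char *dst, size_t dst_size, const char *src, size_t start, size_t end)
{
    size_t src_len = strlen(src);
    size_t count;

    if (start > end || end > src_len)
        return -1;
    count = end - start;
    if (count + 1 > dst_size)        /* need room for the '\0' as well */
        return -1;
    memcpy(dst, src + start, count);
    dst[count] = '\0';
    return 0;
}
```

With char target[8];, the call slice(target, sizeof target, source, 2, 7) plays the role of target = source[2:7].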

Reaching elements

Another complaint about C is that it has both . and -> for the same purpose. First of all, they are not for the same purpose: . accesses a member of a structure directly, while -> accesses a member through a pointer to a structure, dereferencing the pointer first. Of course we could expect the compiler to take care of this difference, or we could simply refuse to help the compiler do its job. But in my opinion this small difference makes the code easier to understand. Looking at the code, we can easily see which name is a pointer whose members we reach by dereferencing, and which is the name of an actual structure. Let me say that I never liked the Java way of avoiding pointers.
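A tiny illustration of the two operators, using a made-up struct point and two hypothetical functions:

```c
/* A simple structure with two members (hypothetical example type). */
struct point {
    int x;
    int y;
};

/* Works on a copy of the structure: members are reached with '.'. */
int sum_coords(struct point p)
{
    return p.x + p.y;
}

/* Works on the caller's object through a pointer: members are reached with '->'. */
void shift_right(struct point *p, int dx)
{
    p->x += dx;          /* shorthand for (*p).x += dx */
}
```

Just by reading p.x or p->x, the reader knows whether a function is looking at its own copy or at somebody else's object.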

goto statement

In [2], James A. C. Joyce claims that a goto statement is the only way to break out of nested for or while loops. I am not sure his claim is true, but I do not know a more efficient way of doing it either. According to [3], an investigation of 100,000 lines of code found that 90% of the goto statements were used to break out of nested loops. A number of other languages offer multi-level break statements to avoid this.
Perhaps the goto statement is the worst feature of the C language. With it you can jump into the middle of a loop or of nested loops. A single goto statement inside a loop can make your code ten times harder to read.
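Here is a small sketch of the nested-loop case. The grid size and the function name find_first are assumptions for the example. Strictly speaking goto is not the only option (a flag variable, or returning directly from inside the loops, also works), but it is the most direct way to leave both loops at once.

```c
#define ROWS 3
#define COLS 3

/*
 * Search grid for target. On success store its position in *row and *col
 * and return 1; return 0 if the value is not present.
 */
int find_first(int grid[ROWS][COLS], int target, int *row, int *col)
{
    int r, c;

    for (r = 0; r < ROWS; r++) {
        for (c = 0; c < COLS; c++) {
            if (grid[r][c] == target)
                goto found;          /* leaves both loops in one step */
        }
    }
    return 0;

found:
    *row = r;
    *col = c;
    return 1;
}
```

A plain break here would only leave the inner loop, and the outer loop would keep running.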

enums

C’s enum construct has a significant problem that is easily solved in object-oriented programming: if you use a name in one enum, you cannot use the same name in another enum. In an object-oriented programming language like Java, you may put the same constant name inside different classes.
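The clash, and the usual workaround of prefixing every constant by hand, look roughly like this (the color and fruit enums are invented for the example):

```c
/*
 * All enumerators share one scope, so this would not compile:
 *
 *     enum color { RED, ORANGE };
 *     enum fruit { APPLE, ORANGE };    // error: ORANGE redeclared
 *
 * A common workaround is to prefix each constant with the enum's name.
 */
enum color { COLOR_RED, COLOR_ORANGE };
enum fruit { FRUIT_APPLE, FRUIT_ORANGE };

/* Map a fruit constant to a printable name. */
const char *fruit_name(enum fruit f)
{
    switch (f) {
    case FRUIT_APPLE:  return "apple";
    case FRUIT_ORANGE: return "orange";
    }
    return "unknown";
}
```

The prefixes do by convention what a class name would do in Java.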

Error handling

Library functions report errors in the following ways (the complete list is taken from [2]):

  • Returning zero
  • Returning nonzero
  • Returning a NULL pointer
  • Setting errno
  • Requiring a call to another function
  • Outputting a diagnostic message to the user

There is no exception handling mechanism in C. This results in two critical problems. The first is one we already know: C is not appropriate for high-level programming. The second is that, since there is no single error-code convention, the programmer gets confused when writing code.
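A single standard function can already mix several of these conventions. strtol, for example, signals "no digits" through its end pointer and overflow through errno. The wrapper below is a sketch that folds both into one return code; parse_long is a hypothetical name and the 0 / -1 convention is my own choice.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a base-10 long from s.
 * Returns 0 and stores the value in *out on success, -1 on any failure.
 */
int parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;                       /* strtol reports overflow via errno */
    value = strtol(s, &end, 10);
    if (end == s || *end != '\0')    /* no digits at all, or trailing junk */
        return -1;
    if (errno == ERANGE)             /* value does not fit in a long */
        return -1;
    *out = value;
    return 0;
}
```

Every caller still has to remember to test the return value, since nothing like an exception forces the issue.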

What C doesn’t have

  • Exception handling mechanism
  • Specialized data types
  • Function overloading
  • Garbage collection

A nice joke :) [2]

“Hey, Thompson, how can I make C’s syntax even more obfuscated and difficult to understand?”
“How about you allow 5[var] to mean the same as var[5]?”
“Wow; unnecessary and confusing syntactic idiocy! Thanks!”
“You’re welcome, Dennis.”
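For the curious, the joke rests on a real rule: var[5] is defined as *(var + 5), and since addition is commutative, 5[var] means the same thing. The three hypothetical functions below all return the same element.

```c
/* Ordinary subscript. */
int sixth_a(const int *var) { return var[5]; }

/* What the subscript is defined to mean. */
int sixth_b(const int *var) { return *(var + 5); }

/* Operands swapped: still *(5 + var), so still the same element. */
int sixth_c(const int *var) { return 5[var]; }
```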

Conclusion

I do not think C is the villain of this story. It is just not really suitable for high-level programming. It should be used for what it was designed for.
Some nasty features cannot be dropped because of backward compatibility. Experienced programmers always discourage newcomers from using them.
While coding in C, I feel as if I were in the middle of the 70s, when memory was so precious that I should not use a single byte unless I really needed it. This is what makes me sick of C :)
I think it has something to do with comfort and habits. For example, Fortran programmers complain about being able to change the loop variable inside the loop, but I cannot even imagine a world where I cannot change it :) Java programmers find pointers confusing; I get confused when I do not see those stars in the code :)
To sum up, neither C nor its features are the real problem; the problem is using it where it should not be used. Never forget: C is not a vitamin that is good for you under every condition, it is a low-level language!

References

[1] http://www.ce.itu.edu.tr/undergraduate/courses/blg437e/presentations/introduction.pdf : Programming Languages - Introduction by H. Turgut Uyar
[2] http://www.kuro5hin.org/story/2004/2/7/144019/8872 : Why C Is Not My Favourite Programming Language by James A. C. Joyce
[3] http://java.sun.com/docs/white/langenv/Simple.doc2.html : The Java Language Environment
[4] http://en.wikipedia.org/wiki/Comparison_of_programming_languages : Comparison of programming languages
[5] http://en.wikipedia.org/wiki/C_%28programming_language%29 : C (programming language)
[6] http://en.wikipedia.org/wiki/Criticism_of_the_C_programming_language : Criticism of the C programming language

Code listings

[1] http://www.ozgurmacit.com/files/Is-C-a-vitamin-01.c
[2] http://www.ozgurmacit.com/files/Is-C-a-vitamin-02.py_
[3] http://www.ozgurmacit.com/files/Is-C-a-vitamin-03.c

