No more confusions on tricky C declarations (c-faq.com)
132 points by ashishb4u on June 26, 2010 | 41 comments


I've always had a bit more luck with the "typedef each step of the construction" rule-of-thumb. Also, I tend to hide anything as complex as "pointer-to-array-of-pointers-to-functions" (even though you memorize this idiom pretty quickly after an hour in the kernel) behind library ADTs, so you're never indexing an array, but rather passing an index and a whatever_t* to whatever_get(w, index).
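A minimal sketch of the "typedef each step" idea for the pointer-to-array-of-pointers-to-functions case, with an accessor in the whatever_get(w, index) style. The typedef names, the struct layout, the table size of 4 and the two sample handlers are all hypothetical, and the static_assert needs C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Step 1: a function taking int and returning int.
typedef int handler_fn(int);
// Step 2: a pointer to such a function.
typedef handler_fn* handler_ptr;
// Step 3: an array of four of those pointers (size chosen for illustration).
typedef handler_ptr handler_table[4];
// Step 4: a pointer to that array.
typedef handler_table* handler_table_ptr;

// The same type written as a single raw declarator, for comparison.
static_assert(std::is_same<handler_table_ptr, int (*(*)[4])(int)>::value,
              "typedef chain and raw declarator name the same type");

// Hypothetical ADT wrapper: callers pass an index instead of indexing directly.
struct whatever_t {
    handler_table_ptr table;
};

inline handler_ptr whatever_get(const whatever_t* w, std::size_t index)
{
    // Out-of-range indexes yield a null function pointer.
    return index < 4 ? (*w->table)[index] : nullptr;
}

// Sample handlers for the usage example.
inline int twice(int x) { return 2 * x; }
inline int flip_sign(int x) { return -x; }
```

A caller would fill a handler_table, store its address in a whatever_t, and then write whatever_get(&w, 0)(21) rather than (*w.table)[0](21).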


That's a good approach, but sometimes you can't do it in C++. (At least with C++03.) Consider:

  template <class T>
  T* new_align_1d(size_t d1, unsigned int align)
  {
    void* ptr = _malloc_align(d1 * sizeof(T), align);
    return new (ptr) T[d1];
  }
So, I'm defining a function new_align_1d that takes a size and an alignment. I allocate enough space for a 1-dimensional array of the given size on that alignment, then use placement new to construct the actual objects in that place, then return the pointer. Pretty straightforward. So let's generalize one dimension up:

  template <class T, size_t d2>
  ? new_align_2d(size_t d1, unsigned int align)
  {
    void* ptr = _malloc_align(d1 * d2 * sizeof(T), align);
    return new (ptr) T[d1][d2];
  }
You can probably see what I'm doing here. So what's the return type? And what's the syntax for specifying it? Since it's a template, we can't define a typedef to help us. (C++0x should allow that with parameterized typedefs.) The answer surprised me - I really had to look at the grammar and mechanically figure out what it was going to be.

(Yes, I have to pass d2 as a template parameter - it's part of the type of the array and must be known at compile time.)
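For reference, one spelling of that return type that compiles, as a sketch: the function name and its parameter list go inside the declarator and the [d2] goes outside. The names new_2d, new_2d_alias and row_ptr are hypothetical, and plain new[] is swapped in for _malloc_align plus placement new so the example does not depend on that allocator (so no alignment control here). The alias template and the decltype checks assume C++11; the first function's declarator itself is valid C++03.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Return type spelled in place: new_2d is a function taking (size_t)
// and returning a pointer to an array of d2 elements of T.
template <class T, std::size_t d2>
T (*new_2d(std::size_t d1))[d2]
{
    // Plain new[] stands in for the aligned allocation plus placement new.
    return new T[d1][d2]();
}

// C++11 alternative: an alias template acts as the "parameterized typedef"
// and keeps the function declaration readable.
template <class T, std::size_t N>
using row_ptr = T (*)[N];

template <class T, std::size_t d2>
row_ptr<T, d2> new_2d_alias(std::size_t d1)
{
    return new T[d1][d2]();
}

// Both spellings return "pointer to array of 4 int" for T = int, d2 = 4.
static_assert(std::is_same<decltype(new_2d<int, 4>(1)), int (*)[4]>::value,
              "raw declarator returns pointer to array of d2 T");
static_assert(std::is_same<decltype(new_2d<int, 4>(1)),
                           decltype(new_2d_alias<int, 4>(1))>::value,
              "alias version has the same return type");
```

The result is used like an ordinary 2-D array, for example grid[2][3] = 7, and released with delete[] grid.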


The fact that I instantly recognized this problem case (having had it many times myself) makes me sad.

I'm so incredibly happy that 95% of the code I write now is in Python.


This is a Good Idea, but it doesn't help you understand other people's code if they don't follow it. Hence the need for a mnemonic rule.


A coworker introduced me to the ADT-style solution for nesting problems, and it works nicely. He introduces a struct that encapsulates any nested inner collection. It's worth noting his designs use fat structs in a shallow hierarchy.

This has an added advantage (and burden) when calling the code. Since you access a field inside of the struct, you write extra names. The claim is that myList[0].fnList[0].fn(3) is more readable than myList[0][0](3). This can get cluttered with lots of nesting, but this kind of nesting usually screams "refactor me!" anyways.

I prefer a mix: make the common inner collections into data types, and typedef the rest. Doing it all the time creates too many little structs for my liking.


The second approach, that's what I like...


An ML-like notation would be even cooler:

  char *str[10];
  str : [10] (*char)

  char *(*fp)( int, float *);
  fp : *((int, *float) -> *char)

  void (*signal(int, void (*fp)(int)))(int);
  signal : (int, *(int -> void)) -> *(int -> void)
(Oh. That last declaration did make some sense, after all…)

Really, how did they manage to choose such an inconsistent, unreadable syntax for their declarations? Is there any rational explanation?
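As an aside, C++11 trailing return types get fairly close to that arrow notation. The sketch below declares the same function type three ways and checks that they agree. The names raw_signal, arrow_signal, typedef_signal and handler are hypothetical, chosen so nothing clashes with the real signal().

```cpp
#include <cassert>
#include <type_traits>

// The raw C declarator from the comment above, under a different name.
void (*raw_signal(int, void (*fp)(int)))(int);

// The same function type with a C++11 trailing return type:
// parameters on the left of the arrow, result on the right.
auto arrow_signal(int, void (*fp)(int)) -> void (*)(int);

// The same again with the pointer-to-function type named once.
typedef void (*handler)(int);
handler typedef_signal(int, handler);

// All three declarations have identical types.
static_assert(std::is_same<decltype(raw_signal), decltype(arrow_signal)>::value,
              "trailing return type matches the raw declarator");
static_assert(std::is_same<decltype(raw_signal), decltype(typedef_signal)>::value,
              "typedef version matches the raw declarator");
```

The functions are only declared, never defined, since the comparison happens entirely at compile time.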


The rationale is that the type declaration demonstrates how you use the variable. For example:

  char *(*fp)(int, float *)
You now have a variable called "fp". Follow that pointer by putting a * in front of it. Call that function by passing it an int and a float pointer inside parentheses. Follow that pointer by putting a * in front of it. That gives you a char.

Same kind of thing here:

  char *strings[10]
You now have a variable called "strings". Index that array by putting an offset less than 10 inside square brackets. Follow that pointer by putting a * in front of it. That gives you a char.

Here's a simpler example:

  char *str
You now have a variable called "str". Follow that pointer by putting a * in front of it. That gives you a char.

Here's the simplest example of all:

  char ch;
You now have a variable called "ch". That gives you a char.
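The walkthrough above can be sketched as code: each expression below has the same shape as its declarator and ends up as a char. The helper names pick, use_like_declared and use_array, and the sample strings, are hypothetical.

```cpp
#include <cassert>

// A function matching the declaration: takes (int, float *) and returns char *.
inline char* pick(int index, float* scale)
{
    static char letters[] = "abc";
    *scale = 1.0f;            // touch the float so the parameter is used
    return &letters[index];
}

inline char use_like_declared(int index)
{
    char *(*fp)(int, float *) = &pick;  // the declaration under discussion
    float f = 0.0f;
    // Follow fp, call it with (int, float *), follow the result: a char.
    return *(*fp)(index, &f);
}

inline char use_array(int index)
{
    static char word[] = "xyz";
    // Ten pointers to char; the slots not listed are null.
    char *strings[10] = { &word[0], &word[1], &word[2] };
    // Index the array, then follow the pointer: a char.
    return *strings[index];
}
```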


AIUI, the explanation is that they wanted this idea of "declaration follows use". For simple things, this works kind of nicely:

    char *foo;
Says that *foo will have type `char`. This sounds good in theory, but famously breaks down when things get more complicated (arrays of function pointers, multiple declarations at once, &c.).


Yes, see van der Linden's Deep C Secrets:

http://books.google.com/books?id=4vm2xK3yn34C&printsec=f...


This was on HN a bit ago, and you reminded me of it: http://www.csse.monash.edu.au/~damian/papers/HTML/ModestProp...

Basically a fully-fleshed out version of your idea.


Fails on nested arrays.

  char *foo[10][20];
The method described would indicate that this is an array 10 of pointers to array 20s of chars.

This is incorrect. It is an array 10 of array 20s of pointers to chars.


How so?

         +-----------+
         | +-+       |
         | ^ |       |
    char *foo[10][20];
     ^   ^   |       |
     |   +---+       |
     +---------------+
* foo is

* an array of ten arrays of 20

* pointers to

* char
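The compiler can settle which reading is right. A small sketch, assuming C++11 for decltype and static_assert; the typedef name misread is hypothetical and stands for the "array 10 of pointers to array 20 of char" reading.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

char *foo[10][20];

// The dimensions read left to right: outer extent 10, inner extent 20.
static_assert(std::extent<decltype(foo), 0>::value == 10, "outer dimension is 10");
static_assert(std::extent<decltype(foo), 1>::value == 20, "inner dimension is 20");

// foo[i] is an array of 20 pointers to char (decltype adds the reference).
static_assert(std::is_same<decltype(foo[0]), char *(&)[20]>::value,
              "foo[i] is an array of 20 char*");

// foo[i][j] is a pointer to char.
static_assert(std::is_same<decltype(foo[0][0]), char *&>::value,
              "foo[i][j] is a char*");

// The other reading needs parentheses and is a different type.
typedef char (*misread[10])[20];
static_assert(!std::is_same<decltype(foo), misread>::value,
              "foo is not an array of pointers to arrays");
```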


Following the procedure as written:

         +-------+ 
         | +-+   |
         | ^ |   |
    char *foo[10][20];
      ^  ^   |   |
      |  +---+   |
      +----------+
It mentions, in rule 1, handling tokens of the form [] or [X], but not multiple occurrences of these.

This is indeed rather nitpickish of me, but on the other hand, one thing that you at least need to know in that case is in what order you read the numbers - left-to-right, or right-to-left (is foo[10][20] a 10-array of 20-arrays or a 20-array of 10-arrays?). Not mentioning this leaves one (at least, it left me) with the impression that multiple occurrences were already handled by this rule as written, which is, as demonstrated, false.



"So is somebody gonna create a tool that uses this parser so you can declare your C in English?" -My roommate


this is fantastic, but apparently "void (*signal(int, void (*fp)(int)))(int);" is a syntax error?


It is not, but their implementation is incomplete. They parse it correctly when you remove "fp". Apparently, they don't handle declarations that contain other identifiers besides the one being declared.

  int f(int  ) // OK
  int f(int i) // fail


I don't see where the spiral comes in. The following rule is simpler:

(1) Begin at the variable name, read from left to right, then go back to the name and read from right to left.

(2) Give precedence to expressions in parentheses.

For example: char *(*fp)( int, float * )

The innermost expression is (*fp). Nothing is to the right of fp, so read to the left: "*", it's a pointer. Next, we go right and see an argument list, so it's a pointer to a function taking these arguments. Go back to where we started and read right to left: pointer to a function taking (int, float *) that returns a pointer to char.


left-to-right and right-to-left is a spiral, in fact :)


    char* str[10]; /* Better */


I disagree, the following:

  int* foo, bar;
could be interpreted as declaring two pointers, whereas:

  int *foo, bar;
makes it clear that only the first variable is a pointer.
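A quick compile-time check of what that declaration produces, as a sketch assuming C++11 for decltype and static_assert:

```cpp
#include <cassert>
#include <type_traits>

int* foo, bar;  // the star binds to foo only

static_assert(std::is_same<decltype(foo), int *>::value, "foo is a pointer to int");
static_assert(std::is_same<decltype(bar), int>::value, "bar is a plain int");
```

Because bar is an int, foo = &bar is a valid assignment.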


I don't think you should be declaring mixed types with one statement. Declare a pointer, then declare the `int` separately -- and the mystery is gone.


I believe the point is that, with that syntax, you can't declare multiple same-type pointers on the same line.


I find that problem goes away if you only declare a single variable per line. Generally, I find people who do

  int* foo;
started with C++ and people who do

  int *foo;
started with C. I started with C++.


I have made the same observation and as someone who started with C++, I think it makes more sense that

    int* i;
is an int pointer named i.

    int *i;
just looks strange.. an int named pointer i umm?

Besides, declaring only one variable per line seems less cluttered to me, so I never get the

    int* a, b;
confusion.


I never put the asterisk next to the type, because I wanted to train myself the way the grammar actually works rather than the way almost everyone wished it worked.


But the compiler recognizes the type of the variable as int*.


When used in an expression, I read *p as "what's pointed by p". In a declaration, this applies as well:

    int *i;
means "what's pointed by i is an int".


That makes more sense alright, but I still think it would make a lot more sense if * was bound to the type.


Declaring two different types on the same line is fraught with confusion, so it is to be avoided.


I've always found putting the * next to the type instead of the variable, like this, more intuitive. Does anyone know why the other way is used more often? Is it just historic, or is there a more practical reason?


As said elsewhere: "Declaration follows use".

    int *foo;
means "foo is a pointer to int". But that's explained differently from how it's written. More clearly, you can say that "*foo" is an int. Then, doing

    int *foo;
    foo = 50;
is obviously wrong because "foo" is not an int; however

    int *foo;
    *foo = 50;
is correct.

Similarly for arrays:

    int *foo[50];
means "foo is an array of pointers to int", or "*foo[5] is an int".

From there follows that & is the antithesis of *, and they negate each other:

    int *foo; //foo is a pointer to int.
    &*foo; //the address of the contents of foo
    foo; // same as above, but shorter.
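The cancellation can be checked directly. A minimal sketch; the function name address_of_deref_is_identity and the local variable value are hypothetical.

```cpp
#include <cassert>

inline bool address_of_deref_is_identity()
{
    int value = 50;
    int *foo = &value;   // foo is a pointer to int
    *foo = 51;           // *foo is an int, so assigning an int is fine

    // Taking the address of what foo points to gives foo back,
    // and dereferencing the address of value gives value back.
    return &*foo == foo && *&value == 51;
}
```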


Sorry, that didn't make much sense to me - in your first example, you can't say "foo is an int", because it's not, it's a pointer to an int. And in your last example, you can't say "foo[5] is an int", because it's not, it's a pointer to int.

In response to the original question, I think the answer is that

  int *foo;
is a common way of writing it because that's the way the compiler resolves it. As mentioned elsewhere,

  int* foo, bar;
is equivalent to

  int *foo; int bar;
so treating the star as part of the type can cause problems.

But I agree fully - it makes a lot more sense to me to consider the pointer star as a flag on the type, not a modifier to the name. Just one of the many warts on C and C++ that make me happy I have to use them so infrequently...

[Edit: formatting, stars were getting swallowed when put inline]


> [Edit: formatting, stars were getting swallowed when put inline]

The same thing happened to the post you're responding to, and that's why it's not making sense to you. The author really meant to say star-foo and star-foo[5], but instead italicized a bunch of text in between.


D'oh.

Don't know how I didn't realize that, given that my response got messed up so that my "corrected" version read exactly like the one I was confused about...

Please disregard everything I said, in that case. :)


Meh! This is just an overcomplication of the right-left rule, which makes you think about "complex" 2D geometry instead of a couple of simple spatial pointers in the declaration you are trying to parse.

The easier, more useful version: http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html


It is amazing to me the number of people who would take the time to make ASCII graphics in their replies.

I'm blessed that, even munging 15+ Linux C/C++ libraries lately, I haven't run into anything requiring this, though my Intro to C class at Merit College twenty-five years ago did teach this rule.

Hats off to you anyway...


No problems here, that is, with D's right-to-left declaration syntax. ;)


yes this is the kind of thread I expect to see @WalterBright chime in on :)


Sure helps to read code faster!



