No more confusions on tricky C declarations

tptacek · on June 26, 2010

I've always had a bit more luck with the "typedef each step of the construction" rule of thumb. Also, I tend to hide anything as complex as "pointer-to-array-of-pointers-to-functions" (even though you memorize this idiom pretty quickly after an hour in the kernel) behind library ADTs, so you're never indexing an array, but rather passing an index and a whatever_t* to whatever_get(w, index).
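
A minimal sketch of both ideas in C, for readers who want to see them side by side. Only whatever_t and whatever_get come from the comment above; every other name (handler_fn, handler_table, whatever_default, and the three small functions) is made up for illustration, and the table size of 3 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Build the type one typedef at a time (names are illustrative). */
typedef int (*handler_fn)(int);          /* pointer to function (int) -> int */
typedef handler_fn handler_table[3];     /* array of 3 such pointers */
typedef handler_table *handler_table_p;  /* pointer to that array */

/* The same type spelled in one go, for comparison. */
typedef int (*(*raw_table_p)[3])(int);

/* A tiny ADT in the spirit of whatever_get(w, index). */
typedef struct whatever {
    handler_table_p table;
    size_t count;
} whatever_t;

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }
static int succ(int x)   { return x + 1; }

static handler_table default_table = { twice, negate, succ };

/* Callers pass an index and a whatever_t *; they never index the array. */
handler_fn whatever_get(const whatever_t *w, size_t index)
{
    assert(w != NULL && index < w->count);
    return (*w->table)[index];
}

/* Hypothetical constructor for this sketch. */
whatever_t whatever_default(void)
{
    whatever_t w = { &default_table, 3 };
    raw_table_p same = w.table;  /* accepted only if both spellings agree */
    (void)same;
    return w;
}
```

With this in place a call site reads whatever_get(&w, 0)(21), and the declarator with two nested pointers appears in exactly one place.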

scott_s · on June 26, 2010

That's a good approach, but sometimes you can't do it in C++. (At least with C++03.) Consider:

  template <class T>
  T* new_align_1d(size_t d1, unsigned int align)
  {
    void* ptr = _malloc_align(d1 * sizeof(T), align);
    return new (ptr) T[d1];
  }

So, I'm defining a function new_align_1d that takes a size and an alignment. I allocate enough space for a 1-dimensional array of the given size on that alignment, then use placement new to construct the actual objects in that place. Then return the pointer. Pretty straightforward. So let's generalize one dimension up:

  template <class T, size_t d2>
  ? new_align_2d(size_t d1, unsigned int align)
  {
    void* ptr = _malloc_align(d1 * d2 * sizeof(T), align);
    return new (ptr) T[d1][d2];
  }

You can probably see what I'm doing here. So what's the return type? And what's the syntax for specifying it? Since it's a template, we can't define a typedef to help us. (C++0x should allow that with parameterized typedefs.) The answer surprised me - I really had to look at the grammar and mechanically figure out what it was going to be.

(Yes, I have to pass d2 as a template parameter - it's part of the type of the array and must be known at compile time.)
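
For comparison, here is a plain C sketch of the same shape: a function that returns a pointer to an array whose inner dimension is fixed at compile time. The names new_rows, row_t, rows_demo and the value D2 = 4 are assumptions for illustration, and calloc stands in for the aligned allocator. The declarator grammar is shared with C++, so the template version should take the same form with T and d2 substituted, though that is not compiled here.

```c
#include <assert.h>
#include <stdlib.h>

#define D2 4  /* inner dimension, fixed at compile time (illustrative value) */

/* new_rows is a function taking a size_t and returning
   a pointer to an array of D2 ints. */
int (*new_rows(size_t d1))[D2]
{
    return calloc(d1, sizeof(int[D2]));
}

/* The same signature through a typedef, which plain C allows. */
typedef int row_t[D2];

row_t *new_rows_typedef(size_t d1)
{
    return new_rows(d1);
}

/* Small usage demo: write and read back the last element of a 3 x D2 block. */
int rows_demo(void)
{
    int (*rows)[D2] = new_rows(3);
    assert(rows != NULL);
    rows[2][D2 - 1] = 7;
    int value = rows[2][D2 - 1];
    free(rows);
    return value;
}
```

Read from the name outward: new_rows, then its parameter list, then the star, then [D2], then int.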

apu · on June 27, 2010

The fact that I instantly recognized this problem case (having had it many times myself) makes me sad.

I'm so incredibly happy that 95% of the code I write now is in Python.

rntz · on June 27, 2010

This is a Good Idea, but it doesn't help you understand other people's code if they don't follow it. Hence the need for a mnemonic rule.

jakevoytko · on June 26, 2010

A coworker introduced me to the ADT-style solution for nesting problems, and it works nicely. He introduces a struct that encapsulates any nested inner collection. It's worth noting his designs use fat structs in a shallow hierarchy.

This has an added advantage (and burden) when calling the code. Since you access a field inside of the struct, you write extra names. The claim is that myList[0].fnList[0].fn(3) is more readable than myList[0][0](3). This can get cluttered with lots of nesting, but this kind of nesting usually screams "refactor me!" anyways.

I prefer a mix: make the common inner collections into data types, and typedef the rest. Doing it all the time creates too many little structs for my liking.

joe_the_user · on June 27, 2010

The second approach, that's what I like...

loup-vaillant · on June 26, 2010

An ML-like notation would be even cooler:

  char *str[10];
  str : [10] (*char)

  char *(*fp)( int, float *);
  fp : *((int, *float) -> *char)

  void (*signal(int, void (*fp)(int)))(int);
  signal : (int, *(int -> void)) -> *(int -> void)

(Oh. That last declaration did make some sense, after all…)

Really, how did they manage to choose such an inconsistent, unreadable syntax for their declarations? Is there any rational explanation?
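
Within C itself, a typedef gets close to the arrow notation for the last declaration. The sketch below uses a hypothetical my_signal rather than the real signal from <signal.h>, and handler_t, record and deliver are likewise invented names. The two prototypes are accepted together only because they spell the same type.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_t)(int);   /* *(int -> void) */

/* Two declarations of the same hypothetical function. */
void (*my_signal(int signum, void (*fp)(int)))(int);
handler_t my_signal(int signum, handler_t fp);

static handler_t current;   /* starts out as a null pointer */
static int last_seen;

/* Example handler: remembers the number it was called with. */
void record(int signum) { last_seen = signum; }

/* Installs fp and returns the previously installed handler. */
handler_t my_signal(int signum, handler_t fp)
{
    handler_t previous = current;
    (void)signum;             /* this sketch keeps a single handler slot */
    current = fp;
    return previous;
}

/* Calls the installed handler and reports what it recorded. */
int deliver(int signum)
{
    assert(current != NULL);
    current(signum);
    return last_seen;
}
```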

fexl · on June 27, 2010

The rationale is that the type declaration demonstrates how you use the variable. For example:

  char *(*fp)(int, float *)

You now have a variable called "fp". Follow that pointer by putting a * in front of it. Call that function by passing it an int and a float pointer inside parentheses. Follow that pointer by putting a * in front of it. That gives you a char.

Same kind of thing here:

  char *strings[10]

You now have a variable called "strings". Index that array by putting an offset less than 10 inside square brackets. Follow that pointer by putting a * in front of it. That gives you a char.

Here's a simpler example:

  char *str

You now have a variable called "str". Follow that pointer by putting a * in front of it. That gives you a char.

Here's the simplest example of all:

  char ch;

You now have a variable called "ch". That gives you a char.
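
The reading above can be checked by writing the use exactly as the declaration spells it. A small sketch; greeting, pick, use_fp and use_strings are invented names.

```c
#include <assert.h>

static char greeting[] = "hello";

/* A function of the declared shape: takes (int, float *), returns char *. */
static char *pick(int offset, float *scale)
{
    (void)scale;  /* unused in this sketch */
    assert(offset >= 0 && offset < (int)sizeof greeting);
    return greeting + offset;
}

char use_fp(void)
{
    char *(*fp)(int, float *) = pick;
    float f = 1.0f;
    /* Follow fp, call it, follow the result: the expression is a char. */
    return *(*fp)(1, &f);
}

char use_strings(void)
{
    char *strings[10] = { greeting };
    /* Index the array, follow the pointer: the expression is a char. */
    return *strings[0];
}
```

In both functions the return expression has the same shape as the declarator, with the name in the same position.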

endgame · on June 26, 2010

AIUI, the explanation is that they wanted this idea of "declaration follows use". For simple things, this works kind of nicely:

    char *foo;

Says that *foo will have type `char`. This sounds good in theory, but famously breaks down when things get more complicated (arrays of function pointers, multiple declarations at once, &c.).

mturmon · on June 27, 2010

Yes, see van der Linden's Deep C Secrets:

http://books.google.com/books?id=4vm2xK3yn34C&printsec=f...

jerf · on June 27, 2010

This was on HN a bit ago, and you reminded me of it: http://www.csse.monash.edu.au/~damian/papers/HTML/ModestProp...

Basically a fully-fleshed out version of your idea.

rntz · on June 26, 2010

Fails on nested arrays.

  char *foo[10][20];

The method described would indicate that this is an array 10 of pointers to array 20s of chars.

This is incorrect. It is an array 10 of array 20s of pointers to chars.

tordek · on June 27, 2010

How so?

         +-----------+
         | +-+       |
         | ^ |       |
    char *foo[10][20];
     ^   ^   |       |
     |   +---+       |
     +---------------+

* foo is

* an array of ten arrays of 20

* pointers to

* char

rntz · on June 27, 2010

Following the procedure as written:

         +-------+ 
         | +-+   |
         | ^ |   |
    char *foo[10][20];
      ^  ^   |   |
      |  +---+   |
      -----------+

It mentions, in rule 1, handling tokens of the form [] or [X], but not multiple occurrences of these.

This is indeed rather nitpickish of me, but on the other hand, one thing that you at least need to know in that case is in what order you read the numbers - left-to-right, or right-to-left (is foo[10][20] a 10-array of 20-arrays or a 20-array of 10-arrays?). Not mentioning this leaves one (at least, it left me) with the impression that multiple occurrences were already handled by this rule as written, which is, as demonstrated, false.
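
On the ordering question, the compiler can settle it: the dimensions are read left to right, so foo is a 10-array of 20-arrays. A small sketch using sizeof; the helper function names are invented.

```c
#include <assert.h>
#include <stddef.h>

char *foo[10][20];

/* Number of elements in the outer array. */
size_t outer_count(void) { return sizeof foo / sizeof foo[0]; }

/* Number of elements in each inner array. */
size_t inner_count(void) { return sizeof foo[0] / sizeof foo[0][0]; }

char first_char_of_last_element(void)
{
    static char text[] = "x";
    /* foo decays to a pointer to its first element,
       which is an array 20 of pointers to char. */
    char *(*row)[20] = foo;
    assert(sizeof foo[0][0] == sizeof(char *));
    row[9][19] = text;        /* last valid slot: outer < 10, inner < 20 */
    return *foo[9][19];
}
```

outer_count() gives 10 and inner_count() gives 20, which matches "array 10 of array 20 of pointers to char".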

jeffmax · on June 26, 2010

http://cdecl.org/

javert · on June 26, 2010

"So is somebody gonna create a tool that uses this parser so you can declare your C in English?" -My roommate

exit · on June 26, 2010

this is fantastic, but apparently "void (*signal(int, void (*fp)(int)))(int);" is a syntax error?

loup-vaillant · on June 26, 2010

It is not, but their implementation is incomplete. They parse it correctly when you remove "fp". Apparently, they don't handle declarations that contain other identifiers besides the one being declared.

  int f(int  ) // OK
  int f(int i) // fail

Amnon · on June 26, 2010

I don't see where the spiral comes in. The following rule is simpler:

(1) Begin at the variable name, read from left to right, then go back to the name and read from right to left.

(2) Give precedence to expressions in parentheses.

For example: char *(*fp)( int, float * )

The innermost expression is (*fp). There is nothing to the right of fp, so read to the left: "*", it's a pointer. Next, we go right and see an argument list, so it's a pointer to a function taking these arguments. Go back to where we started and read right to left: pointer to a function taking (int, float *) that returns a pointer to char.

ashishb4u · on June 26, 2010

left-to-right and right-to-left is a spiral, in fact :)

rue · on June 26, 2010

    char* str[10]; /* Better */

gmartres · on June 26, 2010

I disagree, the following:

  int* foo, bar;

could be interpreted as declaring two pointers, whereas:

  int *foo, bar;

makes it clear that only the first variable is a pointer.
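
A quick way to confirm how the compiler reads that line, as a sketch with invented function names: the assignments below compile only if foo is a pointer and bar is a plain int.

```c
#include <assert.h>

int star_binds_to_name(void)
{
    int* foo, bar;   /* foo is int *, bar is plain int */
    assert(sizeof foo == sizeof(int *));
    assert(sizeof bar == sizeof(int));
    bar = 5;         /* fine: bar is an int */
    foo = &bar;      /* fine: foo is a pointer to int */
    return *foo;
}

int two_pointers(void)
{
    int value = 7;
    int *p = &value, *q = &value;  /* each pointer needs its own star */
    return *p + *q;
}
```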

godDLL · on June 26, 2010

I don't think you should be declaring mixed types with one statement. Declare a pointer, then declare the `int` separately -- and the mystery is gone.

masklinn · on June 27, 2010

I believe the point is that, with that syntax, you can't declare multiple same-type pointers on the same line

scott_s · on June 26, 2010

I find that problem goes away if you only declare a single variable per line. Generally, I find people who do

  int* foo;

Started with C++ and people who do

  int *foo;

Started with C. I started with C++.

dkersten · on June 27, 2010

I have made the same observation and as someone who started with C++, I think it makes more sense that

    int* i;

is an int pointer named i.

    int *i;

just looks strange.. an int named pointer i umm?

Besides, declaring only one variable per line seems less cluttered to me, so I never get the

    int* a, b;

confusion.

prodigal_erik · on June 27, 2010

I never put the asterisk next to the type, because I wanted to train myself the way the grammar actually works rather than the way almost everyone wished it worked.

scott_s · on June 27, 2010

But the compiler recognizes the type of the variable as int*.

rbonvall · on June 27, 2010

When used in an expression, I read *p as "what's pointed by p". In a declaration, this applies as well:

    int *i;

means "what's pointed by i is an int".

dkersten · on June 27, 2010

That makes more sense alright, but I still think it would make a lot more sense if * was bound to the type.

rue · on June 27, 2010

Declaring two different types on the same line is fraught with confusion, so it is to be avoided.

milod · on June 26, 2010

I've always found putting the * next to the type instead of the variable, like this, more intuitive. Does anyone know why the other way is used more often? Is it just historic, or is there a more practical reason?

tordek · on June 27, 2010

As said elsewhere: "Declaration follows use".

    int *foo;

means "foo is a pointer to int". But that's explained differently from how it's written. More clearly, you can say that "*foo" is an int. Then, doing

    int *foo; foo = 50;

is obviously wrong because "foo" is not an int; however

    int *foo; *foo = 50;

is correct.

Similarly for arrays:

    int *foo[50];

means "foo is an array of pointers to int", or "*foo[5] is an int".

From there follows that & is the antithesis of *, and they negate each other:

    int *foo; //foo is a pointer to int.
    &*foo; //the address of the contents of foo
    foo; // same as above, but shorter.
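
A runnable sketch of the same points, with invented function names. One assumption is added here that the snippets above leave out: the pointer is given a valid target before anything is written through it, since writing through an uninitialized pointer is undefined behavior even though the types line up.

```c
#include <assert.h>

int deref_demo(void)
{
    int target = 0;
    int *foo = &target;    /* give foo something to point at first */
    *foo = 50;             /* *foo is an int, so assigning 50 type-checks */
    assert(&*foo == foo);  /* & and * cancel each other */
    return target;
}

int array_demo(void)
{
    int value = 5;
    int *foo[50] = { 0 };  /* array of 50 pointers to int, all null */
    foo[5] = &value;
    return *foo[5];        /* *foo[5] is an int */
}
```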

ewjordan · on June 27, 2010

Sorry, that didn't make much sense to me - in your first example, you can't say "foo is an int", because it's not, it's a pointer to an int. And in your last example, you can't say "foo[5] is an int", because it's not, it's a pointer to int.

In response to the original question, I think the answer is that

  int *foo;

is a common way of writing it because that's the way the compiler resolves it. As mentioned elsewhere,

  int* foo, bar;

is equivalent to

  int *foo; int bar;

so treating the star as part of the type can cause problems.

But I agree fully - it makes a lot more sense to me to consider the pointer star as a flag on the type, not a modifier to the name. Just one of the many warts on C and C++ that make me happy I have to use them so infrequently...

[Edit: formatting, stars were getting swallowed when put inline]

lmkg · on June 27, 2010

> [Edit: formatting, stars were getting swallowed when put inline]

The same thing happened to the post you're responding to, and that's why it's not making sense to you. The author really meant to say star-foo and star-foo[5], but instead italicized a bunch of text in between.

ewjordan · on June 28, 2010

D'oh.

Don't know how I didn't realize that, given that my response got messed up so that my "corrected" version read exactly like the one I was confused about...

Please disregard everything I said, in that case. :)

MtL · on June 27, 2010

Meh! This is just an over-complication of the right-left rule, which makes you think about "complex" 2D geometry instead of a couple of simple spatial pointers in the declaration you are trying to parse.

The easier, more useful version: http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html

joe_the_user · on June 27, 2010

It is amazing to me the number of people who would take the time to make ASCII graphics in their replies.

I'm blessed that even munging 15+ Linux C/C++ libraries lately I haven't run into anything requiring this - though my Intro to C class at Merit College twenty-five years ago did teach this rule.

Hats off to you anyway...

AndrejM · on June 26, 2010

No problems here, that is, with D's right-to-left declaration syntax. ;)

mkramlich · on June 26, 2010

yes this is the kind of thread I expect to see @WalterBright chime in on :)

ashishb4u · on June 26, 2010

Sure helps to read code faster!