
Trimming spaces and comments in C

This is a simple C function that reads a line from a file and trims all comments and spaces from it.
The function receives a pointer to a file, a pointer to a char buffer and an integer holding the size of that buffer, which limits how much of the line is read from the file.
It then returns the next non-empty line from the file without leading/trailing spaces and without comments.

Function signature:

 
 char * readLine (FILE* fp, char* line, int size)  

One thing I want to point out is that it is possible to trim the string without creating a new one.
By using pointer arithmetic you can manipulate the chars of the line in place and remove anything you want.

For example, let's say the line read from the file is "  a1  ": the string a1 with two leading and two trailing spaces.

*I'm using 0x00n just as an example to denote the memory location of each char.

1.

First we remove the comments, none in this case:

 s = strchr(line, '#');
 if (s) *s = '\0';

We use strchr to search the string for the first occurrence of #.

If # is found, we write a null byte at its position, so the string now ends right before the comment.
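To see this step on its own, here is a minimal sketch wrapped in a small helper. The name strip_comment is just something I made up for the example; it is not part of the readLine function.

```c
#include <string.h>

/* Hypothetical helper: cut the line at the first '#', in place.
   Returns the same pointer it was given. */
char *strip_comment(char *line)
{
    char *hash = strchr(line, '#');   /* NULL when there is no '#' */

    if (hash)
        *hash = '\0';                 /* the string now ends where the comment began */

    return line;
}
```

Note that the space before the # is still there after this step. It gets removed later, together with the other trailing spaces.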

2.

The next step is to remove all the trailing spaces:

 
 s = line + strlen(line) - 1;
 while (s >= line && isspace((unsigned char)*s)) s--;
 *(s+1) = '\0';

  • First we assign to s the position of the last char in the line string.

  • Second we check if the char is a space using the isspace function (it checks not only for spaces, but for tabs, newlines and other whitespace chars as well). If it is, we subtract one from s, so s points to the previous char. The cast to unsigned char keeps isspace well defined for chars with negative values.

  • Once we find a char that's not a space we break the loop. The s >= line check also stops the loop at the start of the line, in case the line is empty or all spaces.

  • Finally, we add one to s and write the null byte there, at the position of the first trailing space.
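Here is a small variation of the same step, as a sketch. It tracks the length of the string instead of a pointer, so it never has to look at anything before the start of the buffer, even when the line is empty or made only of spaces. The name trim_trailing is an assumption of mine for the example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove trailing whitespace in place. */
void trim_trailing(char *line)
{
    size_t len = strlen(line);

    /* Shrink the length while the last char is whitespace.
       The len > 0 check stops at the start of the buffer. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    line[len] = '\0';   /* terminate right after the last kept char */
}
```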

3.

Removing leading spaces is even simpler:

 s = line;
 while (isspace((unsigned char)*s)) s++;

We set s to point to the first char in the string read from the file.
Then we loop through the string, incrementing the pointer by 1 while the char it points to is a space. The loop stops at the null byte at the latest, since that is not a space.

After all the trimming, s will point to the first non-space char in the string, and the null byte will be positioned right after the last non-space char. All the comments will also have been removed.
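Putting the three steps together on a plain string (no file involved) could look like the sketch below. The helper name trim is my own, and the trailing step uses a length counter rather than a pointer. The point is that the returned pointer lives inside the same buffer that was passed in.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strip comments and surrounding whitespace.
   Returns a pointer into the same buffer, no new string is created. */
char *trim(char *line)
{
    char *s;
    size_t len;

    /* 1. Strip comments */
    s = strchr(line, '#');
    if (s) *s = '\0';

    /* 2. Remove trailing whitespace */
    len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;
    line[len] = '\0';

    /* 3. Skip leading whitespace */
    s = line;
    while (isspace((unsigned char)*s))
        s++;

    return s;
}
```

For a buffer holding "  a1  ", the returned pointer is the buffer address plus 2, and the null byte ends up at index 4, right after the 1.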

One thing to notice with this approach is that the function must receive a char* and return a char*.
The reason is that the char buffer needs to be declared in the function that calls readLine, in this example main. If it were declared inside readLine instead, its lifetime would be tied to readLine, and the calling function would not be able to access the trimmed string after readLine returns.

Another possibility would be to manipulate the char* line itself, for example by moving the trimmed text to the start of the buffer, removing the need to return a new char*.
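A rough sketch of that alternative is below. The name trim_in_place is hypothetical, and using memmove to slide the text to the front of the buffer is my own choice for the example. After the call, line itself holds the trimmed string.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant: trim the buffer itself, so nothing is returned. */
void trim_in_place(char *line)
{
    char *start;
    size_t len;

    /* Strip comments */
    start = strchr(line, '#');
    if (start) *start = '\0';

    /* Skip leading whitespace */
    start = line;
    while (isspace((unsigned char)*start))
        start++;

    /* Drop trailing whitespace by shrinking the length */
    len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;

    /* Slide the kept chars to the front; memmove allows overlapping regions */
    memmove(line, start, len);
    line[len] = '\0';
}
```

The trade-off is an extra copy of the kept chars, in exchange for the caller only having to deal with one pointer.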

This trim function could be adapted to work not only with files, but with different data structures as well.

If you have any suggestions or tips please leave a comment below :D

Full Source Code:

  
 #include  
 #include  
 #include

char *  
 readLine (FILE* fp, char* line, int size)  
 {  
 char* s = NULL;

 while (!feof(fp) && fgets(line, size, fp)) {  
 // Strip comments  
 s = strchr(line, ‘#’);  
 if (s) *s = ‘\0';

 // Remove trailling spaces  
 s = line + strlen(line) – 1;  
 while (isspace(*s)) s–;  
 *(s+1) = ‘\0';

 // Remove leading spaces  
 s = line;  
 while (isspace(*s)) s++;

 // Don’t return empty lines  
 if (*s) return s;  
 printf("empty linen");  
 }

 return NULL;  
 }

int  
 main (void)  
 {  
 FILE* fp = NULL;  
 char line[256];  
 char* s = NULL;  
 fp = fopen("file", "r");  
 if (!fp) return 1;  
 while ((s = readLine(fp, line, sizeof(line)))) {  
 printf("s: %s. || line: %s.n", s, line);  
 }  
 return 0;  
 }  

Test file:

start
a1
a2
a3 #other comment
end

start
b1
b2
start
c1#comment….
c2
end
end

typedef & C structs

Recently I’ve been working mostly with C code, and one thing I noticed over and over was that most of the structs were declared with a typedef.

For example:

  
 typedef struct {  
 int a;  
 int b;  
 } my_struct_t;  

instead of:

  
 struct my_struct {  
 int a;  
 int b;  
 };  

The main difference between the two declarations is that the one with the typedef creates a new type name called my_struct_t, while the latter only creates a tag called my_struct. A tag is not a type name by itself: you have to write struct my_struct to refer to the type.

So this code would be valid:

  
 struct my_struct my_struct;  

It creates a variable named my_struct whose type is struct my_struct. This works because the compiler keeps struct tags and ordinary names (variables, typedefs) in separate namespaces.
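Here is a small sketch showing both declarations side by side. The function tag_demo is a made-up name, used only to have somewhere to put the variables.

```c
/* Tag only: the type has to be written as "struct my_struct". */
struct my_struct {
    int a;
    int b;
};

/* Typedef: my_struct_t can be used as a type name on its own. */
typedef struct {
    int a;
    int b;
} my_struct_t;

/* Hypothetical demo function */
int tag_demo(void)
{
    /* A variable named like the tag is allowed, since tags and
       variable names do not share a namespace. */
    struct my_struct my_struct = { 1, 2 };

    /* No struct keyword needed with the typedef name. */
    my_struct_t other = { 3, 4 };

    return my_struct.a + my_struct.b + other.a + other.b;
}
```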

Besides saving some keystrokes, using typedef when declaring structs makes the code easier to read, since you don’t need to explicitly write the keyword struct every time you want to refer to your struct.

So instead of coding:

  
 struct my_struct some_function(int a, struct my_struct);  

you can reference struct my_struct by its new type:

  
 my_struct_t some_function(int a, my_struct_t);  

Another important thing to mention is what happens when you need to reference the struct you are declaring as one of its own members, for example in a linked list.

  
 typedef struct S1 {  
 int a;  
 int b;  
 struct S1 *s;  
 struct S1 s; // error: field ‘s’ has incomplete type  
 S1_t *s; // error: ‘S1_t’ does not name a type  
 } S1_t;  

The error “field ‘s’ has incomplete type” happens because one of the members of the struct is the struct itself. At that point the compiler has seen the name struct S1, but the declaration is not finished yet, so it does not know the size of a struct S1 and cannot lay out the member.

The same kind of problem happens if you try to use the name given in the typedef, in this case S1_t. S1_t represents a struct S1, but while struct S1 is still being declared, S1_t doesn’t exist yet.

The solution is to use a pointer to a struct S1.
A pointer only holds a memory address. No matter what type of data it points to, the pointer itself has a known size, so the compiler knows how much space to reserve for the member at compile time.
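If you would rather use the typedef name inside the struct, one option is to declare the typedef first and define the struct afterwards. This is a sketch of an alternative, not what the example in this post does, and linked_value is a hypothetical function name.

```c
#include <stddef.h>

/* Declare the typedef first: S1_t now names struct S1,
   even though the struct is not defined yet. */
typedef struct S1 S1_t;

struct S1 {
    int a;
    int b;
    S1_t *s;   /* OK: only a pointer, whose size is known */
};

/* Hypothetical demo: link two structs and read one through the other. */
int linked_value(void)
{
    S1_t second = { 3, 4, NULL };
    S1_t first  = { 1, 2, &second };

    return first.s->a;   /* reads second.a through the pointer */
}
```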

Now at runtime you can allocate memory for a struct S1 and assign it to the s pointer.
The only thing the compiler checks is that what you assign is a pointer to a struct S1, since by then it knows what a struct S1 looks like.

  
 #include <stdio.h>
 #include <stdlib.h>

 typedef struct S1 {
     int a;
     int b;
     struct S1 *s;
     // struct S1 s; // error: field ‘s’ has incomplete type
     // S1_t *s;     // error: ‘S1_t’ does not name a type
 } S1_t;

 int main (void) {
     S1_t s1;
     S1_t s2;

     s1.a = 1;
     s1.b = 2;

     s2.a = 3;
     s2.b = 4;

     // Alternative: point to an existing struct instead of allocating
     // s1.s = &s2;
     s1.s = malloc(sizeof(S1_t));
     if (!s1.s) return 1;
     s1.s->a = 3;
     s1.s->b = 4;

     printf("s1.a: %d\n", s1.a);
     printf("s1.b: %d\n", s1.b);
     printf("s1.s->a: %d\n", s1.s->a);
     printf("s1.s->b: %d\n", s1.s->b);

     free(s1.s);
     return 0;
 }
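Since a linked list was mentioned as the typical use case, here is a minimal sketch of one built on the same idea of a struct holding a pointer to its own type. All the names (node_t, push_front, sum_list, free_list) are assumptions of mine for the example.

```c
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;   /* pointer to the struct being declared */
} node_t;

/* Put a new node in front of head. Returns the new head,
   or NULL if malloc fails (the old list is left untouched). */
node_t *push_front(node_t *head, int value)
{
    node_t *n = malloc(sizeof *n);

    if (!n) return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Walk the list through the next pointers and add up the values. */
int sum_list(const node_t *head)
{
    int total = 0;

    for (; head; head = head->next)
        total += head->value;
    return total;
}

/* Free every node, saving the next pointer before each free. */
void free_list(node_t *head)
{
    while (head) {
        node_t *next = head->next;
        free(head);
        head = next;
    }
}
```

Starting from an empty list (NULL), each call to push_front returns the new head, so the list grows from the front.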

You can find a more detailed explanation here.