fgets() and DOS files

Russell Coker

31 May 2012

3:06 a.m.

Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line? -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

James Harper

31 May

4:03 a.m.

...

Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?

I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf). James

Russell Coker

4:15 a.m.

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:

...

...
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?

I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).

I really need to support either. I'm dealing with the output of a Windows program that seems to convert between DOS and Unix file formats. So Peter's suggestion of getdelim() isn't going to help me this time, although it's a handy thing to note for future reference.

James Harper

4:17 a.m.

...

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:

...
...
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?

I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).

I really need to support either.

I'm dealing with the output of a Windows program that seems to convert between DOS and Unix file formats.

Is each file self-consistent in what it uses though? Or could it switch between CR, LF, and CR+LF as EOLs? If so, is CR+LF+LF+CR+LF 3, 4, or 5 EOLs? James
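To make the counting question concrete, here is a minimal sketch of two possible interpretations. The function names are hypothetical and not from the thread. Under the first, CR+LF counts as one line ending and a lone CR or lone LF also counts as one; under the second, every CR and every LF is its own line ending. For the sequence CR+LF+LF+CR+LF these give 3 and 5 respectively.

```c
#include <stddef.h>

/* Hypothetical helper: CR+LF is one line ending; a lone CR or a lone LF
 * is also one line ending. */
size_t count_eols_paired(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\r') {
            n++;
            if (i + 1 < len && s[i + 1] == '\n')
                i++;            /* swallow the LF of a CR+LF pair */
        } else if (s[i] == '\n') {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: every CR and every LF is its own line ending. */
size_t count_eols_each(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\r' || s[i] == '\n')
            n++;
    }
    return n;
}
```

Which interpretation is right depends on whether the consumer can tolerate spurious empty lines.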

Russell Coker

4:21 a.m.

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:

...

Is each file self-consistent in what it uses though? Or could it switch between CR, LF, and CR+LF as EOLs? If so, is CR+LF+LF+CR+LF 3, 4, or 5 EOLs?

It changes in the middle of the file.

Trent W. Buck

9:01 a.m.

Russell Coker wrote:

...

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:

...
...
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?

I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).

I really need to support either.

I'm dealing with the output of a Windows program that seems to convert between DOS and Unix file formats.

Put something like tr or sed s/\r$// in front of your real program, and let it deal with unfucking CRLF files.

Jason White

4:37 a.m.

James Harper <james.harper@bendigoit.com.au> wrote:

...

...
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?

I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).

You could write a function to do this without much trouble though. CR alone is the real problem here: if you were only concerned with LF or CR+LF then all you would have to do is deal with the case in which there's an extraneous CR at the end of the buffer (you could simply change it to '\0', I suppose, and ensure that the buffer is long enough for the expected lines, including trailing CR).
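A minimal sketch of the LF or CR+LF case described above, assuming the buffer is long enough for the longest expected line including the trailing CR and LF. The name get_line_crlf is hypothetical. It does not treat a lone CR in the middle of a line as a line ending.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() and strip a trailing
 * LF or CR+LF. Returns 1 if a line was read, 0 at end of file or on error.
 * Assumes buf_len is large enough to hold a whole line plus CR, LF and
 * the terminating '\0'; a longer line is split by fgets(). */
int get_line_crlf(char *buf, int buf_len, FILE *fp)
{
    if (fgets(buf, buf_len, fp) == NULL)
        return 0;

    size_t n = strlen(buf);
    if (n > 0 && buf[n - 1] == '\n')
        buf[--n] = '\0';        /* drop the LF */
    if (n > 0 && buf[n - 1] == '\r')
        buf[--n] = '\0';        /* drop the extraneous CR */
    return 1;
}
```

If a line is longer than the buffer and the split happens to fall between the CR and the LF, the next call returns an empty line, which is why the buffer size matters here.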

Russell Coker

9:05 a.m.

On Thu, 31 May 2012, Jason White <jason@jasonjgw.net> wrote:

...

You could write a function to do this without much trouble though.

Below is my rough analog to fgets() which takes either \r or \n as the end of a line. It works because I don't mind getting the occasional empty line in the middle of the file; programs which can't handle extra empty lines won't work with this.

int get_line(char *buf, int buf_len, FILE *fp)
{
  int c, i = 0;
  while((c = fgetc(fp)) != EOF)
  {
    if(c == '\r' || c == '\n')
    {
      buf[i] = 0;
      return 1;
    }
    buf[i] = (char)c;
    i++;
    if(i == buf_len - 1)
    {
      buf[i] = 0;
      return 1;
    }
  }
  if(i > 0)
  {
    buf[i] = 0;
    return 1;
  }
  return 0;
}
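For programs that cannot tolerate the extra empty line produced by each CR+LF pair, one possible variant (a sketch, not something posted in the thread) reads one character ahead after a CR and puts it back with ungetc() unless it is an LF. The name get_line_any and the early return for short buffers are assumptions of this sketch.

```c
#include <stdio.h>

/* Hypothetical variant: treats CR, LF and CR+LF each as a single line
 * ending. Returns 1 if a line (possibly empty) was stored in buf, 0 at
 * end of file or if the buffer cannot hold one character plus '\0'. */
int get_line_any(char *buf, int buf_len, FILE *fp)
{
    int c, i = 0;

    if (buf_len < 2)
        return 0;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r') {
            int next = fgetc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);   /* lone CR: put the lookahead back */
            buf[i] = '\0';
            return 1;
        }
        if (c == '\n') {
            buf[i] = '\0';
            return 1;
        }
        buf[i++] = (char)c;
        if (i == buf_len - 1) {     /* buffer full: return a partial line */
            buf[i] = '\0';
            return 1;
        }
    }
    if (i > 0) {                    /* last line had no line ending */
        buf[i] = '\0';
        return 1;
    }
    return 0;
}
```

With this variant, genuinely empty lines (for example LF+LF) are still returned as empty strings. A line that exactly fills the buffer is still followed by an empty line on the next call.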

Jason White

9:21 a.m.

Russell Coker <russell@coker.com.au> wrote:

...

Below is my rough analog to fgets() which takes either \r or \n as the end of a line. It works because I don't mind getting the occasional empty line in the middle of the file, programs which can't handle extra empty lines won't work with this.

It seems correct to me. Of course, being written by Russell, one can expect it to be correct.

Russell Coker

12:22 p.m.

On Thu, 31 May 2012, Jason White <jason@jasonjgw.net> wrote:

...

It seems correct to me. Of course, being written by Russell, one can expect it to be correct.

I've written my share of crap code in the past, not much that would be worthy of submission to TheDailyWTF though.

On Thu, 31 May 2012, Mark Trickett <marktrickett@bigpond.com> wrote:

...

There is a program that converts the different newline formats between Unix, Mac and DOS. It might be worth seeing how that does as a preprocessor.

That possibility occurred to me, but I don't want to make it too slow. The program in question may end up dealing with a lot of data.

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:

...

If you pass buf_len=0 or buf_len=1 you would get yourself a buffer overflow. Maybe that doesn't really matter in your case if you use hardcoded inputs, but I'd still check for it to save from puzzling crashes if you re-use the code in future.

Thanks for the suggestion, I've made the code in question just return 0 if the buffer length is less than 2. It's ugly but it makes the consequences obvious to the caller and a human who reads the source.
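The revised code was not posted, so the following is only a sketch of what the described change might look like: the function posted earlier in the thread with a guard added at the top. Note that with this guard a return value of 0 can mean either end of file or an unusable buffer.

```c
#include <stdio.h>

/* Sketch of the guarded version (an assumption; the actual revised code
 * does not appear in the thread). Takes either \r or \n as the end of a
 * line, so a CR+LF pair yields a line followed by an empty line. */
int get_line(char *buf, int buf_len, FILE *fp)
{
    int c, i = 0;

    if (buf_len < 2)            /* no room for one character plus '\0' */
        return 0;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r' || c == '\n') {
            buf[i] = 0;
            return 1;
        }
        buf[i] = (char)c;
        i++;
        if (i == buf_len - 1) {
            buf[i] = 0;
            return 1;
        }
    }
    if (i > 0) {
        buf[i] = 0;
        return 1;
    }
    return 0;
}
```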

James Harper

11:56 a.m.

...

On Thu, 31 May 2012, Jason White <jason@jasonjgw.net> wrote:

...
You could write a function to do this without much trouble though.

Below is my rough analog to fgets() which takes either \r or \n as the end of a line. It works because I don't mind getting the occasional empty line in the middle of the file, programs which can't handle extra empty lines won't work with this.

int get_line(char *buf, int buf_len, FILE *fp)
{
  int c, i = 0;
  while((c = fgetc(fp)) != EOF)
  {
    if(c == '\r' || c == '\n')
    {
      buf[i] = 0;
      return 1;
    }
    buf[i] = (char)c;
    i++;
    if(i == buf_len - 1)
    {
      buf[i] = 0;
      return 1;
    }

If you pass buf_len=0 or buf_len=1 you would get yourself a buffer overflow. Maybe that doesn't really matter in your case if you use hardcoded inputs, but I'd still check for it to save from puzzling crashes if you re-use the code in future. James

Toby Corkindale

4:51 a.m.

On 31/05/12 13:06, Russell Coker wrote:

...

Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?

Perl? It's almost a library routine for C..

Mark Trickett

11:38 a.m.

Hello Russell, On Thu, 2012-05-31 at 13:06 +1000, Russell Coker wrote:

...

Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?

There is a program that converts the different newline formats between Unix, Mac and DOS. It might be worth seeing how that does as a preprocessor. Regards, Mark Trickett

participants (6)

James Harper
Jason White
Mark Trickett
Russell Coker
Toby Corkindale
Trent W. Buck