
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line? -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?
I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf). James

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?
I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).
I really need to support either. I'm dealing with the output of a Windows program that seems to convert between DOS and Unix file formats. So Peter's suggestion of getdelim() isn't going to help me this time, although it's a handy thing to note for future reference.

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?
I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).
I really need to support either.
I'm dealing with the output of a Windows program that seems to convert between DOS and Unix file formats.
Is each file self-consistent in what it uses though? Or could it switch between CR, LF, and CR+LF as EOLs? If so, is CR+LF+LF+CR+LF 3, 4, or 5 EOLs? James

On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:
Is each file self-consistent in what it uses though? Or could it switch between CR, LF, and CR+LF as EOLs? If so, is CR+LF+LF+CR+LF 3, 4, or 5 EOLs?
It changes in the middle of the file.
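To make James's counting question concrete, here is a minimal sketch of two possible policies. The function names are hypothetical and not from the thread. Under the first policy (CR+LF is one terminator, a lone CR or LF is also one) the sequence CR+LF+LF+CR+LF counts as 3 EOLs; under the second (every CR and every LF is its own terminator) it counts as 5.

```c
#include <stddef.h>

/* Count line terminators in s, treating CR+LF as a single terminator.
 * A lone CR or a lone LF each count as one. */
size_t count_eols_paired(const char *s)
{
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\r') {
            n++;
            if (s[i + 1] == '\n')
                i++;            /* swallow the LF of a CR+LF pair */
        } else if (s[i] == '\n') {
            n++;
        }
    }
    return n;
}

/* Count every CR and every LF as its own terminator. */
size_t count_eols_each(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == '\r' || *s == '\n')
            n++;
    return n;
}
```

Which policy is right depends on whether the consumer can tolerate extra empty lines.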

Russell Coker wrote:
On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?
I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).
I really need to support either.
I'm dealing with the output of a Windows program that seems to convert between DOS and Unix file formats.
Put something like tr or sed 's/\r$//' in front of your real program, and let it deal with unfucking CRLF files.

James Harper <james.harper@bendigoit.com.au> wrote:
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?
I'm pretty sure there isn't. Is using sscanf out of the question? You may be able to construct a format string that does what you want unless you really do want a CR by itself to also represent a linefeed (which I can't think how to do using sscanf).
You could write a function to do this without much trouble though. CR alone is the real problem here: if you were only concerned with LF or CR+LF then all you would have to do is deal with the case in which there's an extraneous CR at the end of the buffer (you could simply change it to '\0', I suppose, and ensure that the buffer is long enough for the expected lines, including trailing CR).
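The simpler case Jason describes (only LF or CR+LF) might look roughly like the sketch below. The name fgets_strip_eol is made up for illustration, and the sketch assumes buf_len is at least 2 and that lines fit in the buffer. It does not treat a lone CR in the middle of the data as a line end, which is exactly the case that makes the general problem harder.

```c
#include <stdio.h>
#include <string.h>

/* Read one line with fgets() and strip a trailing LF or CR+LF.
 * Returns 1 if a line was read, 0 at end of file or on error.
 * A lone CR in the middle of the data is NOT treated as a line end. */
int fgets_strip_eol(char *buf, int buf_len, FILE *fp)
{
    if (fgets(buf, buf_len, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';      /* drop the LF */
    if (len > 0 && buf[len - 1] == '\r')
        buf[--len] = '\0';      /* drop an extraneous CR before it */
    return 1;
}
```

If a line is longer than the buffer, fgets() returns it in pieces, so a CR that happens to land at the end of a piece would be stripped as well.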

On Thu, 31 May 2012, Jason White <jason@jasonjgw.net> wrote:
You could write a function to do this without much trouble though.
Below is my rough analog to fgets() which takes either \r or \n as the end of a line. It works because I don't mind getting the occasional empty line in the middle of the file, programs which can't handle extra empty lines won't work with this.

    int get_line(char *buf, int buf_len, FILE *fp)
    {
        int c, i = 0;
        while((c = fgetc(fp)) != EOF)
        {
            if(c == '\r' || c == '\n')
            {
                buf[i] = 0;
                return 1;
            }
            buf[i] = (char)c;
            i++;
            if(i == buf_len - 1)
            {
                buf[i] = 0;
                return 1;
            }
        }
        if(i > 0)
        {
            buf[i] = 0;
            return 1;
        }
        return 0;
    }
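For programs that cannot tolerate the extra empty line produced by each CR+LF pair, one possible variation is to peek at the character after a CR and swallow it if it is an LF. The sketch below is not Russell's code: get_line_crlf is a hypothetical name, and it relies on the standard guarantee that one character of ungetc() pushback is available.

```c
#include <stdio.h>

/* Hypothetical variant of get_line(): CR, LF, and CR+LF each end one line.
 * Returns 1 if a line (possibly empty) was stored in buf, 0 at end of file
 * or if buf_len is too small to hold a character plus the terminator. */
int get_line_crlf(char *buf, int buf_len, FILE *fp)
{
    int c, i = 0;

    if (buf_len < 2)
        return 0;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r') {
            /* Peek: swallow an LF that immediately follows the CR. */
            int next = fgetc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            buf[i] = '\0';
            return 1;
        }
        if (c == '\n') {
            buf[i] = '\0';
            return 1;
        }
        buf[i++] = (char)c;
        if (i == buf_len - 1) {     /* buffer full: return a partial line */
            buf[i] = '\0';
            return 1;
        }
    }
    if (i > 0) {                    /* last line without a terminator */
        buf[i] = '\0';
        return 1;
    }
    return 0;
}
```

Genuinely empty lines (two terminators in a row, such as LF+LF) are still returned as empty strings. As in the original, a line that exactly fills the buffer is followed by an empty line on the next call.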

Russell Coker <russell@coker.com.au> wrote:
Below is my rough analog to fgets() which takes either \r or \n as the end of a line. It works because I don't mind getting the occasional empty line in the middle of the file, programs which can't handle extra empty lines won't work with this.
It seems correct to me. Of course, being written by Russell, one can expect it to be correct.

On Thu, 31 May 2012, Jason White <jason@jasonjgw.net> wrote:
It seems correct to me. Of course, being written by Russell, one can expect it to be correct.
I've written my share of crap code in the past, though not much that would be worthy of submission to TheDailyWTF.
On Thu, 31 May 2012, Mark Trickett <marktrickett@bigpond.com> wrote:
There is a program that converts the different newline formats between Unix, Mac and DOS. It might be worth seeing how that does as a preprocessor.
That possibility occurred to me, but I don't want to make it too slow. The program in question may end up dealing with a lot of data.
On Thu, 31 May 2012, James Harper <james.harper@bendigoit.com.au> wrote:
If you pass buf_len=0 or buf_len=1 you would get yourself a buffer overflow. Maybe that doesn't really matter in your case if you use hardcoded inputs, but I'd still check for it to save from puzzling crashes if you re-use the code in future.
Thanks for the suggestion, I've made the code in question just return 0 if the buffer length is less than 2. It's ugly but it makes the consequences obvious to the caller and a human who reads the source.
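Russell's revised code was not posted to the thread, so the following is only a guess at what the described change might look like. The name get_line_guarded is hypothetical. Without the guard, buf_len=0 or buf_len=1 means the test i == buf_len - 1 can never be true after i has been incremented, so the loop keeps writing past the end of buf.

```c
#include <stdio.h>

/* get_line() with the guard described above: refuse buffers that cannot
 * hold at least one character plus the terminating NUL.
 * Returns 1 if a line was stored in buf, 0 at end of file or when
 * buf_len is less than 2. */
int get_line_guarded(char *buf, int buf_len, FILE *fp)
{
    int c, i = 0;

    if (buf_len < 2)
        return 0;               /* nothing is read from fp in this case */

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r' || c == '\n') {
            buf[i] = '\0';
            return 1;
        }
        buf[i++] = (char)c;
        if (i == buf_len - 1) {
            buf[i] = '\0';
            return 1;
        }
    }
    if (i > 0) {
        buf[i] = '\0';
        return 1;
    }
    return 0;
}
```

A return of 0 for a too-small buffer is indistinguishable from end of file, which is presumably the ugliness Russell mentions.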

On Thu, 31 May 2012, Jason White <jason@jasonjgw.net> wrote:
You could write a function to do this without much trouble though.
Below is my rough analog to fgets() which takes either \r or \n as the end of a line. It works because I don't mind getting the occasional empty line in the middle of the file, programs which can't handle extra empty lines won't work with this.
    int get_line(char *buf, int buf_len, FILE *fp)
    {
        int c, i = 0;
        while((c = fgetc(fp)) != EOF)
        {
            if(c == '\r' || c == '\n')
            {
                buf[i] = 0;
                return 1;
            }
            buf[i] = (char)c;
            i++;
            if(i == buf_len - 1)
            {
                buf[i] = 0;
                return 1;
            }
If you pass buf_len=0 or buf_len=1 you would get yourself a buffer overflow. Maybe that doesn't really matter in your case if you use hardcoded inputs, but I'd still check for it to save from puzzling crashes if you re-use the code in future. James

Hello Russell, On Thu, 2012-05-31 at 13:06 +1000, Russell Coker wrote:
Is there a library routine like fgets() but which takes any of CR, NL, and CR+NL as an end of line?
There is a program that converts the different newline formats between Unix, Mac and DOS. It might be worth seeing how that does as a preprocessor. Regards, Mark Trickett
participants (6): James Harper, Jason White, Mark Trickett, Russell Coker, Toby Corkindale, Trent W. Buck