Reading metadata from comment lines at the start of a data file

I have data files that look like:

# Date/Time 2021/12/22, 10:52:53
# PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137
# Rel. Time, currentPos, PosPID, currentSpeed, speedPID, Lag, ServoPos

I want to import this data and plot it. I want to use the date and time (in this file 2021/12/22, 10:52:53) as the plot title, and the data from the second comment line as a caption (or similar).

In addition I want to plot a horizontal line with the value of TDEC from that same line.

I think I can read the data after the headers using dlmread. The problem is how to handle the comment data. I suspect textscan might be needed, but as I have never used Octave before, I am struggling to come up with the necessary code.

If it makes things simpler, I have some control over the format of the comments and over whether (e.g.) the 4th line should be a comment or not.
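For the numeric part, dlmread does accept a zero-based row offset, so the header lines can simply be skipped. A minimal sketch, assuming three comment lines and comma-separated numeric data; the file is written to a temporary location with made-up values only so the example runs on its own:

```octave
## Hypothetical sample file with made-up data rows; use your own file instead
fname = [tempname() ".dat"];
fid = fopen (fname, "w");
fprintf (fid, "# Date/Time 2021/12/22, 10:52:53\n");
fprintf (fid, "# PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137\n");
fprintf (fid, "# Rel. Time, currentPos, PosPID, currentSpeed, speedPID, Lag, ServoPos\n");
fprintf (fid, "0,10,1,2,3,4,5\n");
fprintf (fid, "1,11,1,2,3,4,5\n");
fclose (fid);

## dlmread (file, sep, r0, c0): r0 and c0 are zero-based offsets, so
## r0 = 3 skips the three comment lines and c0 = 0 keeps every column
data = dlmread (fname, ",", 3, 0);
delete (fname);
```

If a 4th comment line is added later, only r0 needs to change.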


You could probably just open the file and read the first two lines as strings:

fid = fopen('test.dat'); # or whatever the file name is...
line1 = fgetl(fid);
line2 = fgetl(fid);

Does that help?
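Building on those two fgetl calls, the title and caption strings only need their fixed prefixes stripped. A sketch, assuming the header layout shown in the question; the temporary file is hypothetical and only there to make the example self-contained:

```octave
## Hypothetical sample file holding just the first two header lines
fname = [tempname() ".dat"];
fid = fopen (fname, "w");
fprintf (fid, "# Date/Time 2021/12/22, 10:52:53\n");
fprintf (fid, "# PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137\n");
fclose (fid);

fid = fopen (fname, "r");
if (fid < 0)
  error ("could not open %s", fname);
endif
line1 = fgetl (fid);
line2 = fgetl (fid);
fclose (fid);
delete (fname);

## Remove the "# Date/Time " prefix for the title and the leading "# " for
## the caption; strtrim also drops a trailing carriage return, if any
dattim  = strtrim (regexprep (line1, '^#\s*Date/Time\s*', ''));
caption = strtrim (regexprep (line2, '^#\s*', ''));
```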

Edit: Missed the part about TDEC:

TDEC_str = regexp(line2, '.*TDEC=([0-9]*)', 'tokens');
TDEC = str2double(TDEC_str{1}{1});
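The same regexp idea extends to every name=value pair on that line, which avoids writing one pattern per parameter. A sketch, assuming the values are all numeric and the names are valid Octave field names; the struct name params is just an illustrative choice:

```octave
line2 = "# PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137";

## Each token is a 1x2 cell {name, value} for one "name=value" pair
tok = regexp (line2, '(\w+)=([^,\s]+)', 'tokens');
params = struct ();
for k = 1:numel (tok)
  params.(tok{k}{1}) = str2double (tok{k}{2});
endfor

TDEC = params.TDEC;
```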

Another way, not the most efficient, but intended to show a little of textscan usage. You can look up the parameters used here in textscan’s help.
Note that textscan is a beast: it has a lot of bells and whistles, so it’s easy to lose track of what you are doing. But it can be very flexible and very fast at the same time.

## Open file (to be sure just for reading)
fid = fopen ("data.dat", "r");

## Get date & time from first line
C = textscan (fid, "# Date/Time %s %s", 1);
dattim = strjoin (cell2mat (C), " ");

## Get TDEC from second line
line2 = fgetl (fid);
TDEC = textscan (line2, "# %*s %*s %*s %*s %*s %*s %*s %*s TDEC=%f", 1, ...
                 "delimiter", ",", "whitespace", "");
TDEC = cell2mat (TDEC);

## Skip the remaining comment line(s)
fgetl (fid);    # 3rd line: column names
fgetl (fid);    # Optional, only needed if a 4th comment line is present

## Get data
C = textscan (fid, "%f %f %f %f %f %f %f", "delimiter", ",", "CollectOutput", 1);
C = cell2mat (C);

fclose (fid);

(watch out for line wrap)
The tricks used here: when reading from a file, each textscan call simply starts at the current file pointer position, and you can specify how many times the format string is applied (which is not the same as the number of lines read, since one format string can describe several lines or just part of a line). Also, unwanted parts of the text can be skipped by putting them in the format string as ‘literals’. Some basic regexp-like matching is supported through %[ ]-style format specifiers.
textscan always returns a cell array. I often apply cell2mat right away, as in “cell2mat (textscan (…))”, but in the example the two steps are kept separate.
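Neither answer shows the plotting side of the question. A minimal sketch, assuming the date/time string, the parameter line and TDEC have already been read as above (hard-coded here) and using made-up data in place of the file’s columns. The caption is rendered as a second title line, which is only one of several options:

```octave
## Hypothetical values standing in for what was read from the file
dattim  = "2021/12/22, 10:52:53";
caption = "PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137";
TDEC    = 1175137;
t   = (0:0.1:1)';                   # made-up time column
pos = TDEC * (1 - exp (-5 * t));    # made-up position data

hf = figure ("visible", "off");     # drop "visible" to show the window
plot (t, pos);
hold on;
## Horizontal reference line at TDEC spanning the data's x range
hl = plot ([t(1) t(end)], [TDEC TDEC], "r--");
hold off;
## Two-line title: date/time on top, parameter line below;
## "interpreter" "none" stops TeX from interpreting characters like "_"
ht = title ({dattim, caption}, "interpreter", "none");
xlabel ("Rel. Time");
ylabel ("currentPos");
```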


## Get TDEC from second line
line2 = fgetl (fid)
TDEC = textscan (line2, "# %*s %*s %*s %*s %*s %*s %*s %*s TDEC=%f", 1, ...
  "Delimiter", ",", "Whitespace", "")
TDEC = cell2mat (TDEC)
fclose (fid);

results in:

line2 = # PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137
  [1,1] = NaN


Please could someone tell me why I’m getting NaN when trying to parse the line using textscan? I want to get the value of TDEC.

Thanks, David

Works for me. What is your octave version?


Why don’t you use the imho simpler regexp from above?

He is trying to learn, always good - but sometimes frustrating.

Apologies, I didn’t visit here for several days. I didn’t get a reply notification; I don’t know if Discourse can do that.

I only supplied the textscan usage as a mere example because you more or less asked for it. Good on you that you tried, and I (and I guess all of us here) know the feeling when things that apparently should work just don’t :slight_smile:

Given your problem, I wonder what your file really looks like, especially regarding encoding; that’s the only issue I can think of at the moment.
What does
uint8 (line2)
give for you? Here I get (Discourse’s formatting options permitting):

> >> uint8 (line2)
> ans =
>  Columns 1 through 17:
>    35   32   80  111  110   69   61   48   44   76  115   75  112   61   50   48   48
>  Columns 18 through 34:
>    44   76  115   75  105   61   48   44   76  115   75  100   61   50   53   48   44
>  Columns 35 through 51:
>    72  115   75  112   61   52   48   44   72  115   75  105   61   48   44   72  115
>  Columns 52 through 68:
>    75  100   61   49   51   48   44   83  112   61   56   48   48   44   84   68   69
>  Columns 69 through 77:
>    67   61   49   49   55   53   49   51   55
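Reading byte columns by eye is tedious; an alternative is to list every byte outside printable ASCII. A sketch, assuming (hypothetically) a stray carriage return left by a file with Windows line endings, which is one common culprit worth ruling out:

```octave
## Hypothetical header line ending in a carriage return, as fgetl can
## return for a file saved with CRLF line endings
line2 = "# PonE=0,Sp=800,TDEC=1175137\r";

codes = double (line2);
## Printable ASCII is 32..126; anything else (CR, tab, UTF-8 multibyte
## sequences such as a non-breaking space or a BOM) deserves a closer look
bad = find (codes < 32 | codes > 126);
if (! isempty (bad))
  printf ("unexpected byte %d at position %d\n", [codes(bad); bad]);
endif
```

If only trailing whitespace turns up, strtrim (line2) removes it before parsing.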

If you’re fed up with trying textscan for that TDEC value, simply fall back on the regexp that Markus suggested and read the rest (the data) with textscan. Does that work?

And the charm was:

## Get TDEC from second line
line2 = fgetl (fid)
TDEC = textscan (line2, "# %*s, %*s, %*s, %*s, %*s, %*s, %*s, %*s, TDEC=%u32", 1, ...
  "Delimiter", ",", "Whitespace", " ")
TDEC = cell2mat (TDEC)

the commas were necessary …


> the commas were necessary …

That is remarkable :roll_eyes: Those commas are already specified by the <“delimiter”, “,”> option in the textscan call; what you did is specify them as “literals” as well. Not bad, it should work equally well, but then I’d wonder whether the “delimiter” argument makes any difference.
Again, I suspect file encoding at play here.

Anyway, it works for you, glad you got it solved :slight_smile:
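If counting fields in the format string keeps proving fragile, a different technique from the textscan approach in this thread is to locate the key with strfind and let sscanf parse the number after it. This does not depend on how many fields precede TDEC. A sketch, assuming the key appears exactly as “TDEC=”:

```octave
line2 = "# PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137";

## Find where "TDEC=" starts, then match it as a literal and read the integer
idx = strfind (line2, "TDEC=");
if (isempty (idx))
  error ("TDEC not found in header line");
endif
TDEC = sscanf (line2(idx(1):end), "TDEC=%d");
```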

Excellent thread, thank you everybody. I wasn’t the one asking those questions, but I am working with similar text files. I have always wondered whether I can automate data series naming from the strings in the header.
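In files like the one in this thread, the third comment line already holds the column names, so they can be turned into series names directly. A minimal sketch, assuming the names are comma-separated after a leading “# ”:

```octave
line3 = "# Rel. Time, currentPos, PosPID, currentSpeed, speedPID, Lag, ServoPos";

## Drop the leading "# ", split on commas, trim the spaces around each name
names = strtrim (strsplit (regexprep (line3, '^#\s*', ''), ","));
```

With the numeric columns in a matrix (called data here, a hypothetical name), plot (data(:,1), data(:,2:end)) followed by legend (names(2:end)) would then label each series from the header.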