Help on textscan, rare occurrences of errors

My system

  • OS: Windows 10 / Windows 7
  • Octave version: Version 6.4.0
  • Installation method: octave-6.4.0-w64-installer.exe

Hi,

I have been having some very rare occurrences (about 0.15% based on 15k iterations) where the following code does not work.
The files are all generated by a single program and are supposed to strictly follow the same format.
I can see that the header changes but I don’t see why it should affect the rest of the file.
The equivalent function in Matlab never fails.
I can’t figure out why, but the last columns are not recognized properly for some reason.
Any help would be appreciated.

##Path2filename   = 'Spectrum_ABEC_KO.txt'; % not working, 0.15% of files
##Path2filename   = 'Spectrum_ABEC_OK.txt'; % working
##should yield SPL size 40*37

delimiter  = ' ';
startRow   = 39;
endRow     = 78;
formatSpec = '%f%f%f%[^\n]';

fileID     = fopen(Path2filename,'r');
dataArrayZ = textscan(fileID, formatSpec, endRow-startRow+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\n');
fclose(fileID);
## Always works
freqZ      = cell2mat(dataArrayZ(1));
ZRe        = cell2mat(dataArrayZ(2));
ZIm        = cell2mat(dataArrayZ(3));
RadZ       = [ZRe ZIm]; % always ok

% Clear temporary variables
 clearvars delimiter startRow endRow formatSpec dataArrayZ ans;

delimiter  = ' ';
startRow   = 105;
endRow     = 144;

##formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%[^\n\r]';
##fileID     = fopen(Path2filename,'r');
##dataArray  = textscan(fileID, formatSpec, endRow-startRow+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
##fclose(fileID);
##size(dataArray)
##Freq           = cell2mat(dataArray(1)); 
##SpectrumABEC   = cell2mat(dataArray(2:end-1)); % drop the last col empty cells
##size(dataArray)% no error but only every other line

##Matlab version always works
##formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%[^\n\r]';
##fileID     = fopen(Path2filename,'r');
##dataArray  = textscan(fileID, formatSpec, endRow-startRow+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'TextType', 'string', 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');

formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%[^\n]';
fileID     = fopen(Path2filename,'r');
dataArray  = textscan(fileID, formatSpec, endRow-startRow+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\n');
fclose(fileID);
size(dataArray)

Freq           = cell2mat(dataArray(1));
SpectrumABEC   = cell2mat(dataArray(2:end-1)); % drop the last col empty

[~, col]       = size(SpectrumABEC);
NbSpectra      = col/2;
Pref           = 2e-5;

for k = 1:NbSpectra
  SPL(:,k)     = 20*log10( abs( ( ( SpectrumABEC(:,2*k-1) + 1i*SpectrumABEC(:,2*k) ) /sqrt(2) ) / Pref ) ); % RMS value
  deg(k,1)     = (k-1)*5;
end
size(SPL)

figure()
subplot (211)
     semilogx(freqZ, RadZ(:,1), 'k') 
     hold on 
     semilogx(freqZ, RadZ(:,2), 'k--') 
     grid on 
     ylabel('Z')
     xlabel('Frequency [Hz]')
     legend('ReZ', 'ImZ')
     title('Waveguide Z'); 
subplot (212)
     semilogx(Freq, SPL)  
     grid on
     ylabel('SPL [dB]')
     xlabel('Frequency [Hz]')
     legend('SPL 5 deg step')
     title('Waveguide SPL');

Spectrum_ABEC_OK.txt (66.3 KB)
Spectrum_ABEC_KO.txt (66.5 KB)

I haven’t tested actually. But the files you attached have CRLF line endings.
Looking at the commented code that you seem to be using in Matlab and comparing it to your Octave commands, it seems you are telling Matlab that the file has CRLF line endings. But you are telling Octave that the file would have LF line endings.
Does it help if you also tell Octave that the line ending is CRLF? Or convert the input files to LF line ending before parsing them in Octave?

There are two versions of the Octave code. The one with \r\n only returns every other line, but I might be doing something wrong. Also, it would not explain why the vast majority of the files work just fine.

How would you perform the conversion?

For a quick check, you could open the file in a decent text editor and convert the line endings with it.
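
If you would rather do the conversion from within Octave, a minimal sketch could look like the following. The tiny demo file is made up so that the snippet runs on its own; `infile` and `outfile` are placeholder names that you would point at your real data files.

```octave
% Sketch: convert CRLF line endings to LF before parsing.
% The demo file is a stand-in; replace infile/outfile with real paths.
infile  = [tempname() '_crlf.txt'];
outfile = [tempname() '_lf.txt'];

% Create a small CRLF sample so the snippet is self-contained.
fid = fopen (infile, 'wb');
fwrite (fid, sprintf ('1.0 2.0E-05\r\n3.0 4.0E-05\r\n'));
fclose (fid);

% Read the raw bytes ('rb' avoids any newline translation).
fid = fopen (infile, 'rb');
txt = char (fread (fid).');
fclose (fid);

% Replace every CRLF pair with a single LF.
txt = strrep (txt, sprintf ('\r\n'), sprintf ('\n'));

% Write the converted text back out, again in binary mode.
fid = fopen (outfile, 'wb');
fwrite (fid, txt);
fclose (fid);
```

The converted file can then be read with 'EndOfLine', '\n'.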

Again: I haven’t tried.
But something like this doesn’t do the trick?

formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f';
fileID     = fopen(Path2filename,'r');
dataArray  = textscan(fileID, formatSpec, endRow-startRow+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);

It seems to help, but I’ll need to run the optimizer over the weekend (~50 h) to get a large enough population to see whether it solves the issue…

EDIT: I forgot to mention that I had to change
SpectrumABEC = cell2mat(dataArray(2:end-1));
to
SpectrumABEC = cell2mat(dataArray(2:end));

I am afraid this did not solve the issue, and it may even have made it slightly worse.
I still have a small number of files that do not load properly.
The Matlab version does not have these issues.
By changing ‘ReturnOnError’ from false to true, I can see that the files load partially.
Some additional KO files are attached:
Spectrum_ABEC_3.txt stops at line 27
Spectrum_ABEC_4.txt stops at line 5
Spectrum_ABEC_3.txt (66.3 KB)
Spectrum_ABEC_4.txt (66.3 KB)
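
For spotting such partial reads automatically over many files, one option is to compare the number of entries in each returned column with the expected row count. This is only a sketch: the `dataArray` below is a hand-made stand-in for what textscan returns, and how Octave fills the columns after a failed read is an assumption on my side.

```octave
% Sketch: flag a partial read by checking the row count of every column.
% dataArray here is a made-up stand-in for the output of textscan.
expectedRows = 40;                               % endRow - startRow + 1
dataArray    = {ones(40,1), ones(40,1), ones(27,1)};

nRead     = cellfun (@numel, dataArray);         % entries read per column
shortCols = find (nRead ~= expectedRows);        % columns that came up short

if isempty (shortCols)
  fprintf ('all %d columns have %d rows\n', numel (dataArray), expectedRows);
else
  fprintf ('partial read: column %d has only %d rows\n', ...
           shortCols(1), nRead(shortCols(1)));
end
```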

It didn’t read file Spectrum_ABEC_3.txt with the original commands either. So I wouldn’t say it got worse.
The last number that it read from that file is 7.456446333632951. That is most probably wrong. The string appearing at that position in the file is 7.456446333632951E-05. So, it stopped reading before the exponent (and probably stumbled upon the letter E when trying to parse the next number starting from that position).
If I manually change that capital letter E to lower case e, the file reads correctly.
That might help as a work-around for you. But if this file is read correctly with Matlab, there is at least a compatibility concern.

Could you please open a bug report about this on savannah so someone can look into it in more detail (and hopefully can come up with a fix)?
GNU Octave - Bugs: Browse Items [Savannah]

I’ll file the bug report, no problem.

It didn’t read file Spectrum_ABEC_3.txt with the original commands either. So I wouldn’t say it got worse.

Fair enough, the only metric I have is the number of failures which was a bit higher with the trial formatSpec. If it is only related to the E vs. e then it’s pure luck…

In the meantime I had this idea for a quick test, but I can’t get the fprintf command to print anything but the first line of content, even though content does seem to hold the whole text file…

dataE   = importdata('Spectrum_ABEC_3.txt');
datae   = strrep(dataE, 'E', 'e');

datastr = {datae};
content = char(datastr);
fid     = fopen('Spectrum_ABEC_3e.txt', 'w');
fprintf(fid, content);
fclose(fid);

Edit:
I just realized that the OK file is also full of Es.

If you’d like to use Octave for the replacement of E in numbers to e, you could probably use something like this:

fid = fopen ('Spectrum_ABEC_3.txt', 'rb');
data = fread (fid);
data = regexprep (char (data.'), '([0-9])E', '$1e');
fclose (fid);

fid = fopen ('Spectrum_ABEC_3e.txt', 'wb');
fwrite (fid, data);
fclose (fid);

Changing E to e does not seem to change anything; the files still can’t be read correctly. Are you sure it worked in the end?

Also, the OK files all have Es too.
Spectrum_ABEC_4e.txt (66.3 KB)
Spectrum_ABEC_3e.txt (66.3 KB)

Oops. I must have accidentally deleted the last digit in that number when I replaced “E” with “e”. When I delete the last digit in that number, it doesn’t matter whether “E” or “e” is used. It works in both cases.
So the work-around won’t work unfortunately.
Maybe, an issue when the “e” or “E” appears at some internal buffer border? But that is pure speculation and should probably be tracked down properly in a bug report…
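
Until that is tracked down, one possible way around textscan is to read the data block line by line with fgetl and parse each line with sscanf. This is a sketch only: I have not run it on the failing files, the small demo file is made up so that the snippet runs on its own, and `txtline`, `rows` and `data` are names chosen for illustration.

```octave
% Sketch: read a numeric block line by line, bypassing textscan.
% Demo file: 2 header lines, then 3 data rows of 4 columns (CRLF endings).
fname = [tempname() '.txt'];
fid = fopen (fname, 'wb');
fprintf (fid, 'header one\r\nheader two\r\n');
fprintf (fid, '100 1.5E-05 -2.5E-05 3\r\n');
fprintf (fid, '200 7.456446333632951E-05 0 4\r\n');
fprintf (fid, '300 1 2 5\r\n');
fclose (fid);

startRow = 3;                       % first data line (1-based)
endRow   = 5;                       % last data line
nRows    = endRow - startRow + 1;

fid = fopen (fname, 'r');
for k = 1:startRow-1
  fgetl (fid);                      % skip the header lines
end

rows = cell (nRows, 1);
for k = 1:nRows
  txtline = fgetl (fid);
  if ~ischar (txtline)
    fclose (fid);
    error ('unexpected end of file at data row %d', k);
  end
  % %f skips whitespace, so a trailing CR does not disturb the parse.
  rows{k} = sscanf (txtline, '%f').';
end
fclose (fid);

data         = vertcat (rows{:});   % errors if rows differ in length
Freq         = data(:,1);
SpectrumABEC = data(:,2:end);
```

For the real files, startRow and endRow would be 105 and 144 as in the original script. A row that parses short makes vertcat fail, so a bad read is at least not silent.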

bug is here:

Hi,

Do you know if 7.1 includes the fix?

See:
bug #62152, Textscan fails in rare occurences

This bug still exists on the stable branch. But all of this buffering seems to be quite delicate, and I’d prefer to not make any intrusive changes in that area just before a planned release.

The stable branch corresponds to the released versions of Octave (currently Octave 7.x). The default branch contains changes that will likely be part of the next major release of Octave (probably Octave 8.x early next year).

Fixing this required significant changes to how buffering works when using textscan. With sufficient testing, we could consider grafting that change from the default branch to the stable branch (so it could be part of Octave 7.2.0). However, I don’t think I personally will be able to do sufficient testing.

IIUC, you are running textscan on a large number of files. Are you able to compile the default branch of Octave? Could you check if all of your files are read correctly with that version? Are there any regressions with that version?

I did not have much hope, I was just checking…

Unfortunately, I don’t have the environment/knowledge to compile Octave myself.

I am happy to test though and/or provide files to test.

Thanks anyway.

I played around with GitHub artifacts a bit to try and get kind of a “nightly build” or “CD build” for the default branch of Octave. I added some comments on this bug report on how that could be used on a Windows PC:
bug #61461, Command Window hide part of the line

It’s still not as easy as running an installer. But it might be easier (and faster) than compiling Octave oneself…