Problem opening a CSV file: dlmread returns only zeros

Problem description

I have a .csv file with the following content (see the attached Test.csv, 152 bytes):


When I try to open it by using

dlmread ('Test.csv', ';')

I get these wrong results (many zeros, as if the file content could not be read correctly):

   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

How can I solve this?

My system

  • OS: Windows 10
  • Octave version: 6.3.0
  • Installation method: downloaded and installed “octave-6.3.0-w64-installer.exe” from the Octave download page

That file seems to be encoded in UTF-16 (which is quite unusual for a plain-text file).
As far as I can tell, the dlmread function doesn’t use any heuristics to determine the encoding of the file it reads (and in my opinion it shouldn’t even try).
The file is read correctly for me if I convert it to UTF-8 before reading it with dlmread.
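One way to check such a guess from within Octave is to look at the first two bytes of the file. UTF-16 text files often start with a byte-order mark (BOM): FF FE for little-endian (UTF-16LE) or FE FF for big-endian (UTF-16BE). The sketch below assumes the file carries such a BOM; `guess_utf16_bom` is a hypothetical helper name, and a UTF-16 file without a BOM will not be detected by it.

```octave
1;  # mark this file as a script so it can contain function definitions

## Hypothetical helper: return 'UTF-16LE' or 'UTF-16BE' if the file starts
## with the corresponding byte-order mark, otherwise return ''.
## Assumption: the file has a BOM; BOM-less UTF-16 is not detected.
function enc = guess_utf16_bom (fname)
  [fid, msg] = fopen (fname, 'r');
  if (fid < 0)
    error ('guess_utf16_bom: cannot open "%s": %s', fname, msg);
  endif
  bom = fread (fid, 2, 'uint8=>double').';  # first (at most) two bytes as a row
  fclose (fid);
  if (isequal (bom, [255 254]))       # bytes FF FE
    enc = 'UTF-16LE';
  elseif (isequal (bom, [254 255]))   # bytes FE FF
    enc = 'UTF-16BE';
  else
    enc = '';                         # no UTF-16 BOM found
  endif
endfunction
```

For example, `guess_utf16_bom ('Test.csv')` returning 'UTF-16LE' would be a hint that the file has to be converted before dlmread can parse it.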

Edit: It looks like there is a stray double quote " in the first and the last column. Is that an error in the program that wrote the file?

How can I read both text and numbers? I have problems reading such a file (even after converting it to UTF-8).

Thank you

From dlmread's docstring:

Read numeric data from the text file file which uses the delimiter sep between data values.

That function is for reading numeric data from text files.

If you’d like to read both text and numeric values and you know the structure of the file beforehand, you could use e.g. textscan instead.
With the file converted to UTF-8, you could use e.g.:

fid = fopen('Test-discourse-1774-utf-8.csv');
C = textscan(fid, '%s %d %s %s %s %s %d %d %d %f %f %d %s %d %s %d %s %s', 'Delimiter', ';')

I don’t know which types you’d expect in the empty columns, so you might need to adapt the format string for those.
Also consider using %* (followed by the correct type specifier) if you’d like to skip some columns.
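A small illustration of mixing text and numeric conversions and of skipping a column with `%*s`. The sample string and its three-column layout are made up for this sketch and are not the actual columns of Test.csv; textscan also accepts a character string instead of a file ID, which keeps the example self-contained.

```octave
## Made-up sample data: text;integer;text on each line.
str = "abc;12;x\ndef;34;y";

## %s -> cell array of strings, %d -> int32, %*s -> read but discard.
C = textscan (str, '%s %d %*s', 'Delimiter', ';');

names  = C{1};  # cell array of strings, one per line
values = C{2};  # int32 column vector
```

Because the third column is skipped, `C` holds only two cells.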

Thank you for the support. Is there a way to convert automatically from UTF-16 to UTF-8 in Octave, or do I need to do it by hand (e.g. in Excel)?

It’s probably more convenient to use external tools to convert the encoding (e.g. any decent text editor). But it’s also possible to use Octave for that task.
You could use commands similar to these:

fid16 = fopen ('Test-discourse-1774-utf-16.csv');
csv16 = fread (fid16);  % read file content (binary)
fclose (fid16);
csv8 = native2unicode (csv16(:).', 'UTF-16LE');  % convert from UTF-16LE to UTF-8
fid8 = fopen ('Test-discourse-1774-utf-8.csv', 'w');
fwrite (fid8, csv8);  % write converted string to new file
fclose (fid8);

Edit: At that point you could probably skip saving the UTF-8 string to a file. Instead, you could use textscan directly with csv8.
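A minimal sketch of that in-memory variant, wrapped in a hypothetical helper `read_utf16le_csv` (the name and the fixed ';' delimiter are assumptions, not part of the thread). It also drops a leading UTF-8 BOM (bytes EF BB BF) in case the file's UTF-16 BOM survives the conversion as the character U+FEFF, so that it is not glued to the first field; if the BOM is already gone, that check simply does nothing.

```octave
1;  # mark this file as a script so it can contain function definitions

## Hypothetical helper: read a UTF-16LE encoded, ';'-delimited file and parse
## it with textscan using the format string FMT, without an intermediate file.
function C = read_utf16le_csv (fname, fmt)
  [fid, msg] = fopen (fname, 'r');
  if (fid < 0)
    error ('read_utf16le_csv: cannot open "%s": %s', fname, msg);
  endif
  raw = fread (fid);   # file content as a column of byte values
  fclose (fid);

  txt = native2unicode (raw(:).', 'UTF-16LE');   # UTF-8 encoded char row

  ## A BOM that survived the conversion appears as the UTF-8 bytes EF BB BF.
  if (numel (txt) >= 3 && isequal (double (txt(1:3)), [239 187 191]))
    txt = txt(4:end);
  endif

  C = textscan (txt, fmt, 'Delimiter', ';');
endfunction
```

It could then be called with the same format string as in the textscan example above, e.g. `C = read_utf16le_csv ('Test-discourse-1774-utf-16.csv', fmt)`.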
