Jupyter Notebook Integration

Hello maintainers,

Lately, I have been thinking about a suitable approach for implementing the Jupyter Notebook integration.

The possible targets for this project are:

  • Running a Jupyter Notebook within GNU Octave: Jupyter Notebooks are JSON dictionaries, and decoding JSON is already supported, so all we need to do is extract the code from the code cells and run it.

  • Filling Jupyter Notebook within GNU Octave: Manipulating the sections of the dictionaries that form Jupyter Notebooks can be done flawlessly, and we can encode those dictionaries back into the JSON format.

  • Transform a Jupyter Notebook to an Octave script: This can be done by transforming Markdown cells into comments (there will be some losses) and extracting the code from the code cells.

  • Transform an Octave script to a Jupyter Notebook: The difficulty here is deciding where the cells of the notebook begin and end. For example, do we make every line a cell? Every block of code with no blank lines? Blocks of code that are separated by comments? And so on. I think a better option is to let the user populate the dictionary with cells and then encode this dictionary into a notebook.
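To make the first target concrete, here is a minimal sketch of the "decode, then pull out the code cells" step. It is written in Python only as a language-neutral illustration, not as the Octave implementation. The helper name `code_cell_sources` and the tiny example notebook are made up for this sketch; the field names (`cells`, `cell_type`, `source`) follow the notebook format.

```python
import json

def code_cell_sources(notebook):
    """Return the source text of every code cell, in notebook order."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows "source" to be one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources

# A tiny notebook, invented for illustration.
example = {
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1;\n", "disp(x)"]},
    ],
}

# Round trip through JSON text, as reading and writing an .ipynb file would do.
decoded = json.loads(json.dumps(example))
print(code_cell_sources(decoded))
```

The same loop is the natural place to hand each source string to whatever actually executes the code.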

I think the best way to achieve those targets is to have a class that holds the dictionary representing the notebook and provides methods to manipulate the dictionary, run code cells, fill the notebook, and so on.

What do you think of this approach? If you have any comments or concerns, please let me know.

If you think that this approach is reasonable, I will start working on the details of the interface that we are going to provide.

Hello, having a class in between Octave and Jupyter Notebook seems sensible, especially as it would allow storing the global state.

For completeness:

One aspect limiting the use of Octave in Jupyter Notebooks is the formatting of the outputs, especially to have them interactive. Would you know where to find some documentation on how to write widgets?

I think that too. @siko1056, what do you think?

From my understanding, the project aims to run and fill Jupyter Notebooks within GNU Octave, not the other way around (like Pluto or MATLAB’s Live Code format).

I agree with the suggested Octave classdef approach. You are right @Abdallah_Elshamy, the idea is not to reinvent octave_kernel for Jupyter. Using plain Jupyter Notebook or JupyterLab with octave_kernel enables all the interactivity the Jupyter “framework” provides in the browser, except for one thing, explained below, which is what this GSoC project is about.

A “deep” integration of Jupyter into Octave, such as Matlab’s Live Editor or Julia’s Pluto, is totally out of reach for a single reduced-time GSoC project. Many developers at Jupyter, Julia, and MathWorks worked for years on their respective tools.

Jupyter Notebooks embody the idea of not only sharing code, but also documenting why particular code was chosen, including graphical output or widgets. This extends the idea of the “publish” function, which is already supported by Octave and existed in Matlab before 2006.

The particular drawback is that once I have documented a research project using Jupyter Notebooks, I want to “just run and fill” them without being permanently “connected” through a browser (see “keep notebook running after the browser tab closed”, Issue #1647, jupyter/notebook on GitHub).

There are ways to work with those notebooks non-interactively, e.g. https://towardsdatascience.com/keep-jupyter-notebook-running-even-after-browser-is-closed-9a1937b7c615

But I would find it much more convenient if Jupyter Notebooks could be run and filled like running a script or using publish. For making changes to the notebook itself, I still prefer using Jupyter.

Can we use something like GNU Screen or tmux?

I started working on those details. Here is the class that I have in mind:

  • notebook: This private attribute will be populated by a constructor that takes the input notebook, reads it, and decodes it (using jsondecode). If the input Jupyter notebook is valid, the struct will have 4 fields: metadata, nbformat, nbformat_minor, and cells. All of the manipulation will happen in the cells field.

  • runCell(): If the cell is a code cell, this method will run the code inside it and populate the outputs field of the corresponding struct in the cells array of the notebook struct. The value of the outputs field is an array of structs. Each of these structs has an output_type field, whose value depends on the result of executing the code inside the cell and which determines the other fields in the struct. There are only 4 output types:

  1. stream: This one only contains the text output from the standard streams (stdout or stderr). It should be easy to populate.

  2. display_data: This one is trickier as it contains rich display data. The data is keyed by mime-type. We will need to know which mime-types Octave can produce and handle them. Some mime-types (like images) also have metadata, and this needs to be handled as well.

  3. execute_result: I think that this type shouldn’t be considered, as its outputs are identical to display_data apart from an additional execution_count field, which shouldn’t matter as we are just filling the notebook. What do you think?

  4. error: If an error happens during execution, this type records its name, value, and traceback.

If the code cell contains more than one type of output (for example: stream and display_data) the code cell will be split into multiple code_cells.

If an error was detected in the middle of a code cell, the execution will be stopped and the following code won’t be executed.

  • runAllCells(): will run all code cells inside the cells array by looping over them and calling runCell.

  • generateNotebook(): this function will generate a notebook file (.ipynb file) from the notebook Structure by encoding it into JSON format and writing the output into a file.

  • generateOctaveScript(): will generate an Octave script that contains the content of the input notebook. This can be done by extracting the contents of the Markdown cells as comments and extracting the code from code cells, then writing them to a file.

  • AddCodeCell(): Appends a code cell to the cells array inside the notebook struct. This method takes the code and the metadata as input.

  • AddMarkdownCell(): Appends a Markdown cell to the cells array inside the notebook struct. This method takes the Markdown text and the metadata as input.
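As a rough illustration of how runCell() and runAllCells() could fill the outputs field, here is a small sketch. It is in Python purely for illustration and is not the proposed Octave classdef. The `execute` callback and `fake_execute` are hypothetical stand-ins for whatever runs Octave code and returns the captured text plus an optional (name, value) error pair. The sketch only covers the stream and error types, and for simplicity it keeps all outputs in one cell instead of splitting cells.

```python
def build_outputs(stdout_text, error):
    """Translate one execution result into a list of notebook-style outputs."""
    outputs = []
    if stdout_text:
        outputs.append({"output_type": "stream", "name": "stdout", "text": stdout_text})
    if error is not None:
        ename, evalue = error
        outputs.append({
            "output_type": "error",
            "ename": ename,
            "evalue": evalue,
            "traceback": [f"{ename}: {evalue}"],
        })
    return outputs

def run_cell(cell, execute):
    """Run one cell through `execute`; non-code cells are left untouched."""
    if cell.get("cell_type") != "code":
        return
    source = cell.get("source", "")
    if isinstance(source, list):
        source = "".join(source)
    stdout_text, error = execute(source)
    # Any previous outputs are replaced by the fresh ones.
    cell["outputs"] = build_outputs(stdout_text, error)

def run_all_cells(notebook, execute):
    """Loop over the cells array and call run_cell on each entry."""
    for cell in notebook.get("cells", []):
        run_cell(cell, execute)

def fake_execute(source):
    """Hypothetical executor: pretends that code containing error() fails."""
    if "error(" in source:
        return "", ("Octave:demo", "boom")
    return "ok\n", None

notebook = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "Intro"},
    {"cell_type": "code", "metadata": {}, "source": "disp('ok')", "outputs": []},
    {"cell_type": "code", "metadata": {}, "source": "error('boom')", "outputs": []},
]}
run_all_cells(notebook, fake_execute)
print([[o["output_type"] for o in c.get("outputs", [])] for c in notebook["cells"]])
```

Passing the executor in as a parameter keeps the notebook bookkeeping separate from the part that actually evaluates code, which is the part expected to be tricky.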

Will this be provided as a part of Octave’s core or as an external package?

Sounds good and is in accordance with nbformat (4.0.0 to 5.1.3) :white_check_mark:

This is indeed the crucial part of the project, deciding whether users find this Octave feature useful or useless. For display_data, the most likely mime-type (that I use) will be “image/png”, that is, Base64-encoded images. Having images embedded into the notebook documents is for me one of the most useful features of Jupyter Notebooks, so I would like to see this supported.
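For reference, a display_data entry carrying a PNG could be built roughly like this. The sketch is in Python only for illustration; the helper name `png_display_data` is made up, and the 8-byte PNG signature stands in for the bytes of a real figure file.

```python
import base64

def png_display_data(png_bytes):
    """Wrap raw PNG bytes in a display_data output keyed by mime-type."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "output_type": "display_data",
        "data": {"image/png": encoded},
        "metadata": {},
    }

# Placeholder bytes: just the PNG file signature, not a viewable image.
png_signature = b"\x89PNG\r\n\x1a\n"
output = png_display_data(png_signature)
print(output["data"]["image/png"])
```

In a real run, the bytes would come from the image file that the plotting step writes to disk.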

For a general structure of how to capture the (text, graphic) output of Octave code executions, you can take a look at the publish > eval_code function, which I created a few years ago.

The above all looks good to me. The saved Octave script format can look similar to a publish-able document, except for the comment markup, which should remain Markdown.
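One possible shape for that conversion, sketched in Python for illustration only: Markdown cells become comment lines with the Markdown text left untouched, and code cells are copied as they are. The `% ` comment prefix, the blank line between cells, and the function name `notebook_to_script` are assumptions of this sketch, not a settled format.

```python
def notebook_to_script(notebook):
    """Render a notebook dictionary as script text (raw cells are skipped)."""
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            # Keep the Markdown markup, only prefix each line as a comment.
            lines = ["% " + line if line else "%" for line in source.splitlines()]
            parts.append("\n".join(lines))
        elif cell.get("cell_type") == "code":
            parts.append(source.rstrip("\n"))
    return "\n\n".join(parts) + "\n"

notebook = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Title\n\nSome *text*"},
    {"cell_type": "code", "metadata": {}, "outputs": [], "source": ["x = 1;\n", "disp(x)"]},
]}
print(notebook_to_script(notebook))
```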

I would give these features a very low priority at the start. I think users would find it useful to be able to modify the notebook from Octave itself, but without a GUI like Jupyter, just by entering code into a console, I at least cannot imagine using it very frequently :thinking: The comfort and overview of WYSIWYG is hardly there. Let’s think about this at a later stage of the project.

Also how about modifying existing cells? :sweat:

In the end it should become part of core Octave. However, I have recently become a bigger fan of developing new features as packages. They are very lightweight, you can easily share them, and you keep an overview of your actions.

Recompiling the whole of Octave to develop a single function(ality) can be overkill. This was at least my impression of last year’s JSON GSoC project. A further aspect is that you are not bound to work on the sometimes unstable dev branch, you don’t have to rebase your patches (over years, if the maintainers don’t import your patch early enough), and you can use your code even with older Octave versions.

For the same reasons, for the JSON package, I just went the other way round and extracted your core contributions to a package :sweat_smile:

This is very helpful. I think that the classdef should have a similar private function to modularize the code and shorten the runCell function.

Now that I think about modifying cells, I agree that it is not very practical. The project should focus on running and filling notebooks only.

I think that starting as a package and then, at the end of GSoC, integrating this package into core Octave (while keeping the package) is the way to go for the reasons you mentioned. Also, extracting code from the package and integrating it into core Octave is easier than the other way around.

After those modifications, the structure of the classdef will be:

I am working on the proposal right now and will post it for review as soon as I finish it. Regarding the milestones, I suggest these two:

  1. Finish the structure of the package, the constructor, the runAllCells method, the generateNotebook method, the generateOctaveScript method, and the evalCode private method.
  2. Finish the runCell method.

I chose this order as my final exams will be before the first milestone, and having the tricky part after I finish the exams will allow me to focus more on it. What do you think?

I finished the first draft; here is the link. I also shared it through the GSoC website.

If you have any feedback, please don’t hesitate to tell me.

Hello, this is Suhas Patlolla. I have been working with Octave for the last few months to run machine learning algorithms. I saw the suggested projects on the website, and I would really like to work on the Jupyter Notebook Integration project. I have some ideas for implementing it, and my approach goes like this:

–> Write code in Octave
–> Convert it to a Jupyter executable file
–> Upload the file to a Jupyter server using an API instead of opening it in a web browser
–> Finally, get the output in the Octave CLI
This would also let Octave users use it easily instead of downloading all packages.
That is my approach, sir. I am a second-year student and do not know much about all of this yet.
What do you think of this approach? If you have any comments or concerns, please let me know.

Thank you for your proposal. I do not fully understand the quoted block.

  • What is a “jupyter executable file”, and how is it different from a “*.ipynb” file?
  • Which “jupyter server” (URL)?
  • Why should the notebook be uploaded to a server without filling the output cells beforehand?
  • What do you mean by “downloading all packages”, and how should Octave users “easily” use whatever you are going to produce?

My suspicion is that you want to bundle your Jupyter Notebook with a full Octave distribution plus the necessary packages (like https://mybinder.org/)? This might result in huge files (Docker images of more than 1 GB).

I updated the document with how we are going to handle existing outputs:

If the code cell has an existing “outputs” field, the contents of this field will be replaced with the new outputs returned from the executed code. As the existing results may be outdated, it is better to run the code and repopulate the “outputs” field than to depend on them being correct.

If you have a better way to handle this or any other suggestions, please let me know.

I think stripping any previous outputs, before adding new ones is a good approach and how Jupyter works :+1:
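A minimal sketch of that stripping step, in Python for illustration only. The function name `strip_outputs` is made up, and resetting `execution_count` along with `outputs` is an assumption of this sketch.

```python
def strip_outputs(notebook):
    """Clear stale results from every code cell before the notebook is re-run."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook

notebook = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "Notes"},
    {"cell_type": "code", "metadata": {}, "execution_count": 3, "source": "disp(1)",
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}]},
]}
strip_outputs(notebook)
print(notebook["cells"][1]["outputs"])
```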

Great. That was the last of the notes you left in the proposal doc (thanks for taking the time to review it).

Do you have any other notes before I submit the proposal?

One minor remark, the rest looks good to me :+1: Looking forward to your application :slightly_smiling_face:

I came across nbterm today:

In the “Embedding nbterm” section, it uses methods cut_cell, paste_cell, run_all, save: perhaps the same naming could be used for this project?

I don’t mind this at all. Following the naming used by a similar project would be nice.