Skip to content

Conversation

coatless
Copy link
Contributor

@coatless coatless commented Feb 25, 2024

So, this adds a devcontainer image that has a few choice C++ extensions enabled (but not configured) for VS Code.

You can check it out on my forked repo here:

Open in GitHub Codespaces

The video next shows creating the codespace from scratch and running cmake. It stops just before the build install step as I didn't see jenkins password. (For that reason, maybe we should opt for a different base image?) In total, the runtime is ~2.5 minutes.

launch-mlpack-codespace.mp4

Close #3632

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coatless this is a nice addition! On our side I think we can clean up the base Docker images and I can make changes there and push to Dockerhub as needed, but I have a handful of questions.

Note that I have never used Codespaces, hadn't heard of devcontainer.json before about two days ago, and otherwise have little idea what is going on. So if my comments are way off base or miss the point, please tell me so 😄

@@ -0,0 +1,9 @@
# Default image contents: https://hub.docker.com/r/mlpack/jenkins-amd64-debian
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the mlpack/mlpack image would be the better one to use?
https://hub.docker.com/r/mlpack/mlpack

I guess one question is, is the goal to have a container with mlpack already built and available, so that the user can develop applications that use mlpack? Or is the goal to have a container that has mlpack's dependencies, so that a user can develop mlpack itself and submit PRs to the project?

In either case, it's trivial to add a new Dockerfile to here that we can just use directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is for a container that could be used for immediate development onmlpack itself with a low storage cost. That is, we could say to the wave of GSOC students, here's how you can quickly jump in and work on a PR.

Using mlpack/mlpack is probably even better for hitting both use cases.

Though, if you're okay with adding in a new Docker image, we may want to create a custom docker image that is based on the default devcontainer image to lower storage (the base image is free for storage on GitHub and includes a lot of languages)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, piece of cake to add another image. Here's the existing Dockerfile for mlpack/mlpack; do you want to adapt that? Or I can take a shot at it if you like. Once we have something ready I can push it to Dockerhub.

I think it might be a good idea to not install mlpack in whatever container we use; I'm imagining a situation where the user modifies some core mlpack code (in wherever the mlpack source code is checked out), but now that code differs from what's in /usr/include/ (the installed version) and strange linking or behavior issues happen when trying to use mlpack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coatless let me know what you think on this point, I'm happy to help with a new Dockerfile. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcurtin I'll take care of it later today in a PR over in the mlpack/jenkins-conf repository.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's take the discussion on the Universal base over to: mlpack/jenkins-conf#30

Short gist: I need to lobotomize the container to build it.


// Switch to using root as a remote user
// details: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the preference here? I don't know about devcontainers, but should we be using the root user in the container? (Easy enough to adapt the relevant dockerfiles if so.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity and to avoid folks wondering where user mlpack or jenkins was created, we probably want root.

coatless and others added 2 commits February 26, 2024 10:36
Co-authored-by: Ryan Curtin <ryan@ratml.org>
Co-authored-by: Ryan Curtin <ryan@ratml.org>
Copy link

mlpack-bot bot commented Apr 6, 2024

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@coatless
Copy link
Contributor Author

The underlying docker image was updated and now has support for C++/Python/R Bindings to be built.

I left out Julia and Go as I'm not familiar with them.

The codespace built ontop of the new docker container attempting to build the R binary for mlpack

I suppose the next part is increasing the speed of the build process.

@rcurtin
Copy link
Member

rcurtin commented Jul 24, 2024

Looking good to me---a couple questions:

  • Do we want to rebase on the impending mlpack/codespace Docker image from Initial Dockerfile for codespaces... jenkins-conf#30 before or after merge? It seems like it would be a good idea to use that but I'm not picky about the ordering if you want to get this merged first.

  • How should we document this? A badge in the README or a line there or something? Maybe a note in the developer documentation like doc/developer/community.md or similar? I guess that Codespaces also have a cost associated with them, so we might want to mention that too?

If you're happy with this, I'm happy to merge it, it certainly seems like everything works correctly.

The build process is unfortunately always going to be pretty slow... maybe there are some improvements that could be made, but it would be difficult to improve mlpack compilation speed in general.

@coatless
Copy link
Contributor Author

@rcurtin

After.

With mlpack/jenkins-conf#30, the image must be uploaded onto the mlpack org's dockerhub account. From there, we can switch it from jamesbalamuta/mlpack-codespace:latest to mlpack/mlpack-codespace:latest. Though, you may want to reduce image size by offering binding specific spaces, e.g. mlpack-codespace-cpp, mlpack-codespace-python, et cetera... I don't have push privileges on the docker org so either you or @zoq will need to push the image.

With that in mind, we may want to remove the VARIANT approach as a single codespace image exists.

For documentation, how about:

  1. A quick section note on the README.md with the open in codespaces button and a link to a video?
  2. README.md for the container itself (we can trigger files to auto-open using openFiles)
  3. Adding a line into the community or getting started page.

For prose on the cost, maybe:

You can try out the Codespace by clicking on the following button:

Open in GitHub Codespaces

Note: Codespaces are available to Students and Teachers for free up to 180 core hours per month through GitHub Education. Otherwise, you will have up to 60 core hours and 15 GB free per month.

@rcurtin
Copy link
Member

rcurtin commented Sep 2, 2024

Everything you proposed looks great to me. 👍 Sorry it took so long to get back to this! It was a busy summer for me and I am glad it is over...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a devcontainer.json
3 participants