Skip to content

Conversation

davidkopp
Copy link
Collaborator

@davidkopp davidkopp commented Aug 19, 2025

Resolves issue https://github.com/green-coding-solutions/dependency-resolver/issues/5.

Tasks / Open Questions

Before merging the following tasks / questions should be resolved:

Installation

The used dependency-resolver library is not published yet (here on GitHub and on PyPI)!
I installed it locally using

pip install <path-to-dependency-resolver>

Visualization (Outdated)

EDIT: The information in this section are outdated. See comments below for screenshots of the updated implementation.

With the current implementation, only the container images are collected and displayed in a table on the tab "Usage Scenario" in the frontend:

image

No real dependencies resolving happens at the moment.

Other example:

image

Logs from runner.py:

Waiting for Metric Providers to boot ... 
Stderr check on CpuUtilizationCgroupContainerProvider
Stderr check on MemoryUsedCgroupContainerProvider
 
Collecting dependency information 
Successfully collected dependency information for 3 containers:
  - kadai-postgres: registrygitlabcomenviteconsultingsustainablesoftwarearchitecturekadaikadaidatabasespostgres147100ktasks_gmt_run_tmp:latest (sha256:fe6127e3be15a74cf3b3097c289386e65afc46d41424d2010b1a9580a302c912)
  - kadai-example-app: registrygitlabcomenviteconsultingsustainablesoftwarearchitecturekadaikadaiexamplespringbootkadai1010_gmt_run_tmp:latest (sha256:3b5e7b53918be05c3d17b36cb92f5eb088b1c3732d66d647930a32899133cc2a)
  - k6: k6:1.1.0 (sha256:529bf12eafde3435d413a0a094943abcb2ebffe1aa90e107b5e6093443ff7218)
 
Starting phase [IDLE].

@ArneTR
Copy link
Member

ArneTR commented Aug 19, 2025

The PR integration is at the right places and I also like the integration of the debugger steps.

I am a bit skeptical about the ThreadPool which makes only sense when we have relevant I/O and not compute. Is that the case ? (Not having read the performance profiling yet, so excuse me if that is already answered somewhere).

Also the exception handling is a bit overheady. You except a lot and then later on the processing still continues even if errors are encountered. Is that beneficial? Would it not be better to fail on first exception? What is the rationale here for me to understand the implementation better.

What I need

To really evalute this feature it would be needed that we also test an image where NPM packages (for instance) are installed in a non-pulled (but built ) docker image.
can you provide such a sample usage scenario file with needed Docker files?
Also a question here: is the functionality lacking for this currently or just a sample to test?

@ArneTR
Copy link
Member

ArneTR commented Aug 19, 2025

@gemini-cli /review

@ArneTR ArneTR marked this pull request as ready for review August 19, 2025 08:41
@ArneTR
Copy link
Member

ArneTR commented Aug 19, 2025

@gemini-cli /review

@davidkopp
Copy link
Collaborator Author

davidkopp commented Aug 19, 2025

I am a bit skeptical about the ThreadPool which makes only sense when we have relevant I/O and not compute. Is that the case ? (Not having read the performance profiling yet, so excuse me if that is already answered somewhere).

The performance analysis does not include a measurement of parallel execution yet. I should include that.
However, one of the conclusions of the performance analysis so far was, that the calls from Python to the Docker API for the execution of a command inside the container are a significant bottleneck. Therefore reducing the amount of calls improves the performance a lot. I assume that also the parallel execution should help (I will add a measurement for that later).

EDIT: Performance measurement was done. See results here.

Regarding the exception handling, I gave Claude the following requirements for the implementation:

  • If dependency resolver fails for any container, GMT should continue execution
  • Only store results if ALL containers succeed - partial results should not be stored
  • Print overall summary warning message during execution
  • Display notification in frontend

So yes, it is intentional that the exceptions are caught in the first place.
What should happen in your opinion, if it fails on first exception?

btw: here a screenshot of an error notification in the frontend:
image

To really evalute this feature it would be needed that we also test an image where NPM packages (for instance) are installed in a non-pulled (but built ) docker image.
can you provide such a sample usage scenario file with needed Docker files?
Also a question here: is the functionality lacking for this currently or just a sample to test?

The current implementation only retrieves the container image from Docker, there is no real dependency resolving yet. I think we will need to talk about this requirement in the next meeting.

davidkopp and others added 16 commits August 27, 2025 17:43
Implements parallel dependency collection after BOOT phase with asyncio execution,
10-second timeout per container, and all-or-nothing storage policy. Includes
comprehensive testing with 100% test coverage.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add NOT NULL DEFAULT '{}' constraint to usage_scenario_dependencies columns in structure.sql
- Update migration to handle existing NULL values before applying constraints
- Initialize __usage_scenario_dependencies as empty dict instead of None
- Remove conditional NULL checks when inserting dependency data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add missing --dev-no-metrics check in _process_phase_stats() to prevent
"Metrics was empty and no phase_stats could be created" error when using
--dev-no-metrics flag. Phase stats cannot be calculated without metrics data.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Updated hardcoded path from incorrect 'gcs' to 'green-coding-solutions'
- Fixed path expansion to use ~/ instead of $HOME for proper resolution
- Enhanced dependency collection output to show container name, image, and full hash

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace subprocess calls with direct use of resolve_docker_dependencies_as_dict()
- Improve parallel execution by switching from asyncio to ThreadPoolExecutor
- Remove async/await complexity since dependency resolver function is synchronous
- Fix working_dir parameter usage to prevent path validation errors
- Update tests to match new synchronous implementation
- Remove unused imports and fix linting issues

This change provides better performance, reliability, and maintainability
by eliminating subprocess overhead and timeout issues.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Modify dependency collection to raise RuntimeError when any container fails instead of continuing
- Remove frontend error handling for dependency failures since runs will fail completely
- Update tests to expect RuntimeError on partial dependency resolution failure
- Remove try-catch wrapper that was swallowing dependency resolution failures

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds test_integration_dependency_collection_with_real_containers() which:
- Uses RunUntilManager to run actual container workflow
- Tests real dependency_resolver package without mocking
- Validates GMT's image name transformations
- Verifies dependency data structure and content with actual Docker containers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add container type detection based on 'build' key in usage scenario services
- Built containers get full dependency resolution (dpkg, pip, etc.)
- Pre-built containers get container-info-only resolution for performance
- Update data structure to nested format with _container-info section
- Maintain backward compatibility with existing frontend
- Update frontend to access nested _container-info structure
- Add comprehensive test coverage for both resolution types
- Improve error handling with RuntimeError for invalid data structures
- All tests passing with 10.00/10 pylint score

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Simplify dependency collection output to print full JSON structure
- Add timing measurement and logging for full dependency resolution of built containers
- Remove unrealistic test case for missing container info
- Update integration test to verify database storage of dependencies
- Simplify dependency resolver return logic by removing redundant extraction

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@davidkopp davidkopp force-pushed the add-container-dependency-resolving branch from 204cf81 to 9abe886 Compare August 27, 2025 15:44
@davidkopp
Copy link
Collaborator Author

davidkopp commented Aug 27, 2025

Overview of the changes done since my last comment:

  • simplified exception handling -> fail hard on first error
  • Diff works now (see screenshot below)
  • do full dependency resolution if the container image is build (not pulled)
    • time is measured and printed in the logs
    • integration test added with a Django "Hello World" web application (see tests/data/web-application/)
  • the resulting JSON is printed in the logs

For testing I changed the version of Django in the requirements.txt file of the Django "Hello World" web application. Here the resulting diff:

image

Logs:

image

I'm not happy with the current output format of the dependency_resolver. I created a separate issue for that: https://github.com/green-coding-solutions/dependency-resolver/issues/9

btw: Tests in the pipeline are failing because it can't find the module 'dependency_resolver' (it is not published yet)

- Replace single table with container > scope > dependencies hierarchy
- Add collapsible accordions for project and system dependencies
- Display full container hash, truncated scope hash with tooltip
- Show scope-specific metadata (location for project dependencies)
- Improve visual hierarchy with proper indentation
- Group package managers by scope for better organization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@davidkopp
Copy link
Collaborator Author

Improved the visualization of the dependencies in the "Usage Scenario" tab:

image

The packages are collapsed by default.
Project and system packages are presented differently:

image image

@ArneTR
Copy link
Member

ArneTR commented Sep 3, 2025

FYI: Due for review tomorrow

@ArneTR
Copy link
Member

ArneTR commented Sep 4, 2025

sadly postponed to next week :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants