Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-6101] Add first batch of the charm #3

Merged
merged 14 commits into from
Jan 13, 2025
Merged

Conversation

phvalguima
Copy link
Collaborator

@phvalguima phvalguima commented Dec 19, 2024

Adds the first batch of the charm. This PR should be considered once the py wrapper and README have been merged.

It includes:

  • the benchmark lib: for now, it is added as a folder within src: src/benchmark
  • the charm itself: is composed of any of the src/*.py files

Once we have merged all the changes for this charm, we can break the lib into a separate repo. That would mean essentially src/benchmark/* + templates/ would go away to a different repository, alongside its corresponding tests.

Copy link

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is quite well organized and structured. I have really appreciated in general that there are a number of re-usable concepts and code in the benchmark submodule

(issue) I only have a couple of general issues I'd like to raise sooner rather than later as they may require a bit of restructure and re-organization, before doing a more detailed review. In general I would nudge here to the general structure/pattern a bit more. Some points (in order of importance):

  1. We break the clean dependency structure core > managers > handlers in a number of places
  2. I'd like to understand better the Charm inheritance pattern and what are the driving reasons for it. It seems to me that we could just use the usual handlers but I may be missing some points/reasons
  3. I believe it would be also good to split core, managers and handlers in the custom code too. Right now we are doing this for the benchmark module, but I'd like to do this also for the Kafka specifics bits

Copy link

@marcoppenheimer marcoppenheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done a first pass.
I think it's probably too big a beast to be more detailed right now. I think what we can do is focus on addressing the current comments now, and 'merging' this as is.
Then, I'll set some time up for me to go through the code line by line, open a PR with thoughts in the diff and we can discuss the rest there, what do you think?

source: .
override-build: |
# Ship the charm contents
curl -sSL https://code.launchpad.net/~pguimaraes/openmessaging-benchmark/+git/openmessaging-benchmark/+artifact/280249/+files/openmessaging-benchmark-0.0.1-ubuntu0-20241119152607-linux-x64.tar.gz | tar -zxvf -

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: What's the plan for long-term housing of this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea is to keep it as a launchpad project, just like other of our build from source. @deusebio wdyt?

Comment on lines +2 to +6
test_name:
default: ""
type: string
description: |
Used to identify the test. MUST NOT be empty.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: If this needs to be enforced, have a look at what we do with StructuredConfig on the Kafka charm for config validation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further review of this file, you should definitely implement validators for lots of these config options imo.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoppenheimer can we break this into a next PR? There are other changes that will be needed, e.g. add a lib/ folder and its content.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we break this into a follow-up PR? Added as a TODO here: #7

Comment on lines +239 to +260
def get(self) -> DPBenchmarkBaseDatabaseModel | None:
"""Returns the value of the key."""
if not self.relation or not (endpoints := self.remote_data.get("endpoints")):
return None

unix_socket = None
if endpoints.startswith("file://"):
unix_socket = endpoints[7:]
try:
return DPBenchmarkBaseDatabaseModel(
hosts=endpoints.split(),
unix_socket=unix_socket,
username=self.data.get("username"),
password=self.data.get("password"),
db_name=self.remote_data.get(self.database_key),
tls=self.tls,
tls_ca=self.tls_ca,
)
except error_wrappers.ValidationError as e:
logger.warning(f"Failed to validate the database model: {e}")
entries = [entry.get("loc")[0] for entry in e.errors()]
raise DPBenchmarkMissingOptionsError(f"{entries}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I was initially confused about the get here, and it got me thinking.
Given that this is so coupled with DPBenchMarkBaseDatabaseModel, would it make more sense to remove the Pydantic model entirely, and just have those attributes part of this object here?

The reason I mentioned it:

db_state = DatabaseState(*args, **kwargs)

# strange access pattern
username = db_state.get().username

# seems easier
username = db_state.username

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still raise on access if necessary, or just return a Falsey/None.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, here is my take:

  1. You can still do: db_state = DatabaseState(*args, **kwargs), but you have to take into account the ValidationError, as expected
  2. The get() adds a new feature: it converts this exception into sth more "acceptable" for the reminder of the code
  3. The username = db_state.username would mean a bunch of properties and setters. I feel like some 20 lines of code will end up into some 60

I am a bit uncomfortable removing this logic from a pydantic model.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it for now. I think that the ValidationError should be raised on CharmBase.__init__ as part of the structured_config mentioned in a previous comment.
AKA - if the config is borked, the charm just fails. I don't think we should be validating these things during event execution, so we wouldn't need this strange access pattern.

As you mentioned, we can look at structured_config in another PR, but when we do, come back to here and see if it still makes sense?

) -> str | None:
"""Executes a command on the workload substrate.

Returns None if the command failed to be executed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: I think returning None if it didn't execute is wrong. Commands with no stdout would return "", which is also falsey, meaning that we couldn't pick up on failures if we do something like workload.exec(["ls", ">", "/dev/null", "2>&1"])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow... What matters here is the returncode as the output is actually sent to log files.

Copy link

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a look. My main concern with the unclear dependency tree is resolved, so I'm happy to approve. I provide still a couple of suggestions, but these are not super critical items, and they can also followed up in separate PRs, as you prefer. If these are things we agree to follow up (but outside of the scope of this PR), please place an improvement item in Jira and refer to the Jira ticket in the comment and/or in the code (such that it does not fall through the cracks).

parallel_processes=self.config.get("parallel_processes"),
threads=self.config.get("threads"),
duration=self.config.get("duration"),
run_count=self.config.get("run_count"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion I would rather use some more structured representation instead of dict. The parsing of these should not be done within a manager, but we should rather use dataclass/pydantic models. For configs, in BigData we use the StructuredConfig pattern.

It can also be addressed in a separate PR though.

max.partition.fetch.bytes=10485760
"""

KAFKA_WORKLOAD_PARAMS_TEMPLATE = """name: {{ partitionsPerTopic }} producer / {{ partitionsPerTopic }} consumers on 1 topic

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestions, minor maybe we could use proper template, like YAML static files in a resouce folder, such that these are even a bit more readable, rather than embedded in the code as strings.

Copy link

@marcoppenheimer marcoppenheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! As long as the outstanding comments are addressed in the next PR, I'm really happy with how this is looking 👍🏾

@phvalguima phvalguima merged commit 030235e into main Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants