
UCCSD operator pool correction and integration in adapt simulator #65

Open · wants to merge 31 commits into base: main
Conversation

kvmto
Collaborator

@kvmto kvmto commented Jan 30, 2025

UCCSD operator pool correction and integration in adapt simulator (Issue #28)

This PR fixes #28.

Problem

ADAPT-VQE did not produce correct results. The likely cause was the construction of the UCCSD operator pool.

Changes

Changes so far have been applied to:

  • The UCCSD operator pool 'generate' function.
  • The tests of the UCCSD operator pool (cpp and py files).
  • adapt_vqe, which was tested for final results.

Additional notes

Further testing is needed to make sure that ADAPT-VQE works correctly, especially for larger molecules.

@kvmto kvmto requested a review from amccaskey January 30, 2025 13:40
@amccaskey (Collaborator)

Let's add a test to test_adapt.cpp that mimics the checkSimpleAdapt but uses the uccsd operator instead of the spin_complement_gsd one. Note that you'll need to update the coefficients on the operators to make them imaginary, since we assume that in adapt_simulator.cpp (line 152).

@kvmto kvmto requested a review from amccaskey January 31, 2025 13:52
@bmhowe23 (Collaborator)

I would recommend adding the phrase "fixes #28" or "resolves #28" to the description in order to automatically link this GitHub PR with the issue. If you do this, then when the PR is merged, the issue will automatically be closed.

Ref: https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

@bmhowe23 (Collaborator)

Let's add a test to test_adapt.cpp that mimics the checkSimpleAdapt but uses the uccsd operator instead of the spin_complement_gsd one. Note that you'll need to update the coefficients on the operators to make them imaginary, since we assume that in adapt_simulator.cpp (line 152).

I think I'm missing something ... why are we updating the generic UCCSD operator pools from real to imaginary to accommodate a single assumption in adapt_simulator.cpp? Are any docs updates needed for this change?

@bmhowe23 bmhowe23 requested a review from marwafar January 31, 2025 23:12
@marwafar (Collaborator)

The uccsd operator pool is correct. However, it might not be a good idea to multiply the imaginary number "i" into the uccsd operator pool: it would make it less generic. We might use this uccsd operator pool with other algorithms such as GQE, and we would then need to modify it again. I would suggest multiplying by the imaginary number "i" somewhere within the adapt-vqe algorithm instead.

In addition, I would suggest running adapt-vqe for these two geometries with the sto-3g basis.

H 0.0 0.0 0.0
H 0.0 0.0 0.7474

Li  0.0  0.0  0.0
H   0.0  0.0  1.5

We need to make sure that it works for something beyond the simple H2 molecule with 4 qubits and 3 pool operators.

annagrin and others added 4 commits February 3, 2025 20:07
### Fix some issues to allow compilation of `uccsd`

- Fix signed int overflow in `uccsd`
- Allow specifying shots in `vqe`

Towards: NVIDIA/cuda-quantum#2357

---------

Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: kvmto <[email protected]>
@kvmto kvmto changed the title Fix issue 28 UCCSD operator pool correction and integration in Adapt Simulator Feb 3, 2025
@kvmto kvmto changed the title UCCSD operator pool correction and integration in Adapt Simulator UCCSD operator pool correction and integration in adapt simulator Feb 4, 2025
@kvmto kvmto requested a review from bmhowe23 February 21, 2025 23:07
const auto &c = pool[0].begin()->get_coefficient();
bool isImaginary = (c.real() == 0.0 && c.imag() != 0.0);

if (!isImaginary) {
Collaborator

I wonder if this could be simplified to something like

auto coeff = isImaginary ? std::complex<double>({0., 1.}) : std::complex<double>{1.,0.}; 
for (auto& op : pool) 
  commutators.push_back(H * (coeff * op) - (coeff * op) * H); 

Collaborator Author

This is what I tried in the past, but it does not work.
This works: commutators.push_back(coeff * (H * op - op * H));

'YZXI', 'XZYI', 'IYZX', 'IXZY', 'XXXY', 'XXYX', 'XYYY', 'YXYY', 'XYXX',
'YXXX', 'YYXY', 'YYYX'
]
expected_pool = ["XZYIYZXI", "IXZYIYZX", "XXXYXXYXXYXXXYYYYXXXYXYYYYXYYYYX"]
Collaborator

The op.to_string(False) call concatenates all the terms together into one string. I think it would be better to check that the individual terms are correct instead of comparing against this concatenated string.

Collaborator

I thought I heard @marwafar also ask for this, so should this conversation be reopened?

Collaborator Author

I had some local changes that didn't get pushed, now they should be here as well.

Collaborator

I do not see any change. They are still the same ["XZYIYZXI", "IXZYIYZX", "XXXYXXYXXYXXXYYYYXXXYXYYYYXYYYYX"]

Collaborator Author

Sorry, I pushed the cpp and not the py; I will push it with the rest at this point. The test is identical.

@bmhowe23 (Collaborator) left a comment

I'm still trying to get this to run in my environment (probably something wrong in my environment), but here are some initial comments.

std::vector<double> roundedGradients(gradients.size());
for (size_t i = 0; i < gradients.size(); ++i) {
// Round to a specific precision (e.g., 10 decimal places)
roundedGradients[i] = std::round(gradients[i] * 1e16) / 1e16;
Collaborator

Why is this here? Also, the code seems to conflict with the comments (16 digits vs 10 digits). I'm hoping this is old debugging code and no longer needed.

Collaborator Author

The rounding is necessary to avoid some repeatability issues. Rounding to the 10th digit was the initial choice, but I later increased the precision; I will put it back to 10. Thanks for spotting this.

Collaborator

If repeatability issues exist, this will not prevent them. One can always be right on the edge of a rounding decision (i.e. 0.100000000500001 and 0.100000000499999 are 2e-15 apart but will still round to different numbers when rounding to 10 digits), so it would be better to leave this out.

I am ok with non-repeatabilities as long as they go away when we run with OMP_NUM_THREADS=1.

'YZXI', 'XZYI', 'IYZX', 'IXZY', 'XXXY', 'XXYX', 'XYYY', 'YXYY', 'XYXX',
'YXXX', 'YYXY', 'YYYX'
]
expected_pool = ["XZYIYZXI", "IXZYIYZX", "XXXYXXYXXYXXXYYYYXXXYXYYYYXYYYYX"]
Collaborator

I thought I heard @marwafar also ask for this, so should this conversation be reopened?

@marwafar (Collaborator)

Yes, this concatenation should change. You should have it like [['XZY', 'YZX'], ['IXZY', 'IYZX'], ['XXYX', 'XXXY', 'XYYY', 'YYXY', 'XYXX', 'YXYY', 'YYYX', 'YXXX']]. You should also make sure the adapt_vqe() function returns the final parameters, the selected pool, and the final energy. For example:
Final parameters: [0.11278279920936399]
Selected pools: [['XXYX', 'XXXY', 'XYYY', 'YYXY', 'XYXX', 'YXYY', 'YYYX', 'YXXX']]
Final energy: -1.1372838344885003

Comment on lines 94 to 96
if (step >= 30) {
printf("TIMED OUT\n");
break;
Collaborator

This 30 should be a parameter to this function, something like maxIter.

Collaborator Author

I think we can make that a configuration option, together with the tolerance on the energy difference between iterations. If that is okay, I will do that.

Collaborator

Yes, that sounds good to me.

Collaborator

Yes, this is the max iteration count; we can add a default value but allow the user to change it. The same goes for the energy tolerance and the tolerance on the norm of the gradients.

Collaborator

Also, the initial theta parameter defaults to zero, but we should let the user change that as well if they want.

Collaborator

Also, when the simulation reaches maxIter, instead of only printf("TIMED OUT\n"), you should print or save to a file the last parameters, state, selected pools, and energy. The user can later refine the state with classical post-processing, or we might later allow the user to restart the simulation. Allowing restarts would be a nice feature to add.

Collaborator

Also, when the simulation reaches maxIter, instead of only printf("TIMED OUT\n"), you should print or save to a file the last parameters, state, selected pools, and energy. The user can later refine the state with classical post-processing, or we might later allow the user to restart the simulation. Allowing restarts would be a nice feature to add.

These sound like nice features to add, but probably not part of this PR. For this PR, I think returning/printing an error if it runs longer than a new maxIter parameter should be sufficient.

@@ -12,19 +12,67 @@
#include "cudaq.h"
#include "nvqpp/test_kernels.h"
#include "cudaq/solvers/adapt.h"
#include "cudaq/solvers/operators.h"
Collaborator

When I run the tests in this file w/ CUDAQ_LOG_LEVEL=info and just watch the stuff scroll by, there are lots of things that look really empty, like this:

[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1204] Setting current circuit name to void (cudaq::state&)
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1024] Allocating 12 new qubits.
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 0 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 1 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 2 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 3 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 4 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 5 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 6 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 7 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 8 deallocation
[2025-03-13 19:50:21.963] [info] [CircuitSimulator.h:1045] Deferring qubit 9 deallocation
[2025-03-13 19:50:21.963] [info] [CircuitSimulator.h:1045] Deferring qubit 10 deallocation
[2025-03-13 19:50:21.963] [info] [CircuitSimulator.h:1045] Deferring qubit 11 deallocation
[2025-03-13 19:50:22.053] [info] [CircuitSimulator.h:1189] Deallocated all qubits, reseting state vector.
[2025-03-13 19:50:22.105] [info] [CircuitSimulator.h:1204] Setting current circuit name to void (cudaq::state&)
[2025-03-13 19:50:22.105] [info] [CircuitSimulator.h:1024] Allocating 12 new qubits.
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 0 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 1 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 2 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 3 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 4 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 5 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 6 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 7 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 8 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 9 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 10 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 11 deallocation
[2025-03-13 19:50:22.200] [info] [CircuitSimulator.h:1189] Deallocated all qubits, reseting state vector.
...

Is it possible we're running a bunch of empty circuits somehow?

Collaborator Author

Since the log repeats, it seems to correspond to the adapt_kernel, which is used iteratively during the optimization steps.

Collaborator

This is weird.

Collaborator

It looks like they are all coming from

observe_async(qpuCounter++, prepare_state, commutators[i], state));
(the prepare_state kernel).

It looks like that code is assuming that all the state vectors from all the commutators will fit in GPU memory. That seems like a dubious assumption. (This was pre-existing, not new for this PR.)

@kvmto (Collaborator Author) Mar 14, 2025

The component running empty circuits is prepare_state. It just allocates and deallocates the qubits.

__qpu__ void prepare_state(cudaq::state &state) { cudaq::qvector q{state}; }

My guess is that, in order to always run useful circuits, it could be useful to remove the get_states and use only adapt_kernel with the various observe calls.

Collaborator Author

I will try to skip the two get_state calls and see if I can get a boost;
they seem to be redundant at the moment :)
I need a moment to address everything.

@kvmto (Collaborator Author) Mar 14, 2025

@bmhowe23 any idea why it is not possible to do:

      resultHandles.emplace_back(
          observe_async(qpuCounter++, adapt_kernel, commutators[i], numQubits, initialState, thetas, coefficients, pauliWords, poolIndices));

Collaborator

I looked into this a bit more. It looks like it is indeed applying the commutators[i] spin op at the end of each circuit (via https://github.com/NVIDIA/cuda-quantum/blob/50066dd99025c4eae2af150c327664328b8249cc/runtime/nvqir/custatevec/CuStateVecCircuitSimulator.cpp#L548) ... there are no logs in that function, so reading the logs may have led us down a wild goose chase here.

That being said - we may improve the efficiency by removing the terms that have 0-valued coefficients inside the commutators.

Collaborator Author

Just done. :)
I am trying to avoid the double calculation of the state; do you happen to have an answer to the observe_async question?

Collaborator

@bmhowe23 any idea why it is not possible to do:

      resultHandles.emplace_back(
          observe_async(qpuCounter++, adapt_kernel, commutators[i], numQubits, initialState, thetas, coefficients, pauliWords, poolIndices));

Not off the top of my head ... what happens when you try that?

kvmto and others added 3 commits March 13, 2025 23:02
…oa_operator_pool.h

Co-authored-by: Ben Howe <[email protected]>
Signed-off-by: Kevin Mato <[email protected]>
…oa_operator_pool.h

Co-authored-by: Ben Howe <[email protected]>
Signed-off-by: Kevin Mato <[email protected]>
…csd_operator_pool.h

Co-authored-by: Ben Howe <[email protected]>
Signed-off-by: Kevin Mato <[email protected]>
@@ -15,13 +16,20 @@ adapt_kernel(std::size_t numQubits,
const cudaq::qkernel<void(cudaq::qvector<> &)> &statePrep,
const std::vector<double> &thetas,
const std::vector<double> &coefficients,
const std::vector<cudaq::pauli_word> &trotterOpList) {
const std::vector<cudaq::pauli_word> &trotterOpList,
const std::vector<std::size_t> poolIndices) {
cudaq::qvector q(numQubits);
Collaborator Author

Could this line be the reason why we see the allocation and deallocation of qubits in the log? @bmhowe23

Collaborator

Yes, that would explain the allocation and deallocation, but I'm confused why I don't see any gates occurring. Could we be calling this kernel with an empty trotterOpList?

auto coeff = (!isImaginary) ? std::complex<double>{0.0, 1.0}
: std::complex<double>{1.0, 0.0};
for (auto &op : pool)
commutators.push_back(coeff * (H * op - op * H));
@bmhowe23 (Collaborator) Mar 14, 2025

@amccaskey @kvmto @marwafar - there are a significant number of terms in these spin ops that have coefficients = 0. Is that expected? I could be mistaken, but I suspect that means some wasted computation down below.

Collaborator

You mean the spin op in the commutator? Yes, some terms in the commutator spin ops will be zero: because of symmetry, some excitations are not allowed, and those terms vanish. We should write a function that runs after computing the commutator to remove terms with zero coefficients.

Comment on lines +68 to +70
const auto &c = pool[0].begin()->get_coefficient();
bool isImaginary =
(std::abs(c.real()) <= 1e-9) && (std::abs(c.imag()) > 1e-9);
@bmhowe23 (Collaborator) Mar 14, 2025

Are you sure that c will never be 0+0j? If it is, then this isImaginary variable would potentially be incorrect.

Collaborator

The coefficients of the operator pool will not be zero; they are either one or less than one.

Collaborator

The coefficients of the operator pool will not be zero; they are either one or less than one.

I believe c is being set to the first term of the first spin op in the pool here. If any of the terms in the spin op can have 0-valued coefficients, then I believe this could be 0. I could be misunderstanding that, though ... looking for @kvmto to confirm that c is never 0 here.

Collaborator Author

I confirm it is never 0 :)

@@ -140,16 +171,22 @@ simulator::run(const cudaq::qkernel<void(cudaq::qvector<> &)> &initialState,
}

// Convergence is reached if gradient values are small
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol)
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol ||
Collaborator

You should not use the same tol value for both norm and lastNorm - norm; give them different variable names. Also, you should allow the user to change those threshold values.

Collaborator

Also, replace std::fabs(norm) with just norm; norm is already the absolute value of the gradient vector.

@@ -140,16 +171,22 @@ simulator::run(const cudaq::qkernel<void(cudaq::qvector<> &)> &initialState,
}

// Convergence is reached if gradient values are small
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol)
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol ||
ediff < 1e-6)
Collaborator

The 1e-6 should be a named variable like thresholdE. Add a default value, but allow the user to change these values if they want.

break;

// Use the operator from the pool
auto op = pool[maxOpIdx];
if (!isImaginary)
op = std::complex<double>{0.0, 1.0} * pool[maxOpIdx];

chosenOps.push_back(op);
thetas.push_back(0.0);
Collaborator

The default value should be zero, but allow the user to change this too. Instead of thetas.push_back(0.0);, you can do thetas.push_back(initTheta) with initTheta = 0.0 as the default.

@@ -101,15 +124,23 @@ simulator::run(const cudaq::qkernel<void(cudaq::qvector<> &)> &initialState,
double norm = 0.0;
for (auto &g : gradients)
norm += g * g;
norm = std::sqrt(norm);
Collaborator

This part is not clean. You are returning the absolute value of the gradient in line 121, and then computing norm = sqrt(sum_i (grad_i)**2). I think you should return the signed gradient values instead, and then compute the norm as abs(sqrt(sum_i (grad_i)**2)); it will not change any result, and this is how the norm of a vector is computed.

Collaborator Author

I need the absolute values in order to pick the max_element, so somewhere I have to apply std::abs. If the order of operations does not matter, I would keep it as is.

coefficients, pauliWords);
coefficients, pauliWords, poolIndices);

if (groundEnergy < latestEnergy) {
Collaborator

It will always be less, so this if statement is redundant.

Successfully merging this pull request may close these issues.

fix uccsd pool operator for the adapt-vqe
5 participants