-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCCSD operator pool correction and integration in adapt simulator #65
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
libs/solvers/lib/operators/operator_pools/uccsd_operator_pool.cpp
Outdated
Show resolved
Hide resolved
Let's add a test to |
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
I think I'm missing something ... why we are updating the generic UCCSD operator pools from real to imaginary to accommodate a single assumption in adapt_simulator.cpp? Are any docs updates needed for this change? |
The uccsd operator pool is correct. However, it might not be a good idea to multiply the imaginary number "i" by the uccsd operator pool. It will not make it generic. We might use this uccsd operator pool with other algorithm such as GQE and we will need to modify this again. I would suggest multiply the imaginary number "i" somewhere within the adapt-vqe algorithm. In addition, I would suggest to run the adapt-vqe for these two geometry with sto3g basis.
We need to make sure that it work for something beyond the simple H2 molecule with 4 qubits and 3 operator pools. |
### Fix some issues to allow compilation of `uccsd` - Fix signed int overflow in `uccsd` - allow specifying shots in `vqe` Towards: NVIDIA/cuda-quantum#2357 --------- Signed-off-by: Anna Gringauze <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
Signed-off-by: kvmto <[email protected]>
const auto &c = pool[0].begin()->get_coefficient(); | ||
bool isImaginary = (c.real() == 0.0 && c.imag() != 0.0); | ||
|
||
if (!isImaginary) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this could be simplified to something like
auto coeff = isImaginary ? std::complex<double>({0., 1.}) : std::complex<double>{1.,0.};
for (auto& op : pool)
commutators.push_back(H * (coeff * op) - (coeff * op) * H);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I tried in the past, but it does not work.
This works : commutators.push_back(coeff * (H * op - op * H));
'YZXI', 'XZYI', 'IYZX', 'IXZY', 'XXXY', 'XXYX', 'XYYY', 'YXYY', 'XYXX', | ||
'YXXX', 'YYXY', 'YYYX' | ||
] | ||
expected_pool = ["XZYIYZXI", "IXZYIYZX", "XXXYXXYXXYXXXYYYYXXXYXYYYYXYYYYX"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The op.to_string(False) is concatenating all the terms together into one string. I think it would be better to ensure that the terms are correct instead of this concatenated string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I heard @marwafar also ask for this, so should this conversation be reopened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some local changes that didn't get pushed, now they should be here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any change. They are still the same ["XZYIYZXI", "IXZYIYZX", "XXXYXXYXXYXXXYYYYXXXYXYYYYXYYYYX"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I pushed the cpp and not the py, I will push it with the rest at this point. The test identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still trying to get this to run in my environment (probably something wrong in my environment), but here are some initial comments.
libs/solvers/include/cudaq/solvers/operators/operator_pools/qaoa_operator_pool.h
Outdated
Show resolved
Hide resolved
libs/solvers/include/cudaq/solvers/operators/operator_pools/qaoa_operator_pool.h
Outdated
Show resolved
Hide resolved
libs/solvers/include/cudaq/solvers/operators/operator_pools/spin_complement_gsd.h
Outdated
Show resolved
Hide resolved
libs/solvers/include/cudaq/solvers/operators/operator_pools/uccsd_operator_pool.h
Outdated
Show resolved
Hide resolved
std::vector<double> roundedGradients(gradients.size()); | ||
for (size_t i = 0; i < gradients.size(); ++i) { | ||
// Round to a specific precision (e.g., 10 decimal places) | ||
roundedGradients[i] = std::round(gradients[i] * 1e16) / 1e16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here? Also, the code seems to conflict with the comments (16 digits vs 10 digits). I'm hoping this is old debugging code and no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rounding is necessary to avoid some repeatability issues. The 10th digits was the initial rounding, but then decided to go up again, I will put it back to 10. Thanks for spotting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If repeatability issues exist, this will not prevent them. One can always be right on the edge of a rounding decision (i.e. 0.100000000500001 and 0.100000000499999 are 2e-15 apart but will still round to different numbers when rounding to 10 digits), so it would be better to leave this out.
I am ok with non-repeatabilities as long as they go away when we run with OMP_NUM_THREADS=1
.
'YZXI', 'XZYI', 'IYZX', 'IXZY', 'XXXY', 'XXYX', 'XYYY', 'YXYY', 'XYXX', | ||
'YXXX', 'YYXY', 'YYYX' | ||
] | ||
expected_pool = ["XZYIYZXI", "IXZYIYZX", "XXXYXXYXXYXXXYYYYXXXYXYYYYXYYYYX"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I heard @marwafar also ask for this, so should this conversation be reopened?
Yes. This concatenation should change. You should have it like [['XZY', 'YZX'], ['IXZY', 'IYZX'], ['XXYX', 'XXXY', 'XYYY', 'YYXY', 'XYXX', 'YXYY', 'YYYX', 'YXXX']] . You should make sure adapt_vqe() function returns final parameters, selected pool, and final energy. . For example: |
if (step >= 30) { | ||
printf("TIMED OUT\n"); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 30
should be a parameter to this function, something like maxIter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can put that as a configuration options, together with the tolerance on the energy difference between iterations. If it is okay I will do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is max iteration, we can add a default value but allow user to change this. Same for the tolerance of the energy and the tolerance for the norm of gradients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the initial theta parameter default is zero. But, we should also let user change that if they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also when simulation reached the maxIter, instead of only "printf("TIMED OUT\n")" You should also print or save in a file the last parameters, state , selected pools, and energy. User can later enhance the state with other classical simulation in classical post-processing Or we might also later allow user to restart simulation. Allow user to restart simulation will be a nice feature that we should add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also when simulation reached the maxIter, instead of only "printf("TIMED OUT\n")" You should also print or save in a file the last parameters, state , selected pools, and energy. User can later enhance the state with other classical simulation in classical post-processing Or we might also later allow user to restart simulation. Allow user to restart simulation will be a nice feature that we should add.
These sound like nice features to add, but probably not part of this PR. For this PR, I think returning/printing an error if it runs longer than a new maxIter
parameter should be sufficient.
@@ -12,19 +12,67 @@ | |||
#include "cudaq.h" | |||
#include "nvqpp/test_kernels.h" | |||
#include "cudaq/solvers/adapt.h" | |||
#include "cudaq/solvers/operators.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run the tests in this file w/ CUDAQ_LOG_LEVEL=info
and just watch the stuff scroll by, there are lots of things that look really empty, like this:
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1204] Setting current circuit name to void (cudaq::state&)
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1024] Allocating 12 new qubits.
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 0 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 1 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 2 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 3 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 4 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 5 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 6 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 7 deallocation
[2025-03-13 19:50:21.962] [info] [CircuitSimulator.h:1045] Deferring qubit 8 deallocation
[2025-03-13 19:50:21.963] [info] [CircuitSimulator.h:1045] Deferring qubit 9 deallocation
[2025-03-13 19:50:21.963] [info] [CircuitSimulator.h:1045] Deferring qubit 10 deallocation
[2025-03-13 19:50:21.963] [info] [CircuitSimulator.h:1045] Deferring qubit 11 deallocation
[2025-03-13 19:50:22.053] [info] [CircuitSimulator.h:1189] Deallocated all qubits, reseting state vector.
[2025-03-13 19:50:22.105] [info] [CircuitSimulator.h:1204] Setting current circuit name to void (cudaq::state&)
[2025-03-13 19:50:22.105] [info] [CircuitSimulator.h:1024] Allocating 12 new qubits.
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 0 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 1 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 2 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 3 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 4 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 5 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 6 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 7 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 8 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 9 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 10 deallocation
[2025-03-13 19:50:22.106] [info] [CircuitSimulator.h:1045] Deferring qubit 11 deallocation
[2025-03-13 19:50:22.200] [info] [CircuitSimulator.h:1189] Deallocated all qubits, reseting state vector.
...
Is it possible we're running a bunch of empty circuits somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log since it repeats seems like the usage of the adapt_kernel, since it used iteratively during the optimization steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like they are all coming from
observe_async(qpuCounter++, prepare_state, commutators[i], state)); |
prepare_state
kernel).
It looks like that code is assuming that all the state vectors from all the commutators will fit in GPU memory. That seems like a dubious assumption. (This was pre-existing, not new for this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The component running empty circuits is "prepare_state". It just allocates and deallocates the qubits.
__qpu__ void prepare_state(cudaq::state &state) { cudaq::qvector q{state}; }
My guess is that in order to always run useful circuits, it could could be useful to remove the get_states, and use only adapt_kernel with the various observe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to skip the two get_state and see if i can get a boost,
but they seem to be redundant at the moment :)
I need a moment to address everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmhowe23 any idea why it is not possible to do :
resultHandles.emplace_back(
observe_async(qpuCounter++, adapt_kernel, commutators[i], numQubits, initialState, thetas, coefficients, pauliWords, poolIndices));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this a bit more. It looks like it is indeed applying the commutators[i]
spin op at the end of each circuit (via https://github.com/NVIDIA/cuda-quantum/blob/50066dd99025c4eae2af150c327664328b8249cc/runtime/nvqir/custatevec/CuStateVecCircuitSimulator.cpp#L548) ... there are no logs in that function, so reading the logs may have led us down a wild goose chase here.
That being said - we may improve the efficiency by removing the terms that have 0-valued coefficients inside the commutators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just done.:)
I am trying to avoid the double calculation of the state, do you happen to have an answer to the observe_async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmhowe23 any idea why it is not possible to do :
resultHandles.emplace_back( observe_async(qpuCounter++, adapt_kernel, commutators[i], numQubits, initialState, thetas, coefficients, pauliWords, poolIndices));
Not off the top of my head ... what happens when you try that?
…oa_operator_pool.h Co-authored-by: Ben Howe <[email protected]> Signed-off-by: Kevin Mato <[email protected]>
…oa_operator_pool.h Co-authored-by: Ben Howe <[email protected]> Signed-off-by: Kevin Mato <[email protected]>
…csd_operator_pool.h Co-authored-by: Ben Howe <[email protected]> Signed-off-by: Kevin Mato <[email protected]>
@@ -15,13 +16,20 @@ adapt_kernel(std::size_t numQubits, | |||
const cudaq::qkernel<void(cudaq::qvector<> &)> &statePrep, | |||
const std::vector<double> &thetas, | |||
const std::vector<double> &coefficients, | |||
const std::vector<cudaq::pauli_word> &trotterOpList) { | |||
const std::vector<cudaq::pauli_word> &trotterOpList, | |||
const std::vector<std::size_t> poolIndices) { | |||
cudaq::qvector q(numQubits); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this line the reason why we see the allocation and deallocation of qubits in the log? @bmhowe23
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would explain the allocation and deallocation, but I'm confused why I don't see any gates occurring. Could we be calling this kernel with an empty trotterOpList
?
…in_complement_gsd.h Co-authored-by: Ben Howe <[email protected]> Signed-off-by: Kevin Mato <[email protected]>
auto coeff = (!isImaginary) ? std::complex<double>{0.0, 1.0} | ||
: std::complex<double>{1.0, 0.0}; | ||
for (auto &op : pool) | ||
commutators.push_back(coeff * (H * op - op * H)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amccaskey @kvmto @marwafar - there are a significant number of terms in these spin ops that have coefficients = 0. Is that expected? I could be mistaken, but I suspect that means some wasted computation down below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the spin op in the commutator? Yes some some spin op in the commutator operator will be zero. Because of symmetry, some excitation are not allowed. Those will be zero. We should write a function after computing the commutator to remove terms with zero coefficient.
const auto &c = pool[0].begin()->get_coefficient(); | ||
bool isImaginary = | ||
(std::abs(c.real()) <= 1e-9) && (std::abs(c.imag()) > 1e-9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that c
will never be 0+0j? If it is, then this isImaginary
variable would potentially be incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coefficients of operator pool will not be zero. They are either one or less than one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coefficients of operator pool will not be zero. They are either one or less than one.
I believe c
is being set to the first term of the first spin op in the pool here. If any of the terms in the spin op can have 0-valued coefficients, then I believe this could be 0. I could be misunderstanding that, though ... looking for @kvmto to confirm that c
is never 0 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm it is never 0 :)
@@ -140,16 +171,22 @@ simulator::run(const cudaq::qkernel<void(cudaq::qvector<> &)> &initialState, | |||
} | |||
|
|||
// Convergence is reached if gradient values are small | |||
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol) | |||
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not use the same tol value for norm and lastNorm-norm. You should provide different variable names. Also, you should allow user to change those threshold value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, replace fabs(norm)) with just norm, norm is already the absolute value of the gradient vector
@@ -140,16 +171,22 @@ simulator::run(const cudaq::qkernel<void(cudaq::qvector<> &)> &initialState, | |||
} | |||
|
|||
// Convergence is reached if gradient values are small | |||
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol) | |||
if (std::sqrt(std::fabs(norm)) < tol || std::fabs(lastNorm - norm) < tol || | |||
ediff < 1e-6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1e-6 should be variable name like thresholdE. Add a default value but allow user to change these value if they want
break; | ||
|
||
// Use the operator from the pool | ||
auto op = pool[maxOpIdx]; | ||
if (!isImaginary) | ||
op = std::complex<double>{0.0, 1.0} * pool[maxOpIdx]; | ||
|
||
chosenOps.push_back(op); | ||
thetas.push_back(0.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default value should be zero. But allow user to change this too. Instead of "thetas.push_back(0.0);", you can do "thetas.push_back(initTheta)" and initTheta=0.0 default
@@ -101,15 +124,23 @@ simulator::run(const cudaq::qkernel<void(cudaq::qvector<> &)> &initialState, | |||
double norm = 0.0; | |||
for (auto &g : gradients) | |||
norm += g * g; | |||
norm = std::sqrt(norm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is not clean. You are returning in line 121 the absolute value of the gradient value, then compute norm \sqrt(\sum_i (grad_i)**2).
I think you should do that and it will not change any result. return the gradient value and not the absolute value. Then compute norm as abs (\sqrt(\sum_i (grad_i)**2)). This is how we compute norm of a vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need the absolute values in order to pick the max_element, somewhere I have to do the std::abs. If the order of operations does not bother I would keep it as is.
coefficients, pauliWords); | ||
coefficients, pauliWords, poolIndices); | ||
|
||
if (groundEnergy < latestEnergy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They will be always less. This if statement is redundant
UCCSD operator pool correction and integration in adapt simulator (Issue #28)
This PR fixes #28 .
Problem
Adapt VQE did not provide correct results. The problem could be the creation of the UCCSD operators pool.
Changes
Changes so far have been applied to:
Additional notes
Further testing is needed for making sure that adapt VQE works correctly, especially in terms of molecules.