Skip to content

Commit b760c62

Browse files
committed
perf: reuse XPathContext objects
Issue #3266 discusses a potential optimization by re-using XPathContext objects (which are relatively expensive to initialize). This PR makes the following changes: - the `XPathContext` object ... - can accept `nil` as the value arg to `#register_ns` and `#register_variable` to _deregister_ the namespace or variable - tracks namespaces and variables registered through those two methods - has a new `#reset` method that deregisters all namespaces and variables registered - has a new `#node=` method to set the current node being searched - all the `Searchable` methods ... - use a thread-local XPathContext object that is only available in a block yielded by `#get_xpath_context` - and that context object is `#reset` when the block returns There is an escape hatch that I will leave undocumented, which is to set the env var `NOKOGIRI_DEOPTIMIZE_XPATH`, out of an abundance of caution. Here's a benchmark, where "small" is a 6kb file and "large" is a 70kb file: ``` $ NOKOGIRI_DEOPTIMIZE_XPATH=t ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: normal 3.790k i/100ms Calculating ------------------------------------- large: normal 37.556k (± 1.7%) i/s (26.63 μs/i) - 189.500k in 5.047390s ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: normal 11.726k i/100ms Calculating ------------------------------------- small: normal 113.719k (± 2.5%) i/s (8.79 μs/i) - 574.574k in 5.055648s $ ruby --yjit ./issues/3266-xpath-benchmark.rb ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- large: optimized 4.609k i/100ms Calculating ------------------------------------- large: optimized 48.107k (± 1.6%) i/s (20.79 μs/i) - 244.277k in 5.079041s Comparison: large: optimized: 48107.3 i/s large: normal: 37555.7 i/s - 1.28x slower ruby 3.3.6 (2024-11-05 revision 75015d4c1f) +YJIT [x86_64-linux] Warming up -------------------------------------- small: optimized 32.014k i/100ms Calculating ------------------------------------- small: optimized 319.760k (± 0.6%) i/s (3.13 μs/i) - 1.601M in 5.006140s Comparison: small: optimized: 319759.6 i/s small: normal: 113719.0 i/s - 2.81x slower ``` I originally implemented a much simpler approach, which cleared all registered variables and functions, however the standard XPath 1.0 functions were all also deregistered; and calling `xmlXPathRegisterAllFunctions` to re-register them took us back to the original performance profile. The win here is to avoid re-calling `xmlXPathRegisterAllFunctions`, and for that we track registered variables and namespaces, and de-register them after the query eval completes.
1 parent 3622493 commit b760c62

File tree

6 files changed

+188
-19
lines changed

6 files changed

+188
-19
lines changed

CHANGELOG.md

+5
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,11 @@ This release ships separate precompiled GNU and Musl gems for all linux platform
1919
This release drops precompiled native platform gems for `x86-linux` and `x86-mingw32`. **These platforms are still supported.** Users on these platforms must install the "ruby platform" gem which requires a compiler toolchain. See [Installing the `ruby` platform gem](https://nokogiri.org/tutorials/installing_nokogiri.html#installing-the-ruby-platform-gem) in the installation docs. (#3369, #3081)
2020

2121

22+
### Improved
23+
24+
* CSS and XPath queries are faster now that `Node#xpath`, `Node#css`, and related functions are re-using the underlying xpath context object (which is expensive to initialize). We benchmarked a 2.8x improvement for a 6kb file, and a more modest 1.3x improvement for a 70kb file. (#3378) @flavorjones
25+
26+
2227
## v1.17.2 / 2024-12-12
2328

2429
### Fixed

ext/java/nokogiri/XmlXpathContext.java

+1-1
Original file line numberDiff line numberDiff line change
@@ -179,7 +179,7 @@ public class XmlXpathContext extends RubyObject
179179
final NokogiriXPathFunctionResolver fnResolver = NokogiriXPathFunctionResolver.create(handler);
180180
try {
181181
return tryGetNodeSet(context, expr, fnResolver);
182-
} catch (TransformerException ex) {
182+
} catch (TransformerException | RuntimeException ex) {
183183
throw XmlSyntaxError.createXMLXPathSyntaxError(context.runtime,
184184
(expr + ": " + ex.toString()),
185185
ex).toThrowable();

ext/nokogiri/xml_xpath_context.c

+54-10
Original file line numberDiff line numberDiff line change
@@ -124,20 +124,33 @@ xpath_builtin_local_name_is(xmlXPathParserContextPtr ctxt, int nargs)
124124
* register_ns(prefix, uri) → Nokogiri::XML::XPathContext
125125
*
126126
* Register the namespace with +prefix+ and +uri+ for use in future queries.
127+
* Passing a uri of +nil+ will unregister the namespace.
127128
*
128129
* [Returns] +self+
129130
*/
130131
static VALUE
131132
rb_xml_xpath_context_register_ns(VALUE rb_context, VALUE prefix, VALUE uri)
132133
{
133134
xmlXPathContextPtr c_context;
135+
const xmlChar *ns_uri;
134136

135137
TypedData_Get_Struct(rb_context, xmlXPathContext, &xml_xpath_context_type, c_context);
136138

137-
xmlXPathRegisterNs(c_context,
138-
(const xmlChar *)StringValueCStr(prefix),
139-
(const xmlChar *)StringValueCStr(uri)
140-
);
139+
if (NIL_P(uri)) {
140+
ns_uri = NULL;
141+
} else {
142+
ns_uri = (const xmlChar *)StringValueCStr(uri);
143+
}
144+
145+
xmlXPathRegisterNs(c_context, (const xmlChar *)StringValueCStr(prefix), ns_uri);
146+
147+
VALUE registered_namespaces = rb_iv_get(rb_context, "@registered_namespaces");
148+
if (NIL_P(uri)) {
149+
rb_hash_delete(registered_namespaces, prefix);
150+
} else {
151+
rb_hash_aset(registered_namespaces, prefix, Qtrue);
152+
}
153+
141154
return rb_context;
142155
}
143156

@@ -146,6 +159,7 @@ rb_xml_xpath_context_register_ns(VALUE rb_context, VALUE prefix, VALUE uri)
146159
* register_variable(name, value) → Nokogiri::XML::XPathContext
147160
*
148161
* Register the variable +name+ with +value+ for use in future queries.
162+
* Passing a value of +nil+ will unregister the variable.
149163
*
150164
* [Returns] +self+
151165
*/
@@ -157,13 +171,20 @@ rb_xml_xpath_context_register_variable(VALUE rb_context, VALUE name, VALUE value
157171

158172
TypedData_Get_Struct(rb_context, xmlXPathContext, &xml_xpath_context_type, c_context);
159173

160-
xmlValue = xmlXPathNewCString(StringValueCStr(value));
174+
if (NIL_P(value)) {
175+
xmlValue = NULL;
176+
} else {
177+
xmlValue = xmlXPathNewCString(StringValueCStr(value));
178+
}
161179

162-
xmlXPathRegisterVariable(
163-
c_context,
164-
(const xmlChar *)StringValueCStr(name),
165-
xmlValue
166-
);
180+
xmlXPathRegisterVariable(c_context, (const xmlChar *)StringValueCStr(name), xmlValue);
181+
182+
VALUE registered_variables = rb_iv_get(rb_context, "@registered_variables");
183+
if (NIL_P(value)) {
184+
rb_hash_delete(registered_variables, name);
185+
} else {
186+
rb_hash_aset(registered_variables, name, Qtrue);
187+
}
167188

168189
return rb_context;
169190
}
@@ -394,6 +415,7 @@ rb_xml_xpath_context_evaluate(int argc, VALUE *argv, VALUE rb_context)
394415

395416
xmlSetStructuredErrorFunc(NULL, NULL);
396417
xmlSetGenericErrorFunc(NULL, NULL);
418+
xmlXPathRegisterFuncLookup(c_context, NULL, NULL);
397419

398420
if (xpath == NULL) {
399421
rb_exc_raise(rb_ary_entry(errors, 0));
@@ -447,9 +469,30 @@ rb_xml_xpath_context_new(VALUE klass, VALUE rb_node)
447469
&xml_xpath_context_type,
448470
c_context
449471
);
472+
473+
rb_iv_set(rb_context, "@registered_namespaces", rb_hash_new());
474+
rb_iv_set(rb_context, "@registered_variables", rb_hash_new());
475+
450476
return rb_context;
451477
}
452478

479+
480+
/* :nodoc: */
481+
static VALUE
482+
rb_xml_xpath_context_set_node(VALUE rb_context, VALUE rb_node)
483+
{
484+
xmlNodePtr c_node;
485+
xmlXPathContextPtr c_context;
486+
487+
TypedData_Get_Struct(rb_context, xmlXPathContext, &xml_xpath_context_type, c_context);
488+
Noko_Node_Get_Struct(rb_node, xmlNode, c_node);
489+
490+
c_context->doc = c_node->doc;
491+
c_context->node = c_node;
492+
493+
return rb_node;
494+
}
495+
453496
void
454497
noko_init_xml_xpath_context(void)
455498
{
@@ -465,4 +508,5 @@ noko_init_xml_xpath_context(void)
465508
rb_define_method(cNokogiriXmlXpathContext, "evaluate", rb_xml_xpath_context_evaluate, -1);
466509
rb_define_method(cNokogiriXmlXpathContext, "register_variable", rb_xml_xpath_context_register_variable, 2);
467510
rb_define_method(cNokogiriXmlXpathContext, "register_ns", rb_xml_xpath_context_register_ns, 2);
511+
rb_define_method(cNokogiriXmlXpathContext, "node=", rb_xml_xpath_context_set_node, 1);
468512
}

lib/nokogiri/xml/searchable.rb

+26-8
Original file line numberDiff line numberDiff line change
@@ -261,18 +261,36 @@ def xpath_internal(node, paths, handler, ns, binds)
261261
end
262262

263263
def xpath_impl(node, path, handler, ns, binds)
264-
ctx = get_xpath_context(node)
264+
get_xpath_context(node) do |context|
265+
context.register_namespaces(ns)
266+
context.register_variables(binds)
265267

266-
ctx.register_namespaces(ns)
267-
ctx.register_variables(binds)
268+
path = path.gsub("xmlns:", " :") unless Nokogiri.uses_libxml?
268269

269-
path = path.gsub("xmlns:", " :") unless Nokogiri.uses_libxml?
270-
271-
ctx.evaluate(path, handler)
270+
context.evaluate(path, handler)
271+
end
272272
end
273273

274-
def get_xpath_context(node)
275-
XPathContext.new(node)
274+
if Nokogiri.uses_libxml? && ENV["NOKOGIRI_DEOPTIMIZE_XPATH"].nil? # env var is an escape hatch
275+
# optimized path
276+
def get_xpath_context(node)
277+
context = Thread.current.thread_variable_get(:nokogiri_xpath_context)
278+
if context
279+
context.node = node
280+
else
281+
context = Thread.current.thread_variable_set(:nokogiri_xpath_context, XPathContext.new(node))
282+
end
283+
284+
begin
285+
yield context
286+
ensure
287+
context.reset
288+
end
289+
end
290+
else
291+
def get_xpath_context(node)
292+
yield XPathContext.new(node)
293+
end
276294
end
277295
end
278296
end

lib/nokogiri/xml/xpath_context.rb

+22
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,28 @@ def register_variables(binds)
2222
register_variable(key, value)
2323
end
2424
end
25+
26+
if Nokogiri.uses_libxml?
27+
def reset
28+
return unless
29+
30+
@registered_namespaces.each do |key, _|
31+
register_ns(key, nil)
32+
end
33+
unless @registered_namespaces.empty?
34+
warn "Nokogiri::XML::XPathContext#reset: unexpected registered namespaces: #{@registered_namespaces.keys}"
35+
@registered_namespaces.clear
36+
end
37+
38+
@registered_variables.each do |key, _|
39+
register_variable(key, nil)
40+
end
41+
unless @registered_variables.empty?
42+
warn "Nokogiri::XML::XPathContext#reset: unexpected registered variables: #{@registered_variables.keys}"
43+
@registered_variables.clear
44+
end
45+
end
46+
end
2547
end
2648
end
2749
end

test/xml/test_xpath.rb

+80
Original file line numberDiff line numberDiff line change
@@ -698,6 +698,86 @@ def collision(nodes)
698698
assert_equal(3, doc.xpath("//self::*:child").length)
699699
end
700700
end
701+
702+
describe "re-using XPathContext objects within a thread" do
703+
# see https://github.com/sparklemotion/nokogiri/issues/3266
704+
# this optimization is CRuby-only, but we run the tests on JRuby for consistency
705+
let(:doc) {
706+
Nokogiri::XML::Document.parse(<<~XML)
707+
<root xmlns="http://nokogiri.org/default" xmlns:ns1="http://nokogiri.org/ns1">
708+
<child>default</child>
709+
<ns1:child>ns1</ns1:child>
710+
</root>
711+
XML
712+
}
713+
714+
it "clears registered namespaces" do
715+
# make sure the thread-local XPathContext is initialized
716+
doc.xpath("//xmlns:child")
717+
718+
# do namespaces behave the way we expect?
719+
node = doc.at_xpath("//ns:child", { "ns" => "http://nokogiri.org/ns1" })
720+
assert_equal("ns1", node.text)
721+
722+
assert_raises(XPath::SyntaxError) {
723+
doc.at_xpath("//ns:child")
724+
}
725+
726+
node = doc.at_xpath("//child")
727+
assert_nil(node)
728+
729+
# make sure the nokogiri and nokogiri-builting namespaces are re-registered
730+
doc.xpath("//xmlns:child[nokogiri:thing(.)]", @handler)
731+
assert_equal(1, @handler.things.length)
732+
733+
if Nokogiri.uses_libxml?
734+
nodes = doc.xpath("//xmlns:child[nokogiri-builtin:local-name-is('child')]")
735+
assert_equal(1, nodes.length)
736+
end
737+
end
738+
739+
it "clears registered handlers and functions" do
740+
# make sure the thread-local XPathContext is initialized
741+
doc.xpath("//xmlns:child")
742+
743+
doc.xpath("//xmlns:child[nokogiri:thing(.)]", @handler)
744+
assert_equal(1, @handler.things.length)
745+
746+
assert_raises(XPath::SyntaxError) {
747+
doc.xpath("//xmlns:child[nokogiri:thing(.)]")
748+
}
749+
750+
doc.xpath("//xmlns:child[nokogiri:thing(.)]", @handler)
751+
assert_equal(2, @handler.things.length)
752+
753+
if Nokogiri.uses_libxml?
754+
nodes = doc.xpath("//xmlns:child[nokogiri-builtin:local-name-is('child')]")
755+
assert_equal(1, nodes.length)
756+
end
757+
end
758+
759+
it "clears registered variables" do
760+
# make sure the thread-local XPathContext is initialized
761+
doc.xpath("//xmlns:child")
762+
763+
nodes = @xml.xpath("//address[@domestic=$value]", nil, value: "Yes")
764+
assert_equal(4, nodes.length)
765+
766+
assert_raises(XPath::SyntaxError) {
767+
@xml.xpath("//address[@domestic=$value]")
768+
}
769+
770+
nodes = @xml.xpath("//address[@domestic=$value]", nil, value: "Qwerty")
771+
assert_empty(nodes)
772+
773+
assert_raises(XPath::SyntaxError) {
774+
@xml.xpath("//address[@domestic=$value]")
775+
}
776+
777+
nodes = @xml.xpath("//address[@domestic=$value]", nil, value: "Yes")
778+
assert_equal(4, nodes.length)
779+
end
780+
end
701781
end
702782
end
703783
end

0 commit comments

Comments
 (0)