Custom SQL query returning 39 rows


html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app
https://github.com/simonw/datasette/issues/448#issuecomment-969557008 https://api.github.com/repos/simonw/datasette/issues/448 969557008 IC_kwDOBm6k_c45ykQQ 9599 2021-11-16T00:56:09Z 2021-11-16T00:59:59Z OWNER This looks like it might work: ```sql with inner as ( select * from ads_with_targets where :p0 in ( select value from json_each([ads_with_targets].[target_names]) ) ), deduped_array_items as ( select distinct j.value, inner.* from json_each([inner].[target_names]) j join inner ) select value, count(*) from deduped_array_items group by value order by count(*) desc ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 440222719  
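The query above leans on SQLite's `json_each()` table-valued function (part of the JSON1 extension), which expands a JSON array into one row per element - a minimal illustration, separate from the query itself:

```sql
-- json_each() yields one row per array element, exposed in the "value" column
select value from json_each('["politics", "news", "music"]');
-- politics
-- news
-- music
```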
https://github.com/simonw/datasette/issues/448#issuecomment-969557972 https://api.github.com/repos/simonw/datasette/issues/448 969557972 IC_kwDOBm6k_c45ykfU 9599 2021-11-16T00:56:58Z 2021-11-16T00:56:58Z OWNER It uses a CTE - CTEs were introduced in SQLite 3.8 - and AWS Lambda Python 3.9 still provides SQLite 3.7 - but I've checked and I can use `pysqlite3-binary` to work around that there, so I'm OK relying on CTEs for this. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 440222719  
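As a quick way to check which SQLite version a given Python runtime bundles (and therefore whether CTEs are available), something like this works - the `pysqlite3` import is the drop-in module the comment above refers to:

```python
import sqlite3

print(sqlite3.sqlite_version)  # version of the bundled SQLite library

# If that version is too old, pysqlite3-binary bundles a modern SQLite:
# import pysqlite3 as sqlite3
```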
https://github.com/simonw/datasette/issues/448#issuecomment-969572281 https://api.github.com/repos/simonw/datasette/issues/448 969572281 IC_kwDOBm6k_c45yn-5 9599 2021-11-16T01:05:11Z 2021-11-16T01:05:11Z OWNER I tried this and it seems to work correctly: ```python for source_and_config in self.get_configs(): config = source_and_config["config"] source = source_and_config["source"] column = config.get("column") or config["simple"] facet_sql = """ with inner as ({sql}), deduped_array_items as ( select distinct j.value, inner.* from json_each([inner].{col}) j join inner ) select value as value, count(*) as count from deduped_array_items group by value order by count(*) desc limit {limit} """.format( col=escape_sqlite(column), sql=self.sql, limit=facet_size + 1 ) ``` The queries are _very_ slow though - I had to bump the time limit up to 2s even against a view returning only 3,499 rows. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 440222719  
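For reference, this is roughly what that template renders to for a hypothetical `target_names` column with a facet size of 30 - the column name, inner query and limit here are illustrative substitutions, not output captured from the code above:

```sql
with inner as (select * from ads_with_targets),
deduped_array_items as (
    select distinct j.value, inner.*
    from json_each([inner].[target_names]) j
    join inner
)
select value as value, count(*) as count
from deduped_array_items
group by value
order by count(*) desc limit 31
```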
https://github.com/simonw/datasette/issues/448#issuecomment-969578466 https://api.github.com/repos/simonw/datasette/issues/448 969578466 IC_kwDOBm6k_c45ypfi 9599 2021-11-16T01:08:29Z 2021-11-16T01:08:29Z OWNER Actually with the cache warmed up it looks like the facet query is taking 150ms which is good enough. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 440222719  
https://github.com/simonw/datasette/issues/448#issuecomment-969582098 https://api.github.com/repos/simonw/datasette/issues/448 969582098 IC_kwDOBm6k_c45yqYS 9599 2021-11-16T01:10:28Z 2021-11-16T01:10:28Z OWNER Also note that this demo data is using a SQL view to create the JSON arrays - the view is defined like this: ```sql CREATE VIEW ads_with_targets as select ads.*, json_group_array(targets.name) as target_names from ads join ad_targets on ad_targets.ad_id = ads.id join targets on ad_targets.target_id = targets.id group by ad_targets.ad_id; ``` So running JSON faceting on top of that view is a pretty big ask! {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 440222719  
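`json_group_array()` is an aggregate function that collects each group's values into a JSON array, so every row of that view carries a value shaped like this (the names here are made up):

```sql
select json_group_array(name) from (
    select 'news' as name
    union all
    select 'music'
);
-- → ["news","music"]
```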
https://github.com/simonw/datasette/issues/1511#issuecomment-969600859 https://api.github.com/repos/simonw/datasette/issues/1511 969600859 IC_kwDOBm6k_c45yu9b 9599 2021-11-16T01:20:13Z 2021-11-16T01:20:13Z OWNER See: - #830 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1054246919  
https://github.com/simonw/datasette/issues/1012#issuecomment-969602825 https://api.github.com/repos/simonw/datasette/issues/1012 969602825 IC_kwDOBm6k_c45yvcJ 9599 2021-11-16T01:21:14Z 2021-11-16T01:21:14Z OWNER I'd been wondering how to get new classifiers into Trove - thanks, I'll give this a go. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 718540751  
https://github.com/simonw/datasette/issues/1012#issuecomment-969613166 https://api.github.com/repos/simonw/datasette/issues/1012 969613166 IC_kwDOBm6k_c45yx9u 9599 2021-11-16T01:27:25Z 2021-11-16T01:27:25Z OWNER Requested here: - https://github.com/pypa/trove-classifiers/pull/85 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 718540751  
https://github.com/simonw/datasette/issues/1176#issuecomment-969616626 https://api.github.com/repos/simonw/datasette/issues/1176 969616626 IC_kwDOBm6k_c45yyzy 9599 2021-11-16T01:29:13Z 2021-11-16T01:29:13Z OWNER I'm inclined to create a Sphinx reference documentation page for this, as I did for `sqlite-utils` here: https://sqlite-utils.datasette.io/en/stable/reference.html {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 779691739  
https://github.com/simonw/datasette/issues/448#issuecomment-969621662 https://api.github.com/repos/simonw/datasette/issues/448 969621662 IC_kwDOBm6k_c45y0Ce 9599 2021-11-16T01:32:04Z 2021-11-16T01:32:04Z OWNER Tests are failing, and I think it's because the facets come back in different orders - they need a tie-breaker. https://github.com/simonw/datasette/runs/4219325197?check_suite_focus=true {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 440222719  
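The usual fix for unstable facet ordering - and the pattern that shows up in the traced facet queries later in this thread - is a secondary sort key on the value itself, so equal counts always come back in the same order:

```sql
select value, count(*) as count
from deduped_array_items
group by value
order by count(*) desc, value  -- "value" breaks ties deterministically
limit 31
```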
https://github.com/simonw/datasette/issues/1505#issuecomment-970188065 https://api.github.com/repos/simonw/datasette/issues/1505 970188065 IC_kwDOBm6k_c450-Uh 7094907 2021-11-16T11:40:52Z 2021-11-16T11:40:52Z NONE A suggestion is to have the option to choose an arbitrary delimiter (and quoting characters). {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1052247023  
https://github.com/simonw/datasette/issues/1012#issuecomment-970266123 https://api.github.com/repos/simonw/datasette/issues/1012 970266123 IC_kwDOBm6k_c451RYL 45380 2021-11-16T13:18:36Z 2021-11-16T13:18:36Z CONTRIBUTOR Congratulations, looks like it went through! There was a bit of a hold-up on the JupyterLab ones, but it's semi-automated: a Dependabot PR to Warehouse and a CI deploy, with a click in between. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 718540751  
https://github.com/simonw/datasette/issues/1509#issuecomment-970544733 https://api.github.com/repos/simonw/datasette/issues/1509 970544733 IC_kwDOBm6k_c452VZd 9599 2021-11-16T18:22:32Z 2021-11-16T18:22:32Z OWNER This is mainly happening here: - https://github.com/simonw/datasette/issues/782 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1054243511  
https://github.com/simonw/datasette/issues/782#issuecomment-970553780 https://api.github.com/repos/simonw/datasette/issues/782 970553780 IC_kwDOBm6k_c452Xm0 9599 2021-11-16T18:30:51Z 2021-11-16T18:30:58Z OWNER OK, I'm ready to start working on this today. I'm going to go with a default representation that looks like this: ```json { "rows": [ {"id": 1, "name": "One"}, {"id": 2, "name": "Two"} ], "next_url": null } ``` Note that there's no `count` - all it provides is the current selection of results and an indication of how the next page can be retrieved (`null` if there are no more results). I'll implement `?_extra=` to provide everything else. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 627794879  
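A client consuming that shape would simply follow `next_url` until it is `null` - a minimal sketch against a hypothetical endpoint:

```python
import json
import urllib.request

def fetch_all_rows(url):
    # Follow next_url links until the API reports no more pages.
    rows = []
    while url:
        with urllib.request.urlopen(url) as response:
            page = json.load(response)
        rows.extend(page["rows"])
        url = page["next_url"]  # None on the final page
    return rows

# rows = fetch_all_rows("https://example.com/db/table.json")  # hypothetical URL
```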
https://github.com/simonw/datasette/issues/782#issuecomment-970554697 https://api.github.com/repos/simonw/datasette/issues/782 970554697 IC_kwDOBm6k_c452X1J 9599 2021-11-16T18:32:03Z 2021-11-16T18:32:03Z OWNER I'm going to take another look at this: - https://github.com/simonw/datasette/issues/878 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 627794879  
https://github.com/simonw/datasette/issues/878#issuecomment-970624197 https://api.github.com/repos/simonw/datasette/issues/878 970624197 IC_kwDOBm6k_c452ozF 9599 2021-11-16T19:49:05Z 2021-11-16T19:49:05Z OWNER Here's the latest version of my weird dependency injection async class: ```python import asyncio import inspect from functools import wraps from graphlib import TopologicalSorter class AsyncMeta(type): def __new__(cls, name, bases, attrs): # Decorate any items that are 'async def' methods _registry = {} new_attrs = {"_registry": _registry} for key, value in attrs.items(): if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve": new_attrs[key] = make_method(value) _registry[key] = new_attrs[key] else: new_attrs[key] = value # Topological sort of _registry by parameter dependencies graph = { key: { p for p in inspect.signature(method).parameters.keys() if p != "self" and not p.startswith("_") } for key, method in _registry.items() } new_attrs["_graph"] = graph return super().__new__(cls, name, bases, new_attrs) def make_method(method): @wraps(method) async def inner(self, **kwargs): parameters = inspect.signature(method).parameters.keys() # Any parameters not provided by kwargs are resolved from registry to_resolve = [p for p in parameters if p not in kwargs and p != "self"] missing = [p for p in to_resolve if p not in self._registry] assert ( not missing ), "The following DI parameters could not be found in the registry: {}".format( missing ) results = {} results.update(kwargs) results.update(await self.resolve(to_resolve)) return await method(self, **results) return inner bad = [0] class AsyncBase(metaclass=AsyncMeta): async def resolve(self, names): print(" resolve({})".format(names)) results = {} # Resolve them in the correct order ts = TopologicalSorter() ts2 = TopologicalSorter() print(" names = ", names) print(" self._graph = ", self._graph) for name in names: if self._graph[name]: ts.add(name, *self._graph[name]) ts2.add(name, *self._graph[name]) print(" static_order =", tuple(ts2.static_order())) ts.prepare() while ts.is_active(): print(" is_active, i = ", bad[0]) bad[0] += 1 if bad[0] > 20: print(" Infinite loop?") break nodes = ts.get_ready() print(" Do nodes:", nodes) awaitables = [self._registry[name](self, **{ k: v for k, v in results.items() if k in self._graph[name] }) for name in nodes] print(" awaitables: ", awaitables) awaitable_results = await asyncio.gather(*awaitables) results.update({ p[0].__name__: p[1] for p in zip(awaitables, awaitable_results) }) print(results) for node in nodes: ts.done(node) return results ``` Example usage: ```python class Foo(AsyncBase): async def graa(self, boff): print("graa") return 5 async def boff(self): print("boff") return 8 async def other(self, boff, graa): print("other") return 5 + boff + graa foo = Foo() await foo.other() ``` Output: ``` resolve(['boff', 'graa']) names = ['boff', 'graa'] self._graph = {'graa': {'boff'}, 'boff': set(), 'other': {'graa', 'boff'}} static_order = ('boff', 'graa') is_active, i = 0 Do nodes: ('boff',) awaitables: [<coroutine object Foo.boff at 0x10bd81a40>] resolve([]) names = [] self._graph = {'graa': {'boff'}, 'boff': set(), 'other': {'graa', 'boff'}} static_order = () boff {'boff': 8} is_active, i = 1 Do nodes: ('graa',) awaitables: [<coroutine object Foo.graa at 0x10d66b340>] resolve([]) names = [] self._graph = {'graa': {'boff'}, 'boff': set(), 'other': {'graa', 'boff'}} static_order = () graa {'boff': 8, 'graa': 5} other 18 ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
https://github.com/simonw/datasette/issues/878#issuecomment-970655304 https://api.github.com/repos/simonw/datasette/issues/878 970655304 IC_kwDOBm6k_c452wZI 9599 2021-11-16T20:32:16Z 2021-11-16T20:32:16Z OWNER This code is really fiddly. I just got to this version: ```python import asyncio from functools import wraps import inspect try: import graphlib except ImportError: from . import vendored_graphlib as graphlib class AsyncMeta(type): def __new__(cls, name, bases, attrs): # Decorate any items that are 'async def' methods _registry = {} new_attrs = {"_registry": _registry} for key, value in attrs.items(): if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve": new_attrs[key] = make_method(value) _registry[key] = new_attrs[key] else: new_attrs[key] = value # Gather graph for later dependency resolution graph = { key: { p for p in inspect.signature(method).parameters.keys() if p != "self" and not p.startswith("_") } for key, method in _registry.items() } new_attrs["_graph"] = graph return super().__new__(cls, name, bases, new_attrs) def make_method(method): @wraps(method) async def inner(self, _results=None, **kwargs): print("inner - _results=", _results) parameters = inspect.signature(method).parameters.keys() # Any parameters not provided by kwargs are resolved from registry to_resolve = [p for p in parameters if p not in kwargs and p != "self"] missing = [p for p in to_resolve if p not in self._registry] assert ( not missing ), "The following DI parameters could not be found in the registry: {}".format( missing ) results = {} results.update(kwargs) if to_resolve: resolved_parameters = await self.resolve(to_resolve, _results) results.update(resolved_parameters) return_value = await method(self, **results) if _results is not None: _results[method.__name__] = return_value return return_value return inner class AsyncBase(metaclass=AsyncMeta): async def resolve(self, names, results=None): print("\n resolve: ", names) if results is None: results = {} # Resolve them in the correct order ts = graphlib.TopologicalSorter() for name in names: ts.add(name, *self._graph[name]) ts.prepare() async def resolve_nodes(nodes): print(" resolve_nodes", nodes) print(" (current results = {})".format(repr(results))) awaitables = [ self._registry[name]( self, _results=results, **{k: v for k, v in results.items() if k in self._graph[name]}, ) for name in nodes if name not in results ] print(" awaitables: ", awaitables) awaitable_results = await asyncio.gather(*awaitables) results.update( {p[0].__name__: p[1] for p in zip(awaitables, awaitable_results)} ) if not ts.is_active(): # Nothing has dependencies - just resolve directly print(" no dependencies, resolve directly") await resolve_nodes(names) else: # Resolve in topological order while ts.is_active(): nodes = ts.get_ready() print(" ts.get_ready() returned nodes:", nodes) await resolve_nodes(nodes) for node in nodes: ts.done(node) print(" End of resolve(), returning", results) return {key: value for key, value in results.items() if key in names} ``` With this test: ```python class Complex(AsyncBase): def __init__(self): self.log = [] async def c(self): print("LOG: c") self.log.append("c") async def b(self, c): print("LOG: b") self.log.append("b") async def a(self, b, c): print("LOG: a") self.log.append("a") async def go(self, a): print("LOG: go") self.log.append("go") return self.log @pytest.mark.asyncio async def test_complex(): result = await Complex().go() # 'c' should only be called once assert result == ["c", "b", "a", "go"] ``` This 
test sometimes passes, and sometimes fails! Output for a pass: ``` tests/test_asyncdi.py inner - _results= None resolve: ['a'] ts.get_ready() returned nodes: ('c', 'b') resolve_nodes ('c', 'b') (current results = {}) awaitables: [<coroutine object Complex.c at 0x1074ac890>, <coroutine object Complex.b at 0x1074ac820>] inner - _results= {} LOG: c inner - _results= {'c': None} resolve: ['c'] ts.get_ready() returned nodes: ('c',) resolve_nodes ('c',) (current results = {'c': None}) awaitables: [] End of resolve(), returning {'c': None} LOG: b ts.get_ready() returned nodes: ('a',) resolve_nodes ('a',) (current results = {'c': None, 'b': None}) awaitables: [<coroutine object Complex.a at 0x1074ac7b0>] inner - _results= {'c': None, 'b': None} LOG: a End of resolve(), returning {'c': None, 'b': None, 'a': None} LOG: go ``` Output for a fail: ``` tests/test_asyncdi.py inner - _results= None resolve: ['a'] ts.get_ready() returned nodes: ('b', 'c') resolve_nodes ('b', 'c') (current results = {}) awaitables: [<coroutine object Complex.b at 0x10923c890>, <coroutine object Complex.c at 0x10923c820>] inner - _results= {} resolve: ['c'] ts.get_ready() returned nodes: ('c',) resolve_nodes ('c',) (current results = {}) awaitables: [<coroutine object Complex.c at 0x10923c6d0>] inner - _results= {} LOG: c inner - _results= {'c': None} LOG: c End of resolve(), returning {'c': None} LOG: b ts.get_ready() returned nodes: ('a',) resolve_nodes ('a',) (current results = {'c': None, 'b': None}) awaitables: [<coroutine object Complex.a at 0x10923c6d0>] inner - _results= {'c': None, 'b': None} LOG: a End of resolve(), returning {'c': None, 'b': None, 'a': None} LOG: go F =================================================================================================== FAILURES =================================================================================================== _________________________________________________________________________________________________ test_complex _________________________________________________________________________________________________ @pytest.mark.asyncio async def test_complex(): result = await Complex().go() # 'c' should only be called once > assert result == ["c", "b", "a", "go"] E AssertionError: assert ['c', 'c', 'b', 'a', 'go'] == ['c', 'b', 'a', 'go'] E At index 1 diff: 'c' != 'b' E Left contains one more item: 'go' E Use -v to get the full diff tests/test_asyncdi.py:48: AssertionError ================== short test summary info ================================ FAILED tests/test_asyncdi.py::test_complex - AssertionError: assert ['c', 'c', 'b', 'a', 'go'] == ['c', 'b', 'a', 'go'] ``` I figured out why this is happening: `a` requires `b` and `c`, and `b` also requires `c`. The code decides to run `b` and `c` in parallel. If `c` completes first, then when `b` runs it gets to use the already-calculated result for `c` - so it doesn't need to call `c` again. If `b` reaches that point before `c` has completed, it calls `c` itself - so `c` runs twice. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
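One standard way to avoid `c` executing twice is to share the *in-flight* awaitable rather than only the finished result: if the resolver stores an `asyncio.Task` per name, a second dependent awaits the same task instead of invoking the coroutine again. A rough sketch of that idea (not the code from the comments above, which takes the plan-based approach instead):

```python
import asyncio

class SharedResolver:
    """Share one in-flight task per dependency name (illustrative helper)."""

    def __init__(self):
        self._tasks = {}

    async def resolve(self, name, coroutine_factory):
        if name not in self._tasks:
            # First requester schedules the task; later requesters await it.
            self._tasks[name] = asyncio.create_task(coroutine_factory())
        return await self._tasks[name]
```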
https://github.com/simonw/datasette/issues/878#issuecomment-970655927 https://api.github.com/repos/simonw/datasette/issues/878 970655927 IC_kwDOBm6k_c452wi3 9599 2021-11-16T20:33:11Z 2021-11-16T20:33:11Z OWNER What should be happening here instead is it should resolve the full graph and notice that `c` is depended on by both `b` and `a` - so it should run `c` first, then run the next ones in parallel. So maybe the algorithm I'm inheriting from https://docs.python.org/3/library/graphlib.html isn't the correct algorithm? {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
https://github.com/simonw/datasette/issues/878#issuecomment-970657874 https://api.github.com/repos/simonw/datasette/issues/878 970657874 IC_kwDOBm6k_c452xBS 9599 2021-11-16T20:36:01Z 2021-11-16T20:36:01Z OWNER My goal here is to calculate the most efficient way to resolve the different nodes, running them in parallel where possible. So for this class: ```python class Complex(AsyncBase): async def d(self): pass async def c(self): pass async def b(self, c, d): pass async def a(self, b, c): pass async def go(self, a): pass ``` A call to `go()` should do this: - `c` and `d` in parallel - `b` - `a` - `go` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
https://github.com/simonw/datasette/issues/878#issuecomment-970660299 https://api.github.com/repos/simonw/datasette/issues/878 970660299 IC_kwDOBm6k_c452xnL 9599 2021-11-16T20:39:43Z 2021-11-16T20:42:27Z OWNER But that does seem to be the plan that `TopologicalSorter` provides: ```python from graphlib import TopologicalSorter graph = {"go": {"a"}, "a": {"b", "c"}, "b": {"c", "d"}} ts = TopologicalSorter(graph) ts.prepare() while ts.is_active(): nodes = ts.get_ready() print(nodes) ts.done(*nodes) ``` Outputs: ``` ('c', 'd') ('b',) ('a',) ('go',) ``` Also: ```python graph = {"go": {"d", "e", "f"}, "d": {"b", "c"}, "b": {"c"}} ts = TopologicalSorter(graph) ts.prepare() while ts.is_active(): nodes = ts.get_ready() print(nodes) ts.done(*nodes) ``` Gives: ``` ('e', 'f', 'c') ('b',) ('d',) ('go',) ``` I'm confident that `TopologicalSorter` is the way to do this. I think I need to rewrite my code to call it once to get that plan, then `await asyncio.gather(*nodes)` on each group in turn to execute it. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
https://github.com/simonw/datasette/issues/878#issuecomment-970673085 https://api.github.com/repos/simonw/datasette/issues/878 970673085 IC_kwDOBm6k_c4520u9 9599 2021-11-16T20:58:24Z 2021-11-16T20:58:24Z OWNER New test: ```python class Complex(AsyncBase): def __init__(self): self.log = [] async def d(self): await asyncio.sleep(random() * 0.1) print("LOG: d") self.log.append("d") async def c(self): await asyncio.sleep(random() * 0.1) print("LOG: c") self.log.append("c") async def b(self, c, d): print("LOG: b") self.log.append("b") async def a(self, b, c): print("LOG: a") self.log.append("a") async def go(self, a): print("LOG: go") self.log.append("go") return self.log @pytest.mark.asyncio async def test_complex(): result = await Complex().go() # 'c' should only be called once assert tuple(result) in ( # c and d could happen in either order ("c", "d", "b", "a", "go"), ("d", "c", "b", "a", "go"), ) ``` And this code passes it: ```python import asyncio from functools import wraps import inspect try: import graphlib except ImportError: from . import vendored_graphlib as graphlib class AsyncMeta(type): def __new__(cls, name, bases, attrs): # Decorate any items that are 'async def' methods _registry = {} new_attrs = {"_registry": _registry} for key, value in attrs.items(): if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve": new_attrs[key] = make_method(value) _registry[key] = new_attrs[key] else: new_attrs[key] = value # Gather graph for later dependency resolution graph = { key: { p for p in inspect.signature(method).parameters.keys() if p != "self" and not p.startswith("_") } for key, method in _registry.items() } new_attrs["_graph"] = graph return super().__new__(cls, name, bases, new_attrs) def make_method(method): parameters = inspect.signature(method).parameters.keys() @wraps(method) async def inner(self, _results=None, **kwargs): print("\n{}.{}({}) _results={}".format(self, method.__name__, kwargs, _results)) # Any parameters not provided by kwargs are resolved from registry to_resolve = [p for p in parameters if p not in kwargs and p != "self"] missing = [p for p in to_resolve if p not in self._registry] assert ( not missing ), "The following DI parameters could not be found in the registry: {}".format( missing ) results = {} results.update(kwargs) if to_resolve: resolved_parameters = await self.resolve(to_resolve, _results) results.update(resolved_parameters) return_value = await method(self, **results) if _results is not None: _results[method.__name__] = return_value return return_value return inner class AsyncBase(metaclass=AsyncMeta): async def resolve(self, names, results=None): print("\n resolve: ", names) if results is None: results = {} # Come up with an execution plan, just for these nodes ts = graphlib.TopologicalSorter() to_do = set(names) done = set() while to_do: item = to_do.pop() dependencies = self._graph[item] ts.add(item, *dependencies) done.add(item) # Add any not-done dependencies to the queue to_do.update({k for k in dependencies if k not in done}) ts.prepare() plan = [] while ts.is_active(): node_group = ts.get_ready() plan.append(node_group) ts.done(*node_group) print("plan:", plan) results = {} for node_group in plan: awaitables = [ self._registry[name]( self, _results=results, **{k: v for k, v in results.items() if k in self._graph[name]}, ) for name in node_group ] print(" results = ", results) print(" awaitables: ", awaitables) awaitable_results = await asyncio.gather(*awaitables) results.update( {p[0].__name__: p[1] for p in 
zip(awaitables, awaitable_results)} ) print(" End of resolve(), returning", results) return {key: value for key, value in results.items() if key in names} ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
https://github.com/simonw/datasette/issues/878#issuecomment-970705738 https://api.github.com/repos/simonw/datasette/issues/878 970705738 IC_kwDOBm6k_c4528tK 9599 2021-11-16T21:44:31Z 2021-11-16T21:44:31Z OWNER Wrote a TIL about what I learned using `TopologicalSorter`: https://til.simonwillison.net/python/graphlib-topologicalsorter {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
https://github.com/simonw/datasette/issues/878#issuecomment-970712713 https://api.github.com/repos/simonw/datasette/issues/878 970712713 IC_kwDOBm6k_c452-aJ 9599 2021-11-16T21:54:33Z 2021-11-16T21:54:33Z OWNER I'm going to continue working on this in a PR. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 648435885  
https://github.com/simonw/datasette/pull/1512#issuecomment-970718337 https://api.github.com/repos/simonw/datasette/issues/1512 970718337 IC_kwDOBm6k_c452_yB 9599 2021-11-16T22:02:30Z 2021-11-16T22:02:30Z OWNER I've decided to make the clever `asyncio` dependency injection opt-in - so you can either decorate with `@inject` or you can set `inject_all = True` on the class - for example: ```python import asyncio from datasette.utils.asyncdi import AsyncBase, inject class Simple(AsyncBase): def __init__(self): self.log = [] @inject async def two(self): self.log.append("two") @inject async def one(self, two): self.log.append("one") return self.log async def not_inject(self, one, two): return one + two class Complex(AsyncBase): inject_all = True def __init__(self): self.log = [] async def b(self): self.log.append("b") async def a(self, b): self.log.append("a") async def go(self, a): self.log.append("go") return self.log ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055402144  
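The `@inject` decorator itself can be as simple as setting a marker attribute that the metaclass checks when deciding which methods to wrap - this is a guess at the mechanism, since the PR's implementation isn't shown here:

```python
def inject(method):
    # Hypothetical marker; AsyncMeta would wrap only marked coroutines
    # unless the class sets inject_all = True.
    method._inject = True
    return method
```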
https://github.com/simonw/datasette/pull/1512#issuecomment-970718652 https://api.github.com/repos/simonw/datasette/issues/1512 970718652 IC_kwDOBm6k_c452_28 22429695 2021-11-16T22:02:59Z 2021-11-16T23:51:48Z NONE # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report > Merging [#1512](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (8f757da) into [main](https://codecov.io/gh/simonw/datasette/commit/0156c6b5e52d541e93f0d68e9245f20ae83bc933?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (0156c6b) will **decrease** coverage by `2.10%`. > The diff coverage is `36.20%`. [![Impacted file tree graph](https://codecov.io/gh/simonw/datasette/pull/1512/graphs/tree.svg?width=650&height=150&src=pr&token=eSahVY7kw1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) ```diff @@ Coverage Diff @@ ## main #1512 +/- ## ========================================== - Coverage 91.82% 89.72% -2.11% ========================================== Files 34 36 +2 Lines 4430 4604 +174 ========================================== + Hits 4068 4131 +63 - Misses 362 473 +111 ``` | [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage Δ | | |---|---|---| | [datasette/utils/vendored\_graphlib.py](https://codecov.io/gh/simonw/datasette/pull/1512/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL3ZlbmRvcmVkX2dyYXBobGliLnB5) | `0.00% <0.00%> (ø)` | | | [datasette/utils/asyncdi.py](https://codecov.io/gh/simonw/datasette/pull/1512/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL2FzeW5jZGkucHk=) | `96.92% <96.92%> (ø)` | | ------ [Continue to review full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Last update [0156c6b...8f757da](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). 
{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055402144  
https://github.com/simonw/datasette/issues/1513#issuecomment-970738130 https://api.github.com/repos/simonw/datasette/issues/1513 970738130 IC_kwDOBm6k_c453EnS 9599 2021-11-16T22:32:19Z 2021-11-16T22:32:19Z OWNER I came up with the following query which seems to work! ```sql with cte as ( select rowid, country, country_long, name, owner, primary_fuel from [global-power-plants] ), truncated as ( select null as _facet, null as facet_name, null as facet_count, rowid, country, country_long, name, owner, primary_fuel from cte order by rowid limit 4 ), country_long_facet as ( select 'country_long' as _facet, country_long as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), owner_facet as ( select 'owner' as _facet, owner as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), primary_fuel_facet as ( select 'primary_fuel' as _facet, primary_fuel as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ) select * from truncated union all select * from country_long_facet union all select * from owner_facet union all select * from primary_fuel_facet ``` (Limits should be 101, 31, 31, 31 but I reduced size to get a shorter example table). Results [look like this](https://global-power-plants.datasettes.com/global-power-plants?sql=with+cte+as+%28%0D%0A++select+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+%5Bglobal-power-plants%5D%0D%0A%29%2C%0D%0Atruncated+as+%28%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+cte+order+by+rowid+limit+4%0D%0A%29%2C%0D%0Acountry_long_facet+as+%28%0D%0A++select+%27country_long%27+as+_facet%2C+country_long+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Aowner_facet+as+%28%0D%0A++select+%27owner%27+as+_facet%2C+owner+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Aprimary_fuel_facet+as+%28%0D%0A++select+%27primary_fuel%27+as+_facet%2C+primary_fuel+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%0D%0Aselect+*+from+truncated%0D%0Aunion+all+select+*+from+country_long_facet%0D%0Aunion+all+select+*+from+owner_facet%0D%0Aunion+all+select+*+from+primary_fuel_facet): _facet | facet_name | facet_count | rowid | country | country_long | name | owner | primary_fuel -- | -- | -- | -- | -- | -- | -- | -- | --   |   |   | 1 | AFG | Afghanistan | Kajaki Hydroelectric Power Plant Afghanistan |   | Hydro   |   |   | 2 | AFG | Afghanistan | Kandahar DOG |   | Solar   |   |   | 3 | AFG | Afghanistan | Kandahar JOL |   | Solar   |   |   | 4 | AFG | Afghanistan | Mahipar Hydroelectric Power Plant Afghanistan |   | Hydro country_long | United States of America | 8688 |   |   |   |   |   |   country_long | China | 4235 |   |   |   |   |   |   country_long | United Kingdom | 2603 |   |   |   |   |   |   owner |   | 14112 |   |   |   |   |   |   owner | Lightsource Renewable Energy | 120 |   |   |   |   |   |   
owner | Cypress Creek Renewables | 109 |   |   |   |   |   |   primary_fuel | Solar | 9662 |   |   |   |   |   |   primary_fuel | Hydro | 7155 |   |   |   |   |   |   primary_fuel | Wind | 5188 |   |   |   |   |   |   This is a neat proof of concept. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
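To turn that single result set back into data for the page, the calling code would split rows on the `_facet` column - `null` marks a display row, anything else is a facet entry. A sketch, assuming each row arrives as a dictionary:

```python
def split_facet_results(rows):
    # Rows with _facet == None are table rows; the rest are facet counts.
    table_rows = []
    facets = {}  # e.g. {"country_long": [("United States of America", 8688), ...]}
    for row in rows:
        if row["_facet"] is None:
            table_rows.append(row)
        else:
            facets.setdefault(row["_facet"], []).append(
                (row["facet_name"], row["facet_count"])
            )
    return table_rows, facets
```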
https://github.com/simonw/datasette/issues/1513#issuecomment-970742415 https://api.github.com/repos/simonw/datasette/issues/1513 970742415 IC_kwDOBm6k_c453FqP 9599 2021-11-16T22:37:14Z 2021-11-16T22:37:14Z OWNER The query takes 42.794ms to run. Here's the equivalent page using separate queries: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_facet_size=3&_size=2&_nocount=1 Annoyingly I can't disable facet suggestions but keep facets. I'm going to turn on tracing so I can see how long the separate queries took. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/issues/1513#issuecomment-970758179 https://api.github.com/repos/simonw/datasette/issues/1513 970758179 IC_kwDOBm6k_c453Jgj 9599 2021-11-16T22:47:38Z 2021-11-16T22:47:38Z OWNER Trace now enabled: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_facet_size=3&_size=2&_nocount=1&_trace=1 Here are the relevant traces: ```json [ { "type": "sql", "start": 31.214430154, "end": 31.214817089, "duration_ms": 0.3869350000016425, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 705, in data\n results = await db.execute(sql, params, truncate=True, **extra_args)\n" ], "database": "global-power-plants", "sql": "select rowid, country, country_long, name, gppd_idnr, capacity_mw, latitude, longitude, primary_fuel, other_fuel1, other_fuel2, other_fuel3, commissioning_year, owner, source, url, geolocation_source, wepp_id, year_of_capacity_data, generation_gwh_2013, generation_gwh_2014, generation_gwh_2015, generation_gwh_2016, generation_gwh_2017, generation_data_source, estimated_generation_gwh from [global-power-plants] order by rowid limit 3", "params": {} }, { "type": "sql", "start": 31.215234586, "end": 31.220110342, "duration_ms": 4.875756000000564, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 760, in data\n ) = await facet.facet_results()\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/facets.py\", line 212, in facet_results\n facet_rows_results = await self.ds.execute(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "global-power-plants", "sql": "select country_long as value, count(*) as count from (\n select rowid, country, country_long, name, gppd_idnr, capacity_mw, latitude, longitude, primary_fuel, other_fuel1, other_fuel2, other_fuel3, commissioning_year, owner, source, url, geolocation_source, wepp_id, year_of_capacity_data, generation_gwh_2013, generation_gwh_2014, generation_gwh_2015, generation_gwh_2016, generation_gwh_2017, generation_data_source, estimated_generation_gwh from [global-power-plants] \n )\n where country_long is not null\n group by country_long order by count desc, value limit 4", "params": [] }, { "type": "sql", "start": 31.221062485, "end": 31.228968364, "duration_ms": 7.905878999999061, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 760, in data\n ) = await facet.facet_results()\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/facets.py\", line 212, in facet_results\n facet_rows_results = await self.ds.execute(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "global-power-plants", "sql": "select owner as value, count(*) as count from (\n select rowid, country, country_long, name, gppd_idnr, capacity_mw, latitude, longitude, primary_fuel, other_fuel1, other_fuel2, other_fuel3, commissioning_year, owner, source, url, geolocation_source, wepp_id, year_of_capacity_data, generation_gwh_2013, generation_gwh_2014, generation_gwh_2015, 
generation_gwh_2016, generation_gwh_2017, generation_data_source, estimated_generation_gwh from [global-power-plants] \n )\n where owner is not null\n group by owner order by count desc, value limit 4", "params": [] }, { "type": "sql", "start": 31.229809757, "end": 31.253902162, "duration_ms": 24.09240499999754, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 760, in data\n ) = await facet.facet_results()\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/facets.py\", line 212, in facet_results\n facet_rows_results = await self.ds.execute(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "global-power-plants", "sql": "select primary_fuel as value, count(*) as count from (\n select rowid, country, country_long, name, gppd_idnr, capacity_mw, latitude, longitude, primary_fuel, other_fuel1, other_fuel2, other_fuel3, commissioning_year, owner, source, url, geolocation_source, wepp_id, year_of_capacity_data, generation_gwh_2013, generation_gwh_2014, generation_gwh_2015, generation_gwh_2016, generation_gwh_2017, generation_data_source, estimated_generation_gwh from [global-power-plants] \n )\n where primary_fuel is not null\n group by primary_fuel order by count desc, value limit 4", "params": [] }, { "type": "sql", "start": 31.255699745, "end": 31.256243889, "duration_ms": 0.544143999999136, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/facets.py\", line 145, in suggest\n row_count = await self.get_row_count()\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/facets.py\", line 132, in get_row_count\n await self.ds.execute(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "global-power-plants", "sql": "select count(*) from (select rowid, country, country_long, name, gppd_idnr, capacity_mw, latitude, longitude, primary_fuel, other_fuel1, other_fuel2, other_fuel3, commissioning_year, owner, source, url, geolocation_source, wepp_id, year_of_capacity_data, generation_gwh_2013, generation_gwh_2014, generation_gwh_2015, generation_gwh_2016, generation_gwh_2017, generation_data_source, estimated_generation_gwh from [global-power-plants] )", "params": [] } ] ``` ``` fetch rows: 0.3869350000016425 ms facet country_long: 4.875756000000564 ms facet owner: 7.905878999999061 ms facet primary_fuel: 24.09240499999754 ms count: 0.544143999999136 ms ``` Total = 37.8ms I modified the query to include the total count as well: 
https://global-power-plants.datasettes.com/global-power-plants?sql=with+cte+as+%28%0D%0A++select+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+%5Bglobal-power-plants%5D%0D%0A%29%2C%0D%0Atruncated+as+%28%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+cte+order+by+rowid+limit+4%0D%0A%29%2C%0D%0Acountry_long_facet+as+%28%0D%0A++select+%27country_long%27+as+_facet%2C+country_long+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Aowner_facet+as+%28%0D%0A++select+%27owner%27+as+_facet%2C+owner+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Aprimary_fuel_facet+as+%28%0D%0A++select+%27primary_fuel%27+as+_facet%2C+primary_fuel+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Atotal_count+as+%28%0D%0A++select+%27COUNT%27+as+_facet%2C+%27%27+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte%0D%0A%29%0D%0Aselect+*+from+truncated%0D%0Aunion+all+select+*+from+country_long_facet%0D%0Aunion+all+select+*+from+owner_facet%0D%0Aunion+all+select+*+from+primary_fuel_facet%0D%0Aunion+all+select+*+from+total_count&_trace=1 ```sql with cte as ( select rowid, country, country_long, name, owner, primary_fuel from [global-power-plants] ), truncated as ( select null as _facet, null as facet_name, null as facet_count, rowid, country, country_long, name, owner, primary_fuel from cte order by rowid limit 4 ), country_long_facet as ( select 'country_long' as _facet, country_long as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), owner_facet as ( select 'owner' as _facet, owner as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), primary_fuel_facet as ( select 'primary_fuel' as _facet, primary_fuel as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), total_count as ( select 'COUNT' as _facet, '' as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte ) select * from truncated union all select * from country_long_facet union all select * from owner_facet union all select * from primary_fuel_facet union all select * from total_count ``` The trace says that query took 34.801436999998714 ms. To my huge surprise, this convoluted optimization only shaves the sum query time down from 37.8ms to 34.8ms! That entire database file is just 11.1 MB though. Maybe it would make a meaningful difference on something larger? {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/issues/1513#issuecomment-970766486 https://api.github.com/repos/simonw/datasette/issues/1513 970766486 IC_kwDOBm6k_c453LiW 9599 2021-11-16T22:52:56Z 2021-11-16T22:56:07Z OWNER https://covid-19.datasettes.com/covid is 805.2MB https://covid-19.datasettes.com/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2 Equivalent SQL: https://covid-19.datasettes.com/covid?sql=with+cte+as+%28%0D%0A++select+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+ny_times_us_counties%0D%0A%29%2C%0D%0Atruncated+as+%28%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+cte+order+by+date+desc+limit+4%0D%0A%29%2C%0D%0Astate_facet+as+%28%0D%0A++select+%27state%27+as+_facet%2C+state+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Afips_facet+as+%28%0D%0A++select+%27fips%27+as+_facet%2C+fips+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Acounty_facet+as+%28%0D%0A++select+%27county%27+as+_facet%2C+county+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Atotal_count+as+%28%0D%0A++select+%27COUNT%27+as+_facet%2C+%27%27+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte%0D%0A%29%0D%0Aselect+*+from+truncated%0D%0Aunion+all+select+*+from+state_facet%0D%0Aunion+all+select+*+from+fips_facet%0D%0Aunion+all+select+*+from+county_facet%0D%0Aunion+all+select+*+from+total_count ```sql with cte as ( select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties ), truncated as ( select null as _facet, null as facet_name, null as facet_count, rowid, date, county, state, fips, cases, deaths from cte order by date desc limit 4 ), state_facet as ( select 'state' as _facet, state as facet_name, count(*) as facet_count, null, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), fips_facet as ( select 'fips' as _facet, fips as facet_name, count(*) as facet_count, null, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), county_facet as ( select 'county' as _facet, county as facet_name, count(*) as facet_count, null, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), total_count as ( select 'COUNT' as _facet, '' as facet_name, count(*) as facet_count, null, null, null, null, null, null, null from cte ) select * from truncated union all select * from state_facet union all select * from fips_facet union all select * from county_facet union all select * from total_count ``` _facet | facet_name | facet_count | rowid | date | county | state | fips | cases | deaths -- | -- | -- | -- | -- | -- | -- | -- | -- | --   |   |   | 1917344 | 2021-11-15 | Autauga | Alabama | 1001 | 10407 | 154   |   |   | 1917345 | 2021-11-15 | Baldwin | Alabama | 1003 | 37875 | 581   |   |   | 1917346 | 2021-11-15 | Barbour | Alabama | 1005 | 3648 | 79   |   |   | 1917347 | 2021-11-15 | Bibb | Alabama | 1007 | 4317 | 92 state | Texas | 148028 |   
|   |   |   |   |   |   state | Georgia | 96249 |   |   |   |   |   |   |   state | Virginia | 79315 |   |   |   |   |   |   |   fips |   | 17580 |   |   |   |   |   |   |   fips | 53061 | 665 |   |   |   |   |   |   |   fips | 17031 | 662 |   |   |   |   |   |   |   county | Washington | 18666 |   |   |   |   |   |   |   county | Unknown | 15840 |   |   |   |   |   |   |   county | Jefferson | 15637 |   |   |   |   |   |   |   COUNT |   | 1920593 |   |   |   |   |   |   |   {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/issues/1513#issuecomment-970767952 https://api.github.com/repos/simonw/datasette/issues/1513 970767952 IC_kwDOBm6k_c453L5Q 9599 2021-11-16T22:53:52Z 2021-11-16T22:53:52Z OWNER It's going to take another 15 minutes for the build to finish and deploy the version with `_trace=1`: https://github.com/simonw/covid-19-datasette/actions/runs/1469150112 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/issues/1513#issuecomment-970770304 https://api.github.com/repos/simonw/datasette/issues/1513 970770304 IC_kwDOBm6k_c453MeA 9599 2021-11-16T22:55:19Z 2021-11-16T22:55:19Z OWNER (One thing I really like about this pattern is that it should work exactly the same when used to facet the results of arbitrary SQL queries as it does when faceting results from the table page.) {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/issues/1513#issuecomment-970780866 https://api.github.com/repos/simonw/datasette/issues/1513 970780866 IC_kwDOBm6k_c453PDC 9599 2021-11-16T23:01:57Z 2021-11-16T23:01:57Z OWNER One disadvantage to this approach: if you have a SQL time limit of 1s and it takes 0.9s to return the rows but then 0.5s to calculate each of the requested facets the entire query will exceed the time limit. Could work around this by catching that error and then re-running the query just for the rows, but that would result in the user having to wait longer for the results. Could try to remember if that has happened using an in-memory Python data structure and skip the faceting optimization if it's caused problems in the past? That seems a bit gross. Maybe this becomes an opt-in optimization you can request in your `metadata.json` setting for that table, which massively increases the time limit? That's a bit weird too - now there are two separate implementations of the faceting logic, which had better have a REALLY big pay-off to be worth maintaining. What if we kept the query that returns the rows to be displayed on the page separate from the facets, but then executed all of the facets together using this method such that the `cte` only (presumably) has to be calculated once? That would still lead to multiple facets potentially exceeding the SQL time limit when single facets would not have. Maybe a better optimization would be to move facets to happening via `fetch()` calls from the client, so the user gets to see their rows instantly and the facets then appear as and when they are available (though it would cause page jank). {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
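For background, a SQL time limit like the one discussed here can be enforced in Python's `sqlite3` with a progress handler that aborts the query once a deadline passes - a simplified sketch of the mechanism, not Datasette's actual implementation:

```python
import sqlite3
import time

def execute_with_time_limit(conn, sql, limit_ms):
    deadline = time.monotonic() + limit_ms / 1000
    def check():
        # A truthy return value aborts the running query
        # (sqlite3 then raises OperationalError: interrupted).
        return 1 if time.monotonic() > deadline else 0
    # Invoke check() every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(check, 1000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 1000)
```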
https://github.com/simonw/datasette/issues/1513#issuecomment-970827674 https://api.github.com/repos/simonw/datasette/issues/1513 970827674 IC_kwDOBm6k_c453aea 9599 2021-11-16T23:26:58Z 2021-11-16T23:26:58Z OWNER With trace. https://covid-19.datasettes.com/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2&_trace=1 shows the following: ``` fetch rows: 0.41762600005768036 ms facet state: 284.30423800000426 ms facet county: 273.2565999999679 ms facet fips: 197.80996999998024 ms ``` = 755.78843400001ms total It didn't run a count because that's the homepage and the count is cached. So I dropped the count from the query and ran it: https://covid-19.datasettes.com/covid?sql=with+cte+as+(%0D%0A++select+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+ny_times_us_counties%0D%0A)%2C%0D%0Atruncated+as+(%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+cte+order+by+date+desc+limit+4%0D%0A)%2C%0D%0Astate_facet+as+(%0D%0A++select+%27state%27+as+_facet%2C+state+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Afips_facet+as+(%0D%0A++select+%27fips%27+as+_facet%2C+fips+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Acounty_facet+as+(%0D%0A++select+%27county%27+as+_facet%2C+county+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%0D%0Aselect+*+from+truncated%0D%0Aunion+all+select+*+from+state_facet%0D%0Aunion+all+select+*+from+fips_facet%0D%0Aunion+all+select+*+from+county_facet&_trace=1 Shows 649.4359889999259 ms for the query - compared to 755.78843400001ms for the separate. So it saved about 100ms. Still not a huge difference though! {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/issues/1513#issuecomment-970828568 https://api.github.com/repos/simonw/datasette/issues/1513 970828568 IC_kwDOBm6k_c453asY 9599 2021-11-16T23:27:11Z 2021-11-16T23:27:11Z OWNER One last experiment: I'm going to try running an expensive query in the CTE portion. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/issues/1513#issuecomment-970845844 https://api.github.com/repos/simonw/datasette/issues/1513 970845844 IC_kwDOBm6k_c453e6U 9599 2021-11-16T23:35:38Z 2021-11-16T23:35:38Z OWNER I tried adding `cases > 10000` but the SQL query now takes too long - so I'm moving this to my laptop: ``` cd /tmp wget https://covid-19.datasettes.com/covid.db datasette covid.db \ --setting facet_time_limit_ms 10000 \ --setting sql_time_limit_ms 10000 \ --setting trace_debug 1 ``` `http://127.0.0.1:8006/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2&cases__gt=10000` shows in the traces: ```json [ { "type": "sql", "start": 12.693033525, "end": 12.694056904, "duration_ms": 1.0233789999993803, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 705, in data\n results = await db.execute(sql, params, truncate=True, **extra_args)\n" ], "database": "covid", "sql": "select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties where \"cases\" > :p0 order by rowid limit 3", "params": { "p0": 10000 } }, { "type": "sql", "start": 12.694285093, "end": 12.814936275, "duration_ms": 120.65118200000136, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 723, in data\n count_rows = list(await db.execute(count_sql, from_sql_params))\n" ], "database": "covid", "sql": "select count(*) from ny_times_us_counties where \"cases\" > :p0", "params": { "p0": 10000 } }, { "type": "sql", "start": 12.818812089, "end": 12.851172544, "duration_ms": 32.360455000000954, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 856, in data\n suggested_facets.extend(await facet.suggest())\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/facets.py\", line 164, in suggest\n distinct_values = await self.ds.execute(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "covid", "sql": "select county, count(*) as n from (\n select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties where \"cases\" > :p0 \n ) where county is not null\n group by county\n limit 4", "params": { "p0": 10000 } }, { "type": "sql", "start": 12.851418868, "end": 12.871268359, "duration_ms": 19.84949100000044, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 856, in data\n suggested_facets.extend(await facet.suggest())\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/facets.py\", line 164, in suggest\n distinct_values = await self.ds.execute(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "covid", "sql": "select state, count(*) as n from (\n select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties where \"cases\" > :p0 \n ) where state is not null\n group by state\n limit 4", "params": { "p0": 10000 } }, { "type": "sql", "start": 12.871497655, "end": 12.897715027, "duration_ms": 26.217371999999628, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 856, in data\n suggested_facets.extend(await facet.suggest())\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/facets.py\", line 164, in suggest\n distinct_values = await self.ds.execute(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "covid", "sql": "select fips, count(*) as n from (\n select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties where \"cases\" > :p0 \n ) where fips is not null\n group by fips\n limit 4", "params": { "p0": 10000 } } ] ``` So that's: ``` fetch rows: 1.0233789999993803 ms count: 120.65118200000136 ms facet county: 32.360455000000954 ms facet state: 19.84949100000044 ms facet fips: 26.217371999999628 ms ``` = 200.1 ms total Compared to: `http://127.0.0.1:8006/covid?sql=with+cte+as+(%0D%0A++select+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+ny_times_us_counties%0D%0A)%2C%0D%0Atruncated+as+(%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+cte+order+by+date+desc+limit+4%0D%0A)%2C%0D%0Astate_facet+as+(%0D%0A++select+%27state%27+as+_facet%2C+state+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Afips_facet+as+(%0D%0A++select+%27fips%27+as+_facet%2C+fips+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Acounty_facet+as+(%0D%0A++select+%27county%27+as+_facet%2C+county+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%0D%0Aselect+*+from+truncated%0D%0Aunion+all+select+*+from+state_facet%0D%0Aunion+all+select+*+from+fips_facet%0D%0Aunion+all+select+*+from+county_facet&_trace=1` Which is 353ms total - the separate queries ran faster! Really surprising result. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
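As a rough standalone illustration of that comparison, here is a minimal sketch - not Datasette's implementation - that times the separate queries from the traces above against a single combined CTE + UNION ALL query. It assumes the `covid.db` file downloaded earlier, and abbreviates the combined query to just the county facet, so the numbers will differ from the traces (which also include Datasette's own machinery):

```python
# Minimal sketch: time the separate per-facet queries against one
# combined CTE + UNION ALL query. Assumes a local covid.db containing
# the ny_times_us_counties table, downloaded as shown above.
import sqlite3
import time

conn = sqlite3.connect("covid.db")

separate = [
    'select rowid, date, county, state, fips, cases, deaths'
    ' from ny_times_us_counties where "cases" > 10000 order by rowid limit 3',
    'select count(*) from ny_times_us_counties where "cases" > 10000',
] + [
    f'select {col}, count(*) as n from ('
    ' select rowid, date, county, state, fips, cases, deaths'
    ' from ny_times_us_counties where "cases" > 10000'
    f') where {col} is not null group by {col} limit 4'
    for col in ("county", "state", "fips")
]

start = time.perf_counter()
for sql in separate:
    conn.execute(sql).fetchall()
print(f"separate: {(time.perf_counter() - start) * 1000:.1f}ms")

# Combined query, cut down to the rows plus a single county facet.
# Both arms of the UNION ALL must return the same number of columns,
# hence the padding nulls.
combined = """
with cte as (
  select rowid, date, county, state, fips, cases, deaths
  from ny_times_us_counties where "cases" > 10000
),
truncated as (
  select null as _facet, null as facet_name, null as facet_count,
         rowid, date, county, state, fips, cases, deaths
  from cte order by rowid limit 3
),
county_facet as (
  select 'county', county, count(*),
         null, null, null, null, null, null, null
  from cte group by county order by count(*) desc limit 3
)
select * from truncated union all select * from county_facet
"""
start = time.perf_counter()
conn.execute(combined).fetchall()
print(f"combined: {(time.perf_counter() - start) * 1000:.1f}ms")
```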
https://github.com/simonw/datasette/issues/1513#issuecomment-970853917 https://api.github.com/repos/simonw/datasette/issues/1513 970853917 IC_kwDOBm6k_c453g4d 9599 2021-11-16T23:41:01Z 2021-11-16T23:41:01Z OWNER One very interesting difference between the two: on the single giant query page: ```json { "request_duration_ms": 376.4317020000476, "sum_trace_duration_ms": 370.0828700000329, "num_traces": 5 } ``` And on the page that uses separate queries: ```json { "request_duration_ms": 819.012272000009, "sum_trace_duration_ms": 201.52852100000018, "num_traces": 19 } ``` The separate queries page takes 819ms total to render, but spends only 201ms across its 19 SQL queries. The single big query page takes 376ms total to render, spending 370ms across its 5 queries. <details><summary>Those 5 queries, if you're interested</summary> ```sql select database_name, schema_version from databases PRAGMA schema_version PRAGMA schema_version explain with cte as (\r\n select rowid, date, county, state, fips, cases, deaths\r\n from ny_times_us_counties\r\n),\r\ntruncated as (\r\n select null as _facet, null as facet_name, null as facet_count, rowid, date, county, state, fips, cases, deaths\r\n from cte order by date desc limit 4\r\n),\r\nstate_facet as (\r\n select 'state' as _facet, state as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\nfips_facet as (\r\n select 'fips' as _facet, fips as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\ncounty_facet as (\r\n select 'county' as _facet, county as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n)\r\nselect * from truncated\r\nunion all select * from state_facet\r\nunion all select * from fips_facet\r\nunion all select * from county_facet with cte as (\r\n select rowid, date, county, state, fips, cases, deaths\r\n from ny_times_us_counties\r\n),\r\ntruncated as (\r\n select null as _facet, null as facet_name, null as facet_count, rowid, date, county, state, fips, cases, deaths\r\n from cte order by date desc limit 4\r\n),\r\nstate_facet as (\r\n select 'state' as _facet, state as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\nfips_facet as (\r\n select 'fips' as _facet, fips as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\ncounty_facet as (\r\n select 'county' as _facet, county as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n)\r\nselect * from truncated\r\nunion all select * from state_facet\r\nunion all select * from fips_facet\r\nunion all select * from county_facet ``` </details> All of that additional non-SQL overhead must come from Python and template rendering code running on the page. I'm really surprised at how much overhead that is! This is worth researching separately. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
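Those two summary blocks make the overhead easy to quantify - here is a quick sketch that subtracts SQL time from total request time for each page, using only the numbers quoted above:

```python
# Non-SQL overhead = total request duration minus time spent in SQL,
# taken from the two ?_trace=1 summaries quoted above.
single = {"request_duration_ms": 376.4317020000476,
          "sum_trace_duration_ms": 370.0828700000329}
separate = {"request_duration_ms": 819.012272000009,
            "sum_trace_duration_ms": 201.52852100000018}

for label, t in [("single giant query", single), ("separate queries", separate)]:
    overhead = t["request_duration_ms"] - t["sum_trace_duration_ms"]
    print(f"{label}: {overhead:.1f}ms outside SQL")
# single giant query: 6.3ms outside SQL
# separate queries: 617.5ms outside SQL
```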
https://github.com/simonw/datasette/issues/1513#issuecomment-970855084 https://api.github.com/repos/simonw/datasette/issues/1513 970855084 IC_kwDOBm6k_c453hKs 9599 2021-11-16T23:41:46Z 2021-11-16T23:41:46Z OWNER Conclusion: using a giant convoluted CTE and UNION ALL query to attempt to calculate facets at the same time as retrieving rows is a net LOSS for performance! Very surprised to see that. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055469073  
https://github.com/simonw/datasette/pull/1512#issuecomment-970857411 https://api.github.com/repos/simonw/datasette/issues/1512 970857411 IC_kwDOBm6k_c453hvD 9599 2021-11-16T23:43:21Z 2021-11-16T23:43:21Z OWNER ``` E File "/home/runner/work/datasette/datasette/datasette/utils/vendored_graphlib.py", line 56 E if (result := self._node2info.get(node)) is None: E ^ E SyntaxError: invalid syntax ``` Oh no - the vendored code I use has a `:=` walrus operator, which requires Python 3.8+, so it doesn't work on Python 3.6! I'll have to backport it more thoroughly. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055402144  
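The rewrite itself is mechanical: `:=` assigns inside an expression, and older Pythons express the same thing as a separate assignment statement before the test. A hedged sketch of the pattern, using a plain dict as a stand-in for the vendored module's internals rather than the actual `vendored_graphlib.py` code:

```python
# Python 3.8+ form, as in the vendored module (a SyntaxError on 3.6):
#     if (result := self._node2info.get(node)) is None:
#         ...
# Python 3.6-compatible rewrite: assign first, then test.
node2info = {}  # stand-in for the vendored _node2info mapping

def get_nodeinfo(node):
    result = node2info.get(node)  # the walrus assignment, as a statement
    if result is None:
        # illustrative placeholder value, not graphlib's real _NodeInfo
        result = node2info[node] = {"npredecessors": 0}
    return result

print(get_nodeinfo("a"))  # {'npredecessors': 0}
```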
https://github.com/simonw/datasette/pull/1512#issuecomment-970861628 https://api.github.com/repos/simonw/datasette/issues/1512 970861628 IC_kwDOBm6k_c453iw8 9599 2021-11-16T23:46:07Z 2021-11-16T23:46:07Z OWNER I made the changes locally and tested them with Python 3.6 like so: ``` cd /tmp mkdir v cd v pipenv shell --python=python3.6 cd ~/Dropbox/Development/datasette pip install -e '.[test]' pytest tests/test_asyncdi.py ``` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} 1055402144  