issue_comments
39 rows where "updated_at" is on date 2021-11-16
This data as json, CSV (advanced)
Suggested facets: issue_url, author_association, issue, created_at (date), updated_at (date)
id ▼ | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
969557008 | https://github.com/simonw/datasette/issues/448#issuecomment-969557008 | https://api.github.com/repos/simonw/datasette/issues/448 | IC_kwDOBm6k_c45ykQQ | simonw 9599 | 2021-11-16T00:56:09Z | 2021-11-16T00:59:59Z | OWNER | This looks like it might work: ```sql with inner as ( select * from ads_with_targets where :p0 in ( select value from json_each([ads_with_targets].[target_names]) ) ), deduped_array_items as ( select distinct j.value, inner.* from json_each([inner].[target_names]) j join inner ) select value, count(*) from deduped_array_items group by value order by count(*) desc ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | _facet_array should work against views 440222719 | |
969557972 | https://github.com/simonw/datasette/issues/448#issuecomment-969557972 | https://api.github.com/repos/simonw/datasette/issues/448 | IC_kwDOBm6k_c45ykfU | simonw 9599 | 2021-11-16T00:56:58Z | 2021-11-16T00:56:58Z | OWNER | It uses a CTE which were introduced in SQLite 3.8 - and AWS Lambda Python 3.9 still provides 3.7 - but I've checked and I can use `pysqlite3-binary` to work around that there so I'm OK relying on CTEs for this. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | _facet_array should work against views 440222719 | |
969572281 | https://github.com/simonw/datasette/issues/448#issuecomment-969572281 | https://api.github.com/repos/simonw/datasette/issues/448 | IC_kwDOBm6k_c45yn-5 | simonw 9599 | 2021-11-16T01:05:11Z | 2021-11-16T01:05:11Z | OWNER | I tried this and it seems to work correctly: ```python for source_and_config in self.get_configs(): config = source_and_config["config"] source = source_and_config["source"] column = config.get("column") or config["simple"] facet_sql = """ with inner as ({sql}), deduped_array_items as ( select distinct j.value, inner.* from json_each([inner].{col}) j join inner ) select value as value, count(*) as count from deduped_array_items group by value order by count(*) desc limit {limit} """.format( col=escape_sqlite(column), sql=self.sql, limit=facet_size + 1 ) ``` The queries are _very_ slow though - I had to bump up to 2s time limit even against only a view returning 3,499 rows. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | _facet_array should work against views 440222719 | |
969578466 | https://github.com/simonw/datasette/issues/448#issuecomment-969578466 | https://api.github.com/repos/simonw/datasette/issues/448 | IC_kwDOBm6k_c45ypfi | simonw 9599 | 2021-11-16T01:08:29Z | 2021-11-16T01:08:29Z | OWNER | Actually with the cache warmed up it looks like the facet query is taking 150ms which is good enough. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | _facet_array should work against views 440222719 | |
969582098 | https://github.com/simonw/datasette/issues/448#issuecomment-969582098 | https://api.github.com/repos/simonw/datasette/issues/448 | IC_kwDOBm6k_c45yqYS | simonw 9599 | 2021-11-16T01:10:28Z | 2021-11-16T01:10:28Z | OWNER | Also note that this demo data is using a SQL view to create the JSON arrays - the view is defined as such: ```sql CREATE VIEW ads_with_targets as select ads.*, json_group_array(targets.name) as target_names from ads join ad_targets on ad_targets.ad_id = ads.id join targets on ad_targets.target_id = targets.id group by ad_targets.ad_id; ``` So running JSON faceting on top of that view is a pretty big ask! | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | _facet_array should work against views 440222719 | |
969600859 | https://github.com/simonw/datasette/issues/1511#issuecomment-969600859 | https://api.github.com/repos/simonw/datasette/issues/1511 | IC_kwDOBm6k_c45yu9b | simonw 9599 | 2021-11-16T01:20:13Z | 2021-11-16T01:20:13Z | OWNER | See: - #830 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Review plugin hooks for Datasette 1.0 1054246919 | |
969602825 | https://github.com/simonw/datasette/issues/1012#issuecomment-969602825 | https://api.github.com/repos/simonw/datasette/issues/1012 | IC_kwDOBm6k_c45yvcJ | simonw 9599 | 2021-11-16T01:21:14Z | 2021-11-16T01:21:14Z | OWNER | I'd been wondering how to get new classifiers into Trove - thanks, I'll give this a go. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | For 1.0 update trove classifier in setup.py 718540751 | |
969613166 | https://github.com/simonw/datasette/issues/1012#issuecomment-969613166 | https://api.github.com/repos/simonw/datasette/issues/1012 | IC_kwDOBm6k_c45yx9u | simonw 9599 | 2021-11-16T01:27:25Z | 2021-11-16T01:27:25Z | OWNER | Requested here: - https://github.com/pypa/trove-classifiers/pull/85 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | For 1.0 update trove classifier in setup.py 718540751 | |
969616626 | https://github.com/simonw/datasette/issues/1176#issuecomment-969616626 | https://api.github.com/repos/simonw/datasette/issues/1176 | IC_kwDOBm6k_c45yyzy | simonw 9599 | 2021-11-16T01:29:13Z | 2021-11-16T01:29:13Z | OWNER | I'm inclined to create a Sphinx reference documentation page for this, as I did for `sqlite-utils` here: https://sqlite-utils.datasette.io/en/stable/reference.html | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Policy on documenting "public" datasette.utils functions 779691739 | |
969621662 | https://github.com/simonw/datasette/issues/448#issuecomment-969621662 | https://api.github.com/repos/simonw/datasette/issues/448 | IC_kwDOBm6k_c45y0Ce | simonw 9599 | 2021-11-16T01:32:04Z | 2021-11-16T01:32:04Z | OWNER | Tests are failing and I think it's because the facets come back in different orders, need a tie-breaker. https://github.com/simonw/datasette/runs/4219325197?check_suite_focus=true | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | _facet_array should work against views 440222719 | |
970188065 | https://github.com/simonw/datasette/issues/1505#issuecomment-970188065 | https://api.github.com/repos/simonw/datasette/issues/1505 | IC_kwDOBm6k_c450-Uh | Segerberg 7094907 | 2021-11-16T11:40:52Z | 2021-11-16T11:40:52Z | NONE | A suggestion is to have the option to choose an arbitrary delimiter (and quoting characters ) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Datasette should have an option to output CSV with semicolons 1052247023 | |
970266123 | https://github.com/simonw/datasette/issues/1012#issuecomment-970266123 | https://api.github.com/repos/simonw/datasette/issues/1012 | IC_kwDOBm6k_c451RYL | bollwyvl 45380 | 2021-11-16T13:18:36Z | 2021-11-16T13:18:36Z | CONTRIBUTOR | Congratulations, looks like it went through! There was a bit of a hold-up on the JupyterLab ones, but it's semi automated: a dependabot pr to warehouse and a CI deploy, with a click in between. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | For 1.0 update trove classifier in setup.py 718540751 | |
970544733 | https://github.com/simonw/datasette/issues/1509#issuecomment-970544733 | https://api.github.com/repos/simonw/datasette/issues/1509 | IC_kwDOBm6k_c452VZd | simonw 9599 | 2021-11-16T18:22:32Z | 2021-11-16T18:22:32Z | OWNER | This is mainly happening here: - https://github.com/simonw/datasette/issues/782 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Datasette 1.0 JSON API (and documentation) 1054243511 | |
970553780 | https://github.com/simonw/datasette/issues/782#issuecomment-970553780 | https://api.github.com/repos/simonw/datasette/issues/782 | IC_kwDOBm6k_c452Xm0 | simonw 9599 | 2021-11-16T18:30:51Z | 2021-11-16T18:30:58Z | OWNER | OK, I'm ready to start working on this today. I'm going to go with a default representation that looks like this: ```json { "rows": [ {"id": 1, "name": "One"}, {"id": 2, "name": "Two"} ], "next_url": null } ``` Note that there's no `count` - all it provides is the current selection of results and an indication as to how the next can be retrieved (`null` if there are no more results). I'll implement `?_extra=` to provide everything else. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Redesign default .json format 627794879 | |
970554697 | https://github.com/simonw/datasette/issues/782#issuecomment-970554697 | https://api.github.com/repos/simonw/datasette/issues/782 | IC_kwDOBm6k_c452X1J | simonw 9599 | 2021-11-16T18:32:03Z | 2021-11-16T18:32:03Z | OWNER | I'm going to take another look at this: - https://github.com/simonw/datasette/issues/878 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Redesign default .json format 627794879 | |
970624197 | https://github.com/simonw/datasette/issues/878#issuecomment-970624197 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c452ozF | simonw 9599 | 2021-11-16T19:49:05Z | 2021-11-16T19:49:05Z | OWNER | Here's the latest version of my weird dependency injection async class: ```python import inspect class AsyncMeta(type): def __new__(cls, name, bases, attrs): # Decorate any items that are 'async def' methods _registry = {} new_attrs = {"_registry": _registry} for key, value in attrs.items(): if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve": new_attrs[key] = make_method(value) _registry[key] = new_attrs[key] else: new_attrs[key] = value # Topological sort of _registry by parameter dependencies graph = { key: { p for p in inspect.signature(method).parameters.keys() if p != "self" and not p.startswith("_") } for key, method in _registry.items() } new_attrs["_graph"] = graph return super().__new__(cls, name, bases, new_attrs) def make_method(method): @wraps(method) async def inner(self, **kwargs): parameters = inspect.signature(method).parameters.keys() # Any parameters not provided by kwargs are resolved from registry to_resolve = [p for p in parameters if p not in kwargs and p != "self"] missing = [p for p in to_resolve if p not in self._registry] assert ( not missing ), "The following DI parameters could not be found in the registry: {}".format( missing ) results = {} results.update(kwargs) results.update(await self.resolve(to_resolve)) return await method(self, **results) return inner bad = [0] class AsyncBase(metaclass=AsyncMeta): async def resolve(self, names): print(" resolve({})".format(names)) results = {} # Resolve them in the correct order ts = TopologicalSorter() ts2 = TopologicalSorter() print(" names = ", names) print(" s… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970655304 | https://github.com/simonw/datasette/issues/878#issuecomment-970655304 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c452wZI | simonw 9599 | 2021-11-16T20:32:16Z | 2021-11-16T20:32:16Z | OWNER | This code is really fiddly. I just got to this version: ```python import asyncio from functools import wraps import inspect try: import graphlib except ImportError: from . import vendored_graphlib as graphlib class AsyncMeta(type): def __new__(cls, name, bases, attrs): # Decorate any items that are 'async def' methods _registry = {} new_attrs = {"_registry": _registry} for key, value in attrs.items(): if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve": new_attrs[key] = make_method(value) _registry[key] = new_attrs[key] else: new_attrs[key] = value # Gather graph for later dependency resolution graph = { key: { p for p in inspect.signature(method).parameters.keys() if p != "self" and not p.startswith("_") } for key, method in _registry.items() } new_attrs["_graph"] = graph return super().__new__(cls, name, bases, new_attrs) def make_method(method): @wraps(method) async def inner(self, _results=None, **kwargs): print("inner - _results=", _results) parameters = inspect.signature(method).parameters.keys() # Any parameters not provided by kwargs are resolved from registry to_resolve = [p for p in parameters if p not in kwargs and p != "self"] missing = [p for p in to_resolve if p not in self._registry] assert ( not missing ), "The following DI parameters could not be found in the registry: {}".format( missing ) results = {} results.update(kwargs) if to_resolve: resolved_parameters = await self.resolve(to_resolve, _results) results.update(resolved_parameters) return_value = await method(self, **results) if _results is not None: _res… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970655927 | https://github.com/simonw/datasette/issues/878#issuecomment-970655927 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c452wi3 | simonw 9599 | 2021-11-16T20:33:11Z | 2021-11-16T20:33:11Z | OWNER | What should be happening here instead is it should resolve the full graph and notice that `c` is depended on by both `b` and `a` - so it should run `c` first, then run the next ones in parallel. So maybe the algorithm I'm inheriting from https://docs.python.org/3/library/graphlib.html isn't the correct algorithm? | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970657874 | https://github.com/simonw/datasette/issues/878#issuecomment-970657874 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c452xBS | simonw 9599 | 2021-11-16T20:36:01Z | 2021-11-16T20:36:01Z | OWNER | My goal here is to calculate the most efficient way to resolve the different nodes, running them in parallel where possible. So for this class: ```python class Complex(AsyncBase): async def d(self): pass async def c(self): pass async def b(self, c, d): pass async def a(self, b, c): pass async def go(self, a): pass ``` A call to `go()` should do this: - `c` and `d` in parallel - `b` - `a` - `go` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970660299 | https://github.com/simonw/datasette/issues/878#issuecomment-970660299 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c452xnL | simonw 9599 | 2021-11-16T20:39:43Z | 2021-11-16T20:42:27Z | OWNER | But that does seem to be the plan that `TopographicalSorter` provides: ```python graph = {"go": {"a"}, "a": {"b", "c"}, "b": {"c", "d"}} ts = TopologicalSorter(graph) ts.prepare() while ts.is_active(): nodes = ts.get_ready() print(nodes) ts.done(*nodes) ``` Outputs: ``` ('c', 'd') ('b',) ('a',) ('go',) ``` Also: ```python graph = {"go": {"d", "e", "f"}, "d": {"b", "c"}, "b": {"c"}} ts = TopologicalSorter(graph) ts.prepare() while ts.is_active(): nodes = ts.get_ready() print(nodes) ts.done(*nodes) ``` Gives: ``` ('e', 'f', 'c') ('b',) ('d',) ('go',) ``` I'm confident that `TopologicalSorter` is the way to do this. I think I need to rewrite my code to call it once to get that plan, then `await asyncio.gather(*nodes)` in turn to execute it. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970673085 | https://github.com/simonw/datasette/issues/878#issuecomment-970673085 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c4520u9 | simonw 9599 | 2021-11-16T20:58:24Z | 2021-11-16T20:58:24Z | OWNER | New test: ```python class Complex(AsyncBase): def __init__(self): self.log = [] async def d(self): await asyncio.sleep(random() * 0.1) print("LOG: d") self.log.append("d") async def c(self): await asyncio.sleep(random() * 0.1) print("LOG: c") self.log.append("c") async def b(self, c, d): print("LOG: b") self.log.append("b") async def a(self, b, c): print("LOG: a") self.log.append("a") async def go(self, a): print("LOG: go") self.log.append("go") return self.log @pytest.mark.asyncio async def test_complex(): result = await Complex().go() # 'c' should only be called once assert tuple(result) in ( # c and d could happen in either order ("c", "d", "b", "a", "go"), ("d", "c", "b", "a", "go"), ) ``` And this code passes it: ```python import asyncio from functools import wraps import inspect try: import graphlib except ImportError: from . import vendored_graphlib as graphlib class AsyncMeta(type): def __new__(cls, name, bases, attrs): # Decorate any items that are 'async def' methods _registry = {} new_attrs = {"_registry": _registry} for key, value in attrs.items(): if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve": new_attrs[key] = make_method(value) _registry[key] = new_attrs[key] else: new_attrs[key] = value # Gather graph for later dependency resolution graph = { key: { p for p in inspect.signature(method).parameters.keys() if p != "self" and not p.startswith("_") } for key, method in _registry.items() } new_attrs["_graph"] = graph return super().__new__(cls, name, bases, new_attrs) def make_met… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970705738 | https://github.com/simonw/datasette/issues/878#issuecomment-970705738 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c4528tK | simonw 9599 | 2021-11-16T21:44:31Z | 2021-11-16T21:44:31Z | OWNER | Wrote a TIL about what I learned using `TopologicalSorter`: https://til.simonwillison.net/python/graphlib-topologicalsorter | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970712713 | https://github.com/simonw/datasette/issues/878#issuecomment-970712713 | https://api.github.com/repos/simonw/datasette/issues/878 | IC_kwDOBm6k_c452-aJ | simonw 9599 | 2021-11-16T21:54:33Z | 2021-11-16T21:54:33Z | OWNER | I'm going to continue working on this in a PR. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for views that return either JSON or HTML, available for plugins 648435885 | |
970718337 | https://github.com/simonw/datasette/pull/1512#issuecomment-970718337 | https://api.github.com/repos/simonw/datasette/issues/1512 | IC_kwDOBm6k_c452_yB | simonw 9599 | 2021-11-16T22:02:30Z | 2021-11-16T22:02:30Z | OWNER | I've decided to make the clever `asyncio` dependency injection opt-in - so you can either decorate with `@inject` or you can set `inject_all = True` on the class - for example: ```python import asyncio from datasette.utils.asyncdi import AsyncBase, inject class Simple(AsyncBase): def __init__(self): self.log = [] @inject async def two(self): self.log.append("two") @inject async def one(self, two): self.log.append("one") return self.log async def not_inject(self, one, two): return one + two class Complex(AsyncBase): inject_all = True def __init__(self): self.log = [] async def b(self): self.log.append("b") async def a(self, b): self.log.append("a") async def go(self, a): self.log.append("go") return self.log ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for async view classes 1055402144 | |
970718652 | https://github.com/simonw/datasette/pull/1512#issuecomment-970718652 | https://api.github.com/repos/simonw/datasette/issues/1512 | IC_kwDOBm6k_c452_28 | codecov[bot] 22429695 | 2021-11-16T22:02:59Z | 2021-11-16T23:51:48Z | NONE | # [Codecov](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report > Merging [#1512](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (8f757da) into [main](https://codecov.io/gh/simonw/datasette/commit/0156c6b5e52d541e93f0d68e9245f20ae83bc933?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (0156c6b) will **decrease** coverage by `2.10%`. > The diff coverage is `36.20%`. [](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) ```diff @@ Coverage Diff @@ ## main #1512 +/- ## ========================================== - Coverage 91.82% 89.72% -2.11% ========================================== Files 34 36 +2 Lines 4430 4604 +174 ========================================== + Hits 4068 4131 +63 - Misses 362 473 +111 ``` | [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/1512?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage Δ | | |---|---|---| | [datasette/utils/vendored\_graphlib.py](https://codecov.io/gh/simonw/datasette/pull/1512/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL3ZlbmRvcmVkX2dyYXBobGliLnB5) |… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for async view classes 1055402144 | |
970738130 | https://github.com/simonw/datasette/issues/1513#issuecomment-970738130 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453EnS | simonw 9599 | 2021-11-16T22:32:19Z | 2021-11-16T22:32:19Z | OWNER | I came up with the following query which seems to work! ```sql with cte as ( select rowid, country, country_long, name, owner, primary_fuel from [global-power-plants] ), truncated as ( select null as _facet, null as facet_name, null as facet_count, rowid, country, country_long, name, owner, primary_fuel from cte order by rowid limit 4 ), country_long_facet as ( select 'country_long' as _facet, country_long as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), owner_facet as ( select 'owner' as _facet, owner as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), primary_fuel_facet as ( select 'primary_fuel' as _facet, primary_fuel as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ) select * from truncated union all select * from country_long_facet union all select * from owner_facet union all select * from primary_fuel_facet ``` (Limits should be 101, 31, 31, 31 but I reduced size to get a shorter example table). Results [look like this](https://global-power-plants.datasettes.com/global-power-plants?sql=with+cte+as+%28%0D%0A++select+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+%5Bglobal-power-plants%5D%0D%0A%29%2C%0D%0Atruncated+as+%28%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+cte+order+by+rowid+limit+4%0D%0A%29%2C%0D%0Acountry_long_facet+as+%28%0D%0A++select+%27country_long%27+as+_facet%2C+country_long+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Aowner_facet+as+%28%0D%0A++select+%27owner%27+as+_facet%2C+owner+as+fa… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970742415 | https://github.com/simonw/datasette/issues/1513#issuecomment-970742415 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453FqP | simonw 9599 | 2021-11-16T22:37:14Z | 2021-11-16T22:37:14Z | OWNER | The query takes 42.794ms to run. Here's the equivalent page using separate queries: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_facet_size=3&_size=2&_nocount=1 Annoyingly I can't disable facet suggestions but keep facets. I'm going to turn on tracing so I can see how long the separate queries took. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970758179 | https://github.com/simonw/datasette/issues/1513#issuecomment-970758179 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453Jgj | simonw 9599 | 2021-11-16T22:47:38Z | 2021-11-16T22:47:38Z | OWNER | Trace now enabled: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_facet_size=3&_size=2&_nocount=1&_trace=1 Here are the relevant traces: ```json [ { "type": "sql", "start": 31.214430154, "end": 31.214817089, "duration_ms": 0.3869350000016425, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 705, in data\n results = await db.execute(sql, params, truncate=True, **extra_args)\n" ], "database": "global-power-plants", "sql": "select rowid, country, country_long, name, gppd_idnr, capacity_mw, latitude, longitude, primary_fuel, other_fuel1, other_fuel2, other_fuel3, commissioning_year, owner, source, url, geolocation_source, wepp_id, year_of_capacity_data, generation_gwh_2013, generation_gwh_2014, generation_gwh_2015, generation_gwh_2016, generation_gwh_2017, generation_data_source, estimated_generation_gwh from [global-power-plants] order by rowid limit 3", "params": {} }, { "type": "sql", "start": 31.215234586, "end": 31.220110342, "duration_ms": 4.875756000000564, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 760, in data\n ) = await facet.facet_results()\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/facets.py\", line 212, in facet_results\n facet_rows_results = await self.ds.execute(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], "database": "global-power-plants", "sql": "select countr… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970766486 | https://github.com/simonw/datasette/issues/1513#issuecomment-970766486 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453LiW | simonw 9599 | 2021-11-16T22:52:56Z | 2021-11-16T22:56:07Z | OWNER | https://covid-19.datasettes.com/covid is 805.2MB https://covid-19.datasettes.com/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2 Equivalent SQL: https://covid-19.datasettes.com/covid?sql=with+cte+as+%28%0D%0A++select+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+ny_times_us_counties%0D%0A%29%2C%0D%0Atruncated+as+%28%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+cte+order+by+date+desc+limit+4%0D%0A%29%2C%0D%0Astate_facet+as+%28%0D%0A++select+%27state%27+as+_facet%2C+state+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Afips_facet+as+%28%0D%0A++select+%27fips%27+as+_facet%2C+fips+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Acounty_facet+as+%28%0D%0A++select+%27county%27+as+_facet%2C+county+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Atotal_count+as+%28%0D%0A++select+%27COUNT%27+as+_facet%2C+%27%27+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte%0D%0A%29%0D%0Aselect+*+from+truncated%0D%0Aunion+all+select+*+from+state_facet%0D%0Aunion+all+select+*+from+fips_facet%0D%0Aunion+all+select+*+from+county_facet%0D%0Aunion+all+select+*+from+total_count ```sql with cte as ( select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties ), truncated as ( select null as _facet, null as facet_name, null as facet_count, rowid, date, county, state, fips, cases, deaths from cte order by date desc limit 4 ), state_facet as ( select 's… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970767952 | https://github.com/simonw/datasette/issues/1513#issuecomment-970767952 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453L5Q | simonw 9599 | 2021-11-16T22:53:52Z | 2021-11-16T22:53:52Z | OWNER | It's going to take another 15 minutes for the build to finish and deploy the version with `_trace=1`: https://github.com/simonw/covid-19-datasette/actions/runs/1469150112 | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970770304 | https://github.com/simonw/datasette/issues/1513#issuecomment-970770304 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453MeA | simonw 9599 | 2021-11-16T22:55:19Z | 2021-11-16T22:55:19Z | OWNER | (One thing I really like about this pattern is that it should work exactly the same when used to facet the results of arbitrary SQL queries as it does when faceting results from the table page.) | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970780866 | https://github.com/simonw/datasette/issues/1513#issuecomment-970780866 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453PDC | simonw 9599 | 2021-11-16T23:01:57Z | 2021-11-16T23:01:57Z | OWNER | One disadvantage to this approach: if you have a SQL time limit of 1s and it takes 0.9s to return the rows but then 0.5s to calculate each of the requested facets the entire query will exceed the time limit. Could work around this by catching that error and then re-running the query just for the rows, but that would result in the user having to wait longer for the results. Could try to remember if that has happened using an in-memory Python data structure and skip the faceting optimization if it's caused problems in the past? That seems a bit gross. Maybe this becomes an opt-in optimization you can request in your `metadata.json` setting for that table, which massively increases the time limit? That's a bit weird too - now there are two separate implementations of the faceting logic, which had better have a REALLY big pay-off to be worth maintaining. What if we kept the query that returns the rows to be displayed on the page separate from the facets, but then executed all of the facets together using this method such that the `cte` only (presumably) has to be calculated once? That would still lead to multiple facets potentially exceeding the SQL time limit when single facets would not have. Maybe a better optimization would be to move facets to happening via `fetch()` calls from the client, so the user gets to see their rows instantly and the facets then appear as and when they are available (though it would cause page jank). | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970827674 | https://github.com/simonw/datasette/issues/1513#issuecomment-970827674 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453aea | simonw 9599 | 2021-11-16T23:26:58Z | 2021-11-16T23:26:58Z | OWNER | With trace. https://covid-19.datasettes.com/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2&_trace=1 shows the following: ``` fetch rows: 0.41762600005768036 ms facet state: 284.30423800000426 ms facet county: 273.2565999999679 ms facet fips: 197.80996999998024 ms ``` = 755.78843400001ms total It didn't run a count because that's the homepage and the count is cached. So I dropped the count from the query and ran it: https://covid-19.datasettes.com/covid?sql=with+cte+as+(%0D%0A++select+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+ny_times_us_counties%0D%0A)%2C%0D%0Atruncated+as+(%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+cte+order+by+date+desc+limit+4%0D%0A)%2C%0D%0Astate_facet+as+(%0D%0A++select+%27state%27+as+_facet%2C+state+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Afips_facet+as+(%0D%0A++select+%27fips%27+as+_facet%2C+fips+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Acounty_facet+as+(%0D%0A++select+%27county%27+as+_facet%2C+county+as+facet_name%2C+count(*)+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%0D%0Aselect+*+from+truncated%0D%0Aunion+all+select+*+from+state_facet%0D%0Aunion+all+select+*+from+fips_facet%0D%0Aunion+all+select+*+from+county_facet&_trace=1 Shows 649.4359889999259 ms for the query - compared to 755.78843400001ms for the separate. So it saved about 100ms. Still not a huge difference though! | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970828568 | https://github.com/simonw/datasette/issues/1513#issuecomment-970828568 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453asY | simonw 9599 | 2021-11-16T23:27:11Z | 2021-11-16T23:27:11Z | OWNER | One last experiment: I'm going to try running an expensive query in the CTE portion. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970845844 | https://github.com/simonw/datasette/issues/1513#issuecomment-970845844 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453e6U | simonw 9599 | 2021-11-16T23:35:38Z | 2021-11-16T23:35:38Z | OWNER | I tried adding `cases > 10000` but the SQL query now takes too long - so moving this to my laptop. ``` cd /tmp wget https://covid-19.datasettes.com/covid.db datasette covid.db \ --setting facet_time_limit_ms 10000 \ --setting sql_time_limit_ms 10000 \ --setting trace_debug 1 ``` `http://127.0.0.1:8006/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2&cases__gt=10000` shows in the traces: ```json [ { "type": "sql", "start": 12.693033525, "end": 12.694056904, "duration_ms": 1.0233789999993803, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 705, in data\n results = await db.execute(sql, params, truncate=True, **extra_args)\n" ], "database": "covid", "sql": "select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties where \"cases\" > :p0 order by rowid limit 3", "params": { "p0": 10000 } }, { "type": "sql", "start": 12.694285093, "end": 12.814936275, "duration_ms": 120.65118200000136, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 723, i… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970853917 | https://github.com/simonw/datasette/issues/1513#issuecomment-970853917 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453g4d | simonw 9599 | 2021-11-16T23:41:01Z | 2021-11-16T23:41:01Z | OWNER | One very interesting difference between the two: on the single giant query page: ```json { "request_duration_ms": 376.4317020000476, "sum_trace_duration_ms": 370.0828700000329, "num_traces": 5 } ``` And on the page that uses separate queries: ```json { "request_duration_ms": 819.012272000009, "sum_trace_duration_ms": 201.52852100000018, "num_traces": 19 } ``` The separate pages page takes 819ms total to render the page, but spends 201ms across 19 SQL queries. The single big query takes 376ms total to render the page, spending 370ms in 5 queries <details><summary>Those 5 queries, if you're interested</summary> ```sql select database_name, schema_version from databases PRAGMA schema_version PRAGMA schema_version explain with cte as (\r\n select rowid, date, county, state, fips, cases, deaths\r\n from ny_times_us_counties\r\n),\r\ntruncated as (\r\n select null as _facet, null as facet_name, null as facet_count, rowid, date, county, state, fips, cases, deaths\r\n from cte order by date desc limit 4\r\n),\r\nstate_facet as (\r\n select 'state' as _facet, state as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\nfips_facet as (\r\n select 'fips' as _facet, fips as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\ncounty_facet as (\r\n select 'county' as _facet, county as facet_name, count(*) as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n)\r\nselect * from truncated\r\nunion all select * from state_facet\r\nunion all select * from fips_facet\r\nunion all select * from county_facet with cte as (\r\n select rowid, date, county, state, fips, cases, deaths\r\n from ny_times_us_counties\r\n),\r\ntruncated as (\r\n select null as _facet, null as facet_name, null as face… | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970855084 | https://github.com/simonw/datasette/issues/1513#issuecomment-970855084 | https://api.github.com/repos/simonw/datasette/issues/1513 | IC_kwDOBm6k_c453hKs | simonw 9599 | 2021-11-16T23:41:46Z | 2021-11-16T23:41:46Z | OWNER | Conclusion: using a giant convoluted CTE and UNION ALL query to attempt to calculate facets at the same time as retrieving rows is a net LOSS for performance! Very surprised to see that. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | Research: CTEs and union all to calculate facets AND query at the same time 1055469073 | |
970857411 | https://github.com/simonw/datasette/pull/1512#issuecomment-970857411 | https://api.github.com/repos/simonw/datasette/issues/1512 | IC_kwDOBm6k_c453hvD | simonw 9599 | 2021-11-16T23:43:21Z | 2021-11-16T23:43:21Z | OWNER | ``` E File "/home/runner/work/datasette/datasette/datasette/utils/vendored_graphlib.py", line 56 E if (result := self._node2info.get(node)) is None: E ^ E SyntaxError: invalid syntax ``` Oh no - the vendored code I use has `:=` so doesn't work on Python 3.6! Will have to backport it more thoroughly. | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for async view classes 1055402144 | |
970861628 | https://github.com/simonw/datasette/pull/1512#issuecomment-970861628 | https://api.github.com/repos/simonw/datasette/issues/1512 | IC_kwDOBm6k_c453iw8 | simonw 9599 | 2021-11-16T23:46:07Z | 2021-11-16T23:46:07Z | OWNER | I made the changes locally and tested them with Python 3.6 like so: ``` cd /tmp mkdir v cd v pipenv shell --python=python3.6 cd ~/Dropbox/Development/datasette pip install -e '.[test]' pytest tests/test_asyncdi.py ``` | {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | New pattern for async view classes 1055402144 |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [issue] INTEGER REFERENCES [issues]([id]) , [performed_via_github_app] TEXT); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);