github: issue_comments: 12 rows where issue = 1055469073

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
970738130	https://github.com/simonw/datasette/issues/1513#issuecomment-970738130	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453EnS	simonw 9599	2021-11-16T22:32:19Z	2021-11-16T22:32:19Z	OWNER	I came up with the following query which seems to work! ```sql with cte as ( select rowid, country, country_long, name, owner, primary_fuel from [global-power-plants] ), truncated as ( select null as _facet, null as facet_name, null as facet_count, rowid, country, country_long, name, owner, primary_fuel from cte order by rowid limit 4 ), country_long_facet as ( select 'country_long' as _facet, country_long as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), owner_facet as ( select 'owner' as _facet, owner as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ), primary_fuel_facet as ( select 'primary_fuel' as _facet, primary_fuel as facet_name, count(*) as facet_count, null, null, null, null, null, null from cte group by facet_name order by facet_count desc limit 3 ) select * from truncated union all select * from country_long_facet union all select * from owner_facet union all select * from primary_fuel_facet ``` (Limits should be 101, 31, 31, 31 but I reduced size to get a shorter example table). 
Results [look like this](https://global-power-plants.datasettes.com/global-power-plants?sql=with+cte+as+%28%0D%0A++select+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+%5Bglobal-power-plants%5D%0D%0A%29%2C%0D%0Atruncated+as+%28%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+country%2C+country_long%2C+name%2C+owner%2C+primary_fuel%0D%0A++from+cte+order+by+rowid+limit+4%0D%0A%29%2C%0D%0Acountry_long_facet+as+%28%0D%0A++select+%27country_long%27+as+_facet%2C+country_long+as+facet_name%2C+count%28*%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Aowner_facet+as+%28%0D%0A++select+%27owner%27+as+_facet%2C+owner+as+fa…	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970742415	https://github.com/simonw/datasette/issues/1513#issuecomment-970742415	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453FqP	simonw 9599	2021-11-16T22:37:14Z	2021-11-16T22:37:14Z	OWNER	The query takes 42.794ms to run. Here's the equivalent page using separate queries: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_facet_size=3&_size=2&_nocount=1 Annoyingly I can't disable facet suggestions but keep facets. I'm going to turn on tracing so I can see how long the separate queries took.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970758179	https://github.com/simonw/datasette/issues/1513#issuecomment-970758179	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453Jgj	simonw 9599	2021-11-16T22:47:38Z	2021-11-16T22:47:38Z	OWNER	Trace now enabled: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_facet_size=3&_size=2&_nocount=1&_trace=1 Here are the relevant traces: ```json [ { "type": "sql", "start": 31.214430154, "end": 31.214817089, "duration_ms": 0.3869350000016425, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 705, in data\n results = await db.execute(sql, params, truncate=True, **extra_args)\n" ], "database": "global-power-plants", "sql": "select rowid, country, country_long, name, gppd_idnr, capacity_mw, latitude, longitude, primary_fuel, other_fuel1, other_fuel2, other_fuel3, commissioning_year, owner, source, url, geolocation_source, wepp_id, year_of_capacity_data, generation_gwh_2013, generation_gwh_2014, generation_gwh_2015, generation_gwh_2016, generation_gwh_2017, generation_data_source, estimated_generation_gwh from [global-power-plants] order by rowid limit 3", "params": {} }, { "type": "sql", "start": 31.215234586, "end": 31.220110342, "duration_ms": 4.875756000000564, "traceback": [ " File \"/usr/local/lib/python3.8/site-packages/datasette/views/table.py\", line 760, in data\n ) = await facet.facet_results()\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/facets.py\", line 212, in facet_results\n facet_rows_results = await self.ds.execute(\n", " File \"/usr/local/lib/python3.8/site-packages/datasette/app.py\", line 634, in execute\n return await self.databases[db_name].execute(\n" ], 
"database": "global-power-plants", "sql": "select countr…	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970766486	https://github.com/simonw/datasette/issues/1513#issuecomment-970766486	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453LiW	simonw 9599	2021-11-16T22:52:56Z	2021-11-16T22:56:07Z	OWNER	https://covid-19.datasettes.com/covid is 805.2MB https://covid-19.datasettes.com/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2 Equivalent SQL: https://covid-19.datasettes.com/covid?sql=with+cte+as+%28%0D%0A++select+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+ny_times_us_counties%0D%0A%29%2C%0D%0Atruncated+as+%28%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+cte+order+by+date+desc+limit+4%0D%0A%29%2C%0D%0Astate_facet+as+%28%0D%0A++select+%27state%27+as+_facet%2C+state+as+facet_name%2C+count%28%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Afips_facet+as+%28%0D%0A++select+%27fips%27+as+_facet%2C+fips+as+facet_name%2C+count%28%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Acounty_facet+as+%28%0D%0A++select+%27county%27+as+_facet%2C+county+as+facet_name%2C+count%28%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A%29%2C%0D%0Atotal_count+as+%28%0D%0A++select+%27COUNT%27+as+_facet%2C+%27%27+as+facet_name%2C+count%28%29+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte%0D%0A%29%0D%0Aselect++from+truncated%0D%0Aunion+all+select++from+state_facet%0D%0Aunion+all+select++from+fips_facet%0D%0Aunion+all+select++from+county_facet%0D%0Aunion+all+select+*+from+total_count ```sql with cte as ( select rowid, date, county, state, fips, cases, 
deaths from ny_times_us_counties ), truncated as ( select null as _facet, null as facet_name, null as facet_count, rowid, date, county, state, fips, cases, deaths from cte order by date desc limit 4 ), state_facet as ( select 's…	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970767952	https://github.com/simonw/datasette/issues/1513#issuecomment-970767952	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453L5Q	simonw 9599	2021-11-16T22:53:52Z	2021-11-16T22:53:52Z	OWNER	It's going to take another 15 minutes for the build to finish and deploy the version with `_trace=1`: https://github.com/simonw/covid-19-datasette/actions/runs/1469150112	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970770304	https://github.com/simonw/datasette/issues/1513#issuecomment-970770304	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453MeA	simonw 9599	2021-11-16T22:55:19Z	2021-11-16T22:55:19Z	OWNER	(One thing I really like about this pattern is that it should work exactly the same when used to facet the results of arbitrary SQL queries as it does when faceting results from the table page.)	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970780866	https://github.com/simonw/datasette/issues/1513#issuecomment-970780866	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453PDC	simonw 9599	2021-11-16T23:01:57Z	2021-11-16T23:01:57Z	OWNER	One disadvantage to this approach: if you have a SQL time limit of 1s and it takes 0.9s to return the rows but then 0.5s to calculate each of the requested facets, the entire query will exceed the time limit. Could work around this by catching that error and then re-running the query just for the rows, but that would result in the user having to wait longer for the results. Could try to remember if that has happened using an in-memory Python data structure and skip the faceting optimization if it's caused problems in the past? That seems a bit gross. Maybe this becomes an opt-in optimization you can request in your `metadata.json` setting for that table, which massively increases the time limit? That's a bit weird too - now there are two separate implementations of the faceting logic, which had better have a REALLY big pay-off to be worth maintaining. What if we kept the query that returns the rows to be displayed on the page separate from the facets, but then executed all of the facets together using this method such that the `cte` only (presumably) has to be calculated once? That would still lead to multiple facets potentially exceeding the SQL time limit when single facets would not have. Maybe a better optimization would be to load facets via `fetch()` calls from the client, so the user gets to see their rows instantly and the facets then appear as and when they are available (though it would cause page jank).	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970827674	https://github.com/simonw/datasette/issues/1513#issuecomment-970827674	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453aea	simonw 9599	2021-11-16T23:26:58Z	2021-11-16T23:26:58Z	OWNER	With trace. https://covid-19.datasettes.com/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2&_trace=1 shows the following: ``` fetch rows: 0.41762600005768036 ms facet state: 284.30423800000426 ms facet county: 273.2565999999679 ms facet fips: 197.80996999998024 ms ``` = 755.78843400001ms total It didn't run a count because that's the homepage and the count is cached. So I dropped the count from the query and ran it: https://covid-19.datasettes.com/covid?sql=with+cte+as+(%0D%0A++select+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+ny_times_us_counties%0D%0A)%2C%0D%0Atruncated+as+(%0D%0A++select+null+as+_facet%2C+null+as+facet_name%2C+null+as+facet_count%2C+rowid%2C+date%2C+county%2C+state%2C+fips%2C+cases%2C+deaths%0D%0A++from+cte+order+by+date+desc+limit+4%0D%0A)%2C%0D%0Astate_facet+as+(%0D%0A++select+%27state%27+as+_facet%2C+state+as+facet_name%2C+count()+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Afips_facet+as+(%0D%0A++select+%27fips%27+as+_facet%2C+fips+as+facet_name%2C+count()+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%2C%0D%0Acounty_facet+as+(%0D%0A++select+%27county%27+as+_facet%2C+county+as+facet_name%2C+count()+as+facet_count%2C%0D%0A++null%2C+null%2C+null%2C+null%2C+null%2C+null%2C+null%0D%0A++from+cte+group+by+facet_name+order+by+facet_count+desc+limit+3%0D%0A)%0D%0Aselect++from+truncated%0D%0Aunion+all+select++from+state_facet%0D%0Aunion+all+select++from+fips_facet%0D%0Aunion+all+select+*+from+county_facet&_trace=1 Shows 649.4359889999259 ms for the query - compared to 
755.78843400001ms for the separate queries. So it saved about 100ms. Still not a huge difference though!	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970828568	https://github.com/simonw/datasette/issues/1513#issuecomment-970828568	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453asY	simonw 9599	2021-11-16T23:27:11Z	2021-11-16T23:27:11Z	OWNER	One last experiment: I'm going to try running an expensive query in the CTE portion.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970845844	https://github.com/simonw/datasette/issues/1513#issuecomment-970845844	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453e6U	simonw 9599	2021-11-16T23:35:38Z	2021-11-16T23:35:38Z	OWNER	I tried adding `cases > 10000` but the SQL query now takes too long - so moving this to my laptop. ``` cd /tmp wget https://covid-19.datasettes.com/covid.db datasette covid.db \ --setting facet_time_limit_ms 10000 \ --setting sql_time_limit_ms 10000 \ --setting trace_debug 1 ``` `http://127.0.0.1:8006/covid/ny_times_us_counties?_trace=1&_facet_size=3&_size=2&cases__gt=10000` shows in the traces: ```json [ { "type": "sql", "start": 12.693033525, "end": 12.694056904, "duration_ms": 1.0233789999993803, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 705, in data\n results = await db.execute(sql, params, truncate=True, **extra_args)\n" ], "database": "covid", "sql": "select rowid, date, county, state, fips, cases, deaths from ny_times_us_counties where \"cases\" > :p0 order by rowid limit 3", "params": { "p0": 10000 } }, { "type": "sql", "start": 12.694285093, "end": 12.814936275, "duration_ms": 120.65118200000136, "traceback": [ " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 262, in get\n return await self.view_get(\n", " File \"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/base.py\", line 477, in view_get\n response_or_template_contexts = await self.data(\n", " File 
\"/usr/local/Cellar/datasette/0.58.1/libexec/lib/python3.9/site-packages/datasette/views/table.py\", line 723, i…	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970853917	https://github.com/simonw/datasette/issues/1513#issuecomment-970853917	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453g4d	simonw 9599	2021-11-16T23:41:01Z	2021-11-16T23:41:01Z	OWNER	One very interesting difference between the two: on the single giant query page: ```json { "request_duration_ms": 376.4317020000476, "sum_trace_duration_ms": 370.0828700000329, "num_traces": 5 } ``` And on the page that uses separate queries: ```json { "request_duration_ms": 819.012272000009, "sum_trace_duration_ms": 201.52852100000018, "num_traces": 19 } ``` The separate-queries page takes 819ms total to render the page, but spends 201ms across 19 SQL queries. The single big query takes 376ms total to render the page, spending 370ms in 5 queries. <details><summary>Those 5 queries, if you're interested</summary> ```sql select database_name, schema_version from databases PRAGMA schema_version PRAGMA schema_version explain with cte as (\r\n select rowid, date, county, state, fips, cases, deaths\r\n from ny_times_us_counties\r\n),\r\ntruncated as (\r\n select null as _facet, null as facet_name, null as facet_count, rowid, date, county, state, fips, cases, deaths\r\n from cte order by date desc limit 4\r\n),\r\nstate_facet as (\r\n select 'state' as _facet, state as facet_name, count() as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\nfips_facet as (\r\n select 'fips' as _facet, fips as facet_name, count() as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n),\r\ncounty_facet as (\r\n select 'county' as _facet, county as facet_name, count() as facet_count,\r\n null, null, null, null, null, null, null\r\n from cte group by facet_name order by facet_count desc limit 3\r\n)\r\nselect * from truncated\r\nunion all select * from state_facet\r\nunion all select * from fips_facet\r\nunion all select * 
from county_facet with cte as (\r\n select rowid, date, county, state, fips, cases, deaths\r\n from ny_times_us_counties\r\n),\r\ntruncated as (\r\n select null as _facet, null as facet_name, null as face…	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
970855084	https://github.com/simonw/datasette/issues/1513#issuecomment-970855084	https://api.github.com/repos/simonw/datasette/issues/1513	IC_kwDOBm6k_c453hKs	simonw 9599	2021-11-16T23:41:46Z	2021-11-16T23:41:46Z	OWNER	Conclusion: using a giant convoluted CTE and UNION ALL query to attempt to calculate facets at the same time as retrieving rows is a net LOSS for performance! Very surprised to see that.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Research: CTEs and union all to calculate facets AND query at the same time 1055469073
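
The single-query pattern explored in these comments (one `cte`, a `truncated` CTE for the page rows, one CTE per facet, all glued together with `union all`) can be sketched in Python with the standard `sqlite3` module. This is only an illustration under stated assumptions, not how Datasette implements faceting: the `rows_and_facets` helper and its parameters are hypothetical, and table and column names are interpolated directly into the SQL, so they must come from a trusted source.

```python
import sqlite3
from collections import defaultdict


def rows_and_facets(conn, table, columns, facet_columns, row_limit=101, facet_limit=31):
    """Fetch page rows and facet counts with one CTE + UNION ALL query.

    Hypothetical helper for illustration. Identifiers are interpolated into
    the SQL, so only pass trusted table and column names.
    """
    col_list = ", ".join(f"[{c}]" for c in columns)
    # Facet rows pad the row columns (rowid plus each column) with nulls.
    padding = ", ".join(["null"] * (len(columns) + 1))
    ctes = [
        f"cte as (select rowid, {col_list} from [{table}])",
        "truncated as (select null as _facet, null as facet_name, "
        f"null as facet_count, rowid, {col_list} "
        f"from cte order by rowid limit {int(row_limit)})",
    ]
    selects = ["select * from truncated"]
    for col in facet_columns:
        ctes.append(
            f"[{col}_facet] as (select '{col}' as _facet, [{col}] as facet_name, "
            f"count(*) as facet_count, {padding} "
            f"from cte group by facet_name "
            f"order by facet_count desc limit {int(facet_limit)})"
        )
        selects.append(f"select * from [{col}_facet]")
    sql = "with " + ", ".join(ctes) + " " + " union all ".join(selects)

    rows, facets = [], defaultdict(list)
    for record in conn.execute(sql):
        facet, name, count = record[:3]
        if facet is None:
            rows.append(record[3:])  # a page row: rowid followed by the columns
        else:
            facets[facet].append((name, count))
    # Sort in Python rather than relying on the order of the UNION ALL output.
    rows.sort(key=lambda r: r[0])
    for values in facets.values():
        values.sort(key=lambda v: (-v[1], str(v[0])))
    return rows, dict(facets)
```

The first three columns of each result tell the caller whether a record is a page row (`_facet` is null) or a facet count, which is why the facet CTEs pad the remaining columns with nulls. The defaults of 101 and 31 follow the limits mentioned in the first comment.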

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
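
As a rough sketch of how the rows above relate to this schema, the following Python creates the table in an in-memory SQLite database and runs the kind of filter this page represents (`issue = 1055469073`, newest comment first). The `comments_for_issue` helper is hypothetical, and the referenced `users` and `issues` tables are left out, which SQLite permits while foreign key enforcement is off (its usual default).

```python
import sqlite3

# Schema as shown above; the referenced [users] and [issues] tables are
# omitted here, which works while foreign key enforcement is off.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
"""


def comments_for_issue(conn, issue_id):
    """Return (id, created_at, body) for one issue, highest id first.

    Hypothetical helper; the filter on [issue] can use idx_issue_comments_issue.
    """
    cursor = conn.execute(
        "select id, created_at, body from issue_comments "
        "where issue = ? order by id desc",
        (issue_id,),
    )
    return cursor.fetchall()
```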