{"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-489240609", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 489240609, "node_id": "MDEyOklzc3VlQ29tbWVudDQ4OTI0MDYwOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-05-03T21:09:13Z", "updated_at": "2019-05-03T21:09:13Z", "author_association": "OWNER", "body": "It may be that some facet implementations (`ArrayFacet` in this case) need a way to detect if they are supported by the thing they are running against (must be a rowid table in this case) and avoid suggesting themselves if they are not compatible. This may require a change to the information we make available to the `suggest()` method (information passed to the Facet class constructor).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969436930", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969436930, "node_id": "IC_kwDOBm6k_c45yG8C", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-15T23:31:58Z", "updated_at": "2021-11-15T23:31:58Z", "author_association": "OWNER", "body": "I think this SQL recipe may work instead:\r\n```sql\r\nselect\r\n *\r\nfrom\r\n ads_with_targets\r\nwhere\r\n 'people_who_match:interests:African-American Civil Rights Movement (1954\u201468)' in (\r\n select\r\n value\r\n from\r\n json_each(target_names)\r\n )\r\n and 'interests:Martin Luther King III' in (\r\n select\r\n value\r\n from\r\n json_each(target_names)\r\n )\r\n```\r\nhttps://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads?sql=select%0D%0A++*%0D%0Afrom%0D%0A++ads_with_targets%0D%0Awhere%0D%0A++%27people_who_match%3Ainterests%3AAfrican-American+Civil+Rights+Movement+%281954%E2%80%9468%29%27+in+%28%0D%0A++++select%0D%0A++++++value%0D%0A++++from%0D%0A++++++json_each%28target_names%29%0D%0A++%29%0D%0A++and+%27interests%3AMartin+Luther+King+III%27+in+%28%0D%0A++++select%0D%0A++++++value%0D%0A++++from%0D%0A++++++json_each%28target_names%29%0D%0A++%29&interests=&African=&Martin=", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969440918", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969440918, "node_id": "IC_kwDOBm6k_c45yH6W", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-15T23:40:17Z", "updated_at": "2021-11-15T23:40:35Z", "author_association": "OWNER", "body": "Applied that fix to the `arraycontains` filter but I'm still getting bad results for the faceting:\r\n\r\n\"russian-ads__ads_with_targets__172_rows_where_where_target_names_contains__people_who_match_interests_African-American_culture__and_datasette_\u2014_pipenv_shell_\u25b8_python_\u2014_80\u00d724\"\r\n\r\nShould never get 182 results on a page that faceting against only 172 items.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969442215", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969442215, "node_id": "IC_kwDOBm6k_c45yIOn", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-15T23:42:03Z", "updated_at": "2021-11-15T23:42:03Z", "author_association": "OWNER", "body": "I think this code is wrong in the `ArrayFacet` class: https://github.com/simonw/datasette/blob/502c02fa6dde6a8bb840af6c4c8cf858aa1db687/datasette/facets.py#L357-L364", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969446972", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969446972, "node_id": "IC_kwDOBm6k_c45yJY8", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-15T23:46:13Z", "updated_at": "2021-11-15T23:46:13Z", "author_association": "OWNER", "body": "It looks like the problem here is that some of the tags occur more than once in the documents:\r\n\r\n\"russian-ads__ads_with_targets__172_rows_where_where_target_names_contains__people_who_match_interests_African-American_culture_\"\r\n\r\nSo they get counted more than once, hence the 182 count for something that couldn't possibly return more than 172 documents.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969449772", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969449772, "node_id": "IC_kwDOBm6k_c45yKEs", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-15T23:48:37Z", "updated_at": "2021-11-15T23:48:37Z", "author_association": "OWNER", "body": "Given this query: https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads?sql=select%0D%0A++j.value+as+value%2C%0D%0A++count%28*%29+as+count%0D%0Afrom%0D%0A++%28%0D%0A++++select%0D%0A++++++id%2C%0D%0A++++++file%2C%0D%0A++++++clicks%2C%0D%0A++++++impressions%2C%0D%0A++++++text%2C%0D%0A++++++url%2C%0D%0A++++++spend_amount%2C%0D%0A++++++spend_currency%2C%0D%0A++++++created%2C%0D%0A++++++ended%2C%0D%0A++++++target_names%0D%0A++++from%0D%0A++++++ads_with_targets%0D%0A++++where%0D%0A++++++%3Ap0+in+%28%0D%0A++++++++select%0D%0A++++++++++value%0D%0A++++++++from%0D%0A++++++++++json_each%28%5Bads_with_targets%5D.%5Btarget_names%5D%29%0D%0A++++++%29%0D%0A++%29%0D%0A++join+json_each%28target_names%29+j%0D%0Agroup+by%0D%0A++j.value%0D%0Aorder+by%0D%0A++count+desc%2C%0D%0A++value%0D%0Alimit%0D%0A++31&p0=people_who_match%3Ainterests%3AAfrican-American+culture\r\n\r\n```sql\r\nselect\r\n j.value as value,\r\n count(*) as count\r\nfrom\r\n (\r\n select\r\n id,\r\n file,\r\n clicks,\r\n impressions,\r\n text,\r\n url,\r\n spend_amount,\r\n spend_currency,\r\n created,\r\n ended,\r\n target_names\r\n from\r\n ads_with_targets\r\n where\r\n :p0 in (\r\n select\r\n value\r\n from\r\n json_each([ads_with_targets].[target_names])\r\n )\r\n )\r\n join json_each(target_names) j\r\ngroup by\r\n j.value\r\norder by\r\n count desc,\r\n value\r\nlimit\r\n 31\r\n```\r\nHow can I return a count of the number of documents containing each tag, but not the number of total tags that match including duplicates?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969557008", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969557008, "node_id": "IC_kwDOBm6k_c45ykQQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-16T00:56:09Z", "updated_at": "2021-11-16T00:59:59Z", "author_association": "OWNER", "body": "This looks like it might work:\r\n```sql\r\nwith inner as (\r\n select\r\n *\r\n from\r\n ads_with_targets\r\n where\r\n :p0 in (\r\n select\r\n value\r\n from\r\n json_each([ads_with_targets].[target_names])\r\n )\r\n),\r\ndeduped_array_items as (\r\n select\r\n distinct j.value,\r\n inner.*\r\n from\r\n json_each([inner].[target_names]) j\r\n join inner\r\n)\r\nselect\r\n value,\r\n count(*)\r\nfrom\r\n deduped_array_items\r\ngroup by\r\n value\r\norder by\r\n count(*) desc\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969557972", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969557972, "node_id": "IC_kwDOBm6k_c45ykfU", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-16T00:56:58Z", "updated_at": "2021-11-16T00:56:58Z", "author_association": "OWNER", "body": "It uses a CTE which were introduced in SQLite 3.8 - and AWS Lambda Python 3.9 still provides 3.7 - but I've checked and I can use `pysqlite3-binary` to work around that there so I'm OK relying on CTEs for this.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969572281", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969572281, "node_id": "IC_kwDOBm6k_c45yn-5", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-16T01:05:11Z", "updated_at": "2021-11-16T01:05:11Z", "author_association": "OWNER", "body": "I tried this and it seems to work correctly:\r\n```python\r\n for source_and_config in self.get_configs():\r\n config = source_and_config[\"config\"]\r\n source = source_and_config[\"source\"]\r\n column = config.get(\"column\") or config[\"simple\"]\r\n facet_sql = \"\"\"\r\n with inner as ({sql}),\r\n deduped_array_items as (\r\n select\r\n distinct j.value,\r\n inner.*\r\n from\r\n json_each([inner].{col}) j\r\n join inner\r\n )\r\n select\r\n value as value,\r\n count(*) as count\r\n from\r\n deduped_array_items\r\n group by\r\n value\r\n order by\r\n count(*) desc limit {limit}\r\n \"\"\".format(\r\n col=escape_sqlite(column), sql=self.sql, limit=facet_size + 1\r\n )\r\n```\r\nThe queries are _very_ slow though - I had to bump up to 2s time limit even against only a view returning 3,499 rows.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969578466", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969578466, "node_id": "IC_kwDOBm6k_c45ypfi", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-16T01:08:29Z", "updated_at": "2021-11-16T01:08:29Z", "author_association": "OWNER", "body": "Actually with the cache warmed up it looks like the facet query is taking 150ms which is good enough.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969582098", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969582098, "node_id": "IC_kwDOBm6k_c45yqYS", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-16T01:10:28Z", "updated_at": "2021-11-16T01:10:28Z", "author_association": "OWNER", "body": "Also note that this demo data is using a SQL view to create the JSON arrays - the view is defined as such:\r\n\r\n```sql\r\nCREATE VIEW ads_with_targets as\r\nselect\r\n ads.*,\r\n json_group_array(targets.name) as target_names\r\nfrom\r\n ads\r\n join ad_targets on ad_targets.ad_id = ads.id\r\n join targets on ad_targets.target_id = targets.id\r\ngroup by\r\n ad_targets.ad_id;\r\n```\r\nSo running JSON faceting on top of that view is a pretty big ask!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/448#issuecomment-969621662", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/448", "id": 969621662, "node_id": "IC_kwDOBm6k_c45y0Ce", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-16T01:32:04Z", "updated_at": "2021-11-16T01:32:04Z", "author_association": "OWNER", "body": "Tests are failing and I think it's because the facets come back in different orders, need a tie-breaker. https://github.com/simonw/datasette/runs/4219325197?check_suite_focus=true", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 440222719, "label": "_facet_array should work against views"}, "performed_via_github_app": null}