{"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513244121", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513244121, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI0NDEyMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:13:33Z", "updated_at": "2019-07-19T14:13:33Z", "author_association": "OWNER", "body": "So what could the interface to this look like? Especially for the CLI?\r\n\r\nOne option:\r\n\r\n sqlite-utils extract dea_sales company_name companies name\r\n\r\nTricky thing here is that it's quite a large number of positional arguments:\r\n\r\n sqlite-utils extract dea_sales company_name companies name\r\n Table column New table New column (maybe optional?)\r\n\r\nIt would be great if this could support multiple columns - for example, if a spreadsheet has a \u201cCompany Name\u201d, \u201cCompany Address\u201d pair of fields that always match each other and are duplicated many times.\r\n\r\nThis could be handled by creating the new table with two columns that are indexed as a unique compound key. Then you can easily get-or-create on the pairs (or triples or whatever) from the original table.\r\n\r\nChallenge here is what does the CLI syntax look like. Something like this?\r\n\r\n $ sqlite-utils extract dea_sales -c company_name -c company_address \\\r\n --to companies --to-col name --to-col address\r\n\r\nPerhaps the columns in the new table are FORCED to be the same as the old ones, hence avoiding some options? Bit restrictive\u2026 maybe they default to the same but you can customize?\r\n\r\n $ sqlite-utils extract dea_sales -c company_name -c company_address -t companies", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) 
method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246124", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513246124, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI0NjEyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:18:35Z", "updated_at": "2019-07-19T14:19:40Z", "author_association": "OWNER", "body": "How about the Python version? That should be easier to design.\r\n\r\n```python\r\ndb[\"dea_sales\"].extract(\r\n columns=[\"company_name\", \"company_address\"],\r\n to_table=\"companies\"\r\n)\r\n```\r\nIf we want to transform the extracted data (e.g. rename those columns) maybe support a `transform=` argument?\r\n\r\n```python\r\ndb[\"dea_sales\"].extract(\r\n columns=[\"company_name\", \"company_address\"],\r\n to_table=\"companies\",\r\n transform = lambda extracted: {\r\n \"name\": extracted[\"company_name\"],\r\n \"address\": extracted[\"company_address\"],\r\n }\r\n)\r\n```\r\nThis would create a new \"companies\" table with three columns: id, name and address.\r\n\r\nWould also be nice if there was a syntax for saying \"... and use the value from this column as the primary key column in the newly created table\".", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) 
method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246831", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513246831, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI0NjgzMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:20:15Z", "updated_at": "2019-07-19T14:20:49Z", "author_association": "OWNER", "body": "Since these operations could take a long time against large tables, it would be neat if there was a progress bar option for the CLI command.\r\n\r\nThe operations are full table scans so calculating progress shouldn't be too difficult.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513262013", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513262013, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI2MjAxMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:58:23Z", "updated_at": "2020-09-22T18:12:11Z", "author_association": "OWNER", "body": "CLI design idea:\r\n\r\n $ sqlite-utils extract my.db \\\r\n dea_sales company_name\r\n\r\nHere we just specify the original table and column - the new extracted table will automatically be called \"company_name\" and will have \"id\" and \"value\" columns, by default.\r\n\r\nTo set a custom extract table:\r\n\r\n $ sqlite-utils extract my.db \\\r\n dea_sales company_name \\\r\n --table companies\r\n\r\nAnd for extracting multiple columns and renaming them on the created table, maybe something like this:\r\n\r\n $ sqlite-utils extract my.db \\\r\n dea_sales company_name company_address \\\r\n --table 
companies \\\r\n --column company_name name \\\r\n --column company_address address\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/537#issuecomment-513272392", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/537", "id": 513272392, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI3MjM5Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T15:27:03Z", "updated_at": "2019-07-19T15:27:03Z", "author_association": "OWNER", "body": "Yeah that's a good call: the Datasette plugin mechanism where middleware is wrapped around the outside doesn't appear to be compatible with the Sentry mechanism of expecting that `scope` has been populated before it gets to their error handler.\r\n\r\n@tomchristie is this something you've thought about?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 463544206, "label": "Populate \"endpoint\" key in ASGI scope"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/537#issuecomment-513273003", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/537", "id": 513273003, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI3MzAwMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T15:28:42Z", "updated_at": "2019-07-19T15:28:42Z", "author_association": "OWNER", "body": "Asked about this on Twitter: https://twitter.com/simonw/status/1152238730259791877", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 1, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 463544206, "label": 
"Populate \"endpoint\" key in ASGI scope"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/537#issuecomment-513279397", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/537", "id": 513279397, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI3OTM5Nw==", "user": {"value": 647359, "label": "tomchristie"}, "created_at": "2019-07-19T15:47:57Z", "updated_at": "2019-07-19T15:48:09Z", "author_association": "NONE", "body": "The middleware implementation there works okay with a router nested inside if the scope is *mutated*. (Ie. \"endpoint\" doesn't need to exist at the point that the middleware starts running, but if it *has* been made available by the time an exception is thrown, then it can be used.)\r\n\r\nStarlette's usage of \"endpoint\" there is unilateral, rather than something I've discussed against the ASGI spec - certainly it's important for any monitoring ASGI middleware to be able to have some kind of visibility onto some limited subset of routing information, and `\"endpoint\"` in the scope referencing some routed-to callable seemed general enough to be useful.\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 463544206, "label": "Populate \"endpoint\" key in ASGI scope"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/537#issuecomment-513307487", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/537", "id": 513307487, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzMwNzQ4Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T17:17:43Z", "updated_at": "2019-07-19T17:17:43Z", "author_association": "OWNER", "body": "Huh, interesting. 
I'd got it into my head that scope should not be mutated under any circumstances - if that's not true and it's mutable there's all kinds of useful things we could do with it.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 463544206, "label": "Populate \"endpoint\" key in ASGI scope"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/537#issuecomment-513317952", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/537", "id": 513317952, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzMxNzk1Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T17:49:06Z", "updated_at": "2019-07-19T17:49:06Z", "author_association": "OWNER", "body": "It strikes me that if scope is indeed meant to stay immutable the alternative way of solving this would be to add an outbound custom request header with the endpoint - `X-Endpoint: datasette.views.table.TableView` for example - and teach the Sentry plugin to optionally read that.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 463544206, "label": "Populate \"endpoint\" key in ASGI scope"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/562#issuecomment-513373673", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/562", "id": 513373673, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzM3MzY3Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T20:52:04Z", "updated_at": "2019-07-19T20:52:04Z", "author_association": "OWNER", "body": "I'll do this as part of #551 ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470542938, "label": "Facet by array 
shouldn't suggest for arrays that are not arrays-of-strings"}, "performed_via_github_app": null}