{"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052693", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 882052693, "node_id": "IC_kwDOCGYnMM40kw5V", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-07-18T12:57:54Z", "updated_at": "2022-06-21T13:17:15Z", "author_association": "OWNER", "body": "Another implementation option would be to use the CSV virtual table mechanism. This could avoid shelling out to the `sqlite3` binary, but requires solving the harder problem of compiling and distributing a loadable SQLite module: https://www.sqlite.org/csv.html\r\n\r\n(Would be neat to produce a Python wheel of this, see https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/)\r\n\r\nThis would also help solve the challenge of making this optimization available to the `sqlite-utils memory` command. That command operates against an in-memory database so it's not obvious how it could shell out to a binary.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162223668", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1162223668, "node_id": "IC_kwDOCGYnMM5FRiA0", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T19:19:22Z", "updated_at": "2022-06-21T19:22:15Z", "author_association": "OWNER", "body": "Built a prototype of `--fast` for the `sqlite-utils memory` command:\r\n\r\n```\r\n% time sqlite-utils memory taxi.csv 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' --fast\r\npassenger_count COUNT(*) AVG(total_amount)\r\n--------------- -------- -----------------\r\n 128020 32.2371511482553 \r\n0 42228 17.0214016766151 \r\n1 1533197 17.6418833067999 \r\n2 286461 18.0975870711456 \r\n3 72852 17.9153958710923 \r\n4 25510 18.452774990196 \r\n5 50291 17.2709248175672 \r\n6 32623 17.6002964166367 \r\n7 2 87.17 \r\n8 2 95.705 \r\n9 1 113.6 \r\nsqlite-utils memory taxi.csv --fast 12.71s user 0.48s system 104% cpu 12.627 total\r\n```\r\nTakes 13s - about the same time as calling `sqlite3 :memory: ...` directly as seen in https://til.simonwillison.net/sqlite/one-line-csv-operations\r\n\r\nWithout the `--fast` option that takes several minutes (262s = 4m20s)!\r\n\r\nHere's the prototype so far:\r\n\r\n```diff\r\ndiff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py\r\nindex 86eddfb..1c83ef6 100644\r\n--- a/sqlite_utils/cli.py\r\n+++ b/sqlite_utils/cli.py\r\n@@ -14,6 +14,8 @@ import io\r\n import itertools\r\n import json\r\n import os\r\n+import shutil\r\n+import subprocess\r\n import sys\r\n import csv as csv_std\r\n import tabulate\r\n@@ -1669,6 +1671,7 @@ def query(\r\n is_flag=True,\r\n help=\"Analyze resulting tables and output results\",\r\n )\r\n+@click.option(\"--fast\", is_flag=True, help=\"Fast mode, only works with CSV and TSV\")\r\n @load_extension_option\r\n def memory(\r\n paths,\r\n@@ -1692,6 +1695,7 @@ def memory(\r\n save,\r\n analyze,\r\n load_extension,\r\n+ fast,\r\n ):\r\n \"\"\"Execute SQL query against an in-memory database, optionally populated by imported data\r\n \r\n@@ -1719,6 +1723,22 @@ def memory(\r\n \\b\r\n sqlite-utils memory animals.csv --schema\r\n \"\"\"\r\n+ if fast:\r\n+ if (\r\n+ attach\r\n+ or flatten\r\n+ or param\r\n+ or encoding\r\n+ or no_detect_types\r\n+ or analyze\r\n+ or load_extension\r\n+ ):\r\n+ raise click.ClickException(\r\n+ \"--fast mode does not support any of the following options: --attach, --flatten, --param, --encoding, --no-detect-types, --analyze, --load-extension\"\r\n+ )\r\n+ # TODO: Figure out and pass other supported options\r\n+ memory_fast(paths, sql)\r\n+ return\r\n db = sqlite_utils.Database(memory=True)\r\n # If --dump or --save or --analyze used but no paths detected, assume SQL query is a path:\r\n if (dump or save or schema or analyze) and not paths:\r\n@@ -1791,6 +1811,33 @@ def memory(\r\n )\r\n \r\n \r\n+def memory_fast(paths, sql):\r\n+ if not shutil.which(\"sqlite3\"):\r\n+ raise click.ClickException(\"sqlite3 not found in PATH\")\r\n+ args = [\"sqlite3\", \":memory:\", \"-cmd\", \".mode csv\"]\r\n+ table_names = []\r\n+\r\n+ def name(path):\r\n+ base_name = pathlib.Path(path).stem or \"t\"\r\n+ table_name = base_name\r\n+ prefix = 1\r\n+ while table_name in table_names:\r\n+ prefix += 1\r\n+ table_name = \"{}_{}\".format(base_name, prefix)\r\n+ return table_name\r\n+\r\n+ for path in paths:\r\n+ table_name = name(path)\r\n+ table_names.append(table_name)\r\n+ args.extend(\r\n+ [\"-cmd\", \".import {} {}\".format(pathlib.Path(path).resolve(), table_name)]\r\n+ )\r\n+\r\n+ args.extend([\"-cmd\", \".mode column\"])\r\n+ args.append(sql)\r\n+ subprocess.run(args)\r\n+\r\n+\r\n def _execute_query(\r\n db, sql, param, raw, table, csv, tsv, no_headers, fmt, nl, arrays, json_cols\r\n ):\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/447#issuecomment-1161869859", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/447", "id": 1161869859, "node_id": "IC_kwDOCGYnMM5FQLoj", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T15:00:42Z", "updated_at": "2022-06-21T15:00:42Z", "author_association": "OWNER", "body": "Deploying that to https://sqlite-utils.datasette.io/en/latest/cli-reference.html#insert", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1278571700, "label": "Incorrect syntax highlighting in docs CLI reference"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162231111", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1162231111, "node_id": "IC_kwDOCGYnMM5FRj1H", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T19:25:44Z", "updated_at": "2022-06-21T19:25:44Z", "author_association": "OWNER", "body": "Pushed that prototype to a branch.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1160991031", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1160991031, "node_id": "IC_kwDOCGYnMM5FM1E3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T00:35:20Z", "updated_at": "2022-06-21T00:35:20Z", "author_association": "OWNER", "body": "Relevant TIL: https://til.simonwillison.net/sqlite/one-line-csv-operations", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1161849874", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1161849874, "node_id": "IC_kwDOCGYnMM5FQGwS", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T14:49:12Z", "updated_at": "2022-06-21T14:49:12Z", "author_association": "OWNER", "body": "Since there are all sorts of existing options for `sqlite-utils insert` that won't work with this, maybe it would be better to have an entirely separate command - this for example:\r\n\r\n sqlite-utils fast-insert data.db mytable data.csv ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/447#issuecomment-1162186856", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/447", "id": 1162186856, "node_id": "IC_kwDOCGYnMM5FRZBo", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T18:48:46Z", "updated_at": "2022-06-21T18:48:46Z", "author_association": "OWNER", "body": "That fixed it:\r\n\r\n\"image\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1278571700, "label": "Incorrect syntax highlighting in docs CLI reference"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162179354", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1162179354, "node_id": "IC_kwDOCGYnMM5FRXMa", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T18:44:03Z", "updated_at": "2022-06-21T18:44:03Z", "author_association": "OWNER", "body": "The thing I like about that `--fast` option is that it could selectively use this alternative mechanism just for the files for which it can work (CSV and TSV files). I could also add a `--fast` option to `sqlite-utils memory` which could then kick in only for operations that involve just TSV and CSV files.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/446#issuecomment-1162234441", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/446", "id": 1162234441, "node_id": "IC_kwDOCGYnMM5FRkpJ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T19:28:35Z", "updated_at": "2022-06-21T19:28:35Z", "author_association": "OWNER", "body": "`just -l` now does this:\r\n\r\n```\r\n% just -l\r\nAvailable recipes:\r\n black # Apply Black\r\n cog # Rebuild docs with cog\r\n default # Run tests and linters\r\n lint # Run linters: black, flake8, mypy, cog\r\n test *options # Run pytest with supplied options\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1277328147, "label": "Use Just to automate running tests and linters locally"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/447#issuecomment-1161857806", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/447", "id": 1161857806, "node_id": "IC_kwDOCGYnMM5FQIsO", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-21T14:55:51Z", "updated_at": "2022-06-21T14:58:14Z", "author_association": "OWNER", "body": "https://stackoverflow.com/a/44379513 suggests that the fix is:\r\n\r\n .. code-block:: text\r\n\r\nOr set this in `conf.py`:\r\n\r\n highlight_language = \"none\"\r\n\r\nI like that better - I don't like that all `::` blocks default to being treated as Python code.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1278571700, "label": "Incorrect syntax highlighting in docs CLI reference"}, "performed_via_github_app": null}