html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880256058,https://api.github.com/repos/simonw/sqlite-utils/issues/297,880256058,MDEyOklzc3VlQ29tbWVudDg4MDI1NjA1OA==,9599,2021-07-14T22:40:01Z,2021-07-14T22:40:47Z,OWNER,"Full docs here: https://www.sqlite.org/draft/cli.html#csv
One catch: how this works has changed in recent SQLite versions: https://www.sqlite.org/changes.html
- 2020-12-01 (3.34.0) - ""Table name quoting works correctly for the .import dot-command""
- 2020-05-22 (3.32.0) - ""Add options to the .import command: --csv, --ascii, --skip""
- 2017-08-01 (3.20.0) - ""The "".import"" command ignores an initial UTF-8 BOM.""
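Since the `--skip` option only arrived in 3.32.0, code that shells out to `sqlite3` may want to check the CLI version first. A minimal sketch, assuming the first token printed by `sqlite3 -version` is the dotted version number. The helper names `parse_sqlite_version`, `supports_import_skip` and `installed_cli_version` are hypothetical and not part of sqlite-utils:

```python
import shutil
import subprocess


def parse_sqlite_version(version_output):
    # Assumption: the first whitespace-separated token looks like 3.34.0
    first_token = version_output.split()[0]
    return tuple(int(part) for part in first_token.split('.')[:3])


def supports_import_skip(version):
    # --csv, --ascii and --skip were added to .import in 3.32.0
    return version >= (3, 32, 0)


def installed_cli_version():
    # Returns None when no sqlite3 binary is found on PATH
    binary = shutil.which('sqlite3')
    if binary is None:
        return None
    result = subprocess.run(
        [binary, '-version'], capture_output=True, text=True, check=True
    )
    return parse_sqlite_version(result.stdout)
```

On older versions the header row would need to be removed some other way before the import.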
The ""skip"" feature is particularly important to understand. https://www.sqlite.org/draft/cli.html#csv says:
> There are two cases to consider: (1) Table ""tab1"" does not previously exist and (2) table ""tab1"" does already exist.
>
> In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file.
>
> For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, you can cause the .import command to skip that initial row using the ""--skip 1"" option.
But the `--skip 1` option is only available in 3.32.0 and higher.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880256865,https://api.github.com/repos/simonw/sqlite-utils/issues/297,880256865,MDEyOklzc3VlQ29tbWVudDg4MDI1Njg2NQ==,9599,2021-07-14T22:42:11Z,2021-07-14T22:42:11Z,OWNER,"Potential workaround for versions without `--skip`: the filename can be a command instead, so maybe it could shell out to `tail -n +2 filename` to drop the header row:
> The source argument is the name of a file to be read or, if it begins with a ""|"" character, specifies a command which will be run to produce the input CSV data.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880257587,https://api.github.com/repos/simonw/sqlite-utils/issues/297,880257587,MDEyOklzc3VlQ29tbWVudDg4MDI1NzU4Nw==,9599,2021-07-14T22:44:05Z,2021-07-14T22:44:05Z,OWNER,"https://unix.stackexchange.com/a/642364 suggests you can also use this to import from stdin, like so:
sqlite3 -csv $database_file_name "".import '|cat -' $table_name""
Here the `sqlite3 -csv` is an alternative to using `.mode csv`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880259255,https://api.github.com/repos/simonw/sqlite-utils/issues/297,880259255,MDEyOklzc3VlQ29tbWVudDg4MDI1OTI1NQ==,9599,2021-07-14T22:48:41Z,2021-07-14T22:48:41Z,OWNER,Should also take advantage of `.mode tabs` to support `sqlite-utils insert blah.db blah blah.csv --tsv --fast`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052693,https://api.github.com/repos/simonw/sqlite-utils/issues/297,882052693,IC_kwDOCGYnMM40kw5V,9599,2021-07-18T12:57:54Z,2022-06-21T13:17:15Z,OWNER,"Another implementation option would be to use the CSV virtual table mechanism. This could avoid shelling out to the `sqlite3` binary, but requires solving the harder problem of compiling and distributing a loadable SQLite module: https://www.sqlite.org/csv.html
(Would be neat to produce a Python wheel of this, see https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/)
This would also help solve the challenge of making this optimization available to the `sqlite-utils memory` command. That command operates against an in-memory database so it's not obvious how it could shell out to a binary.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052852,https://api.github.com/repos/simonw/sqlite-utils/issues/297,882052852,IC_kwDOCGYnMM40kw70,9599,2021-07-18T12:59:20Z,2021-07-18T12:59:20Z,OWNER,I'm not too worried about `sqlite-utils memory` because if your data is large enough that you can benefit from this optimization you probably should use a real file as opposed to a disposable memory database when analyzing it.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1160991031,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1160991031,IC_kwDOCGYnMM5FM1E3,9599,2022-06-21T00:35:20Z,2022-06-21T00:35:20Z,OWNER,Relevant TIL: https://til.simonwillison.net/sqlite/one-line-csv-operations,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1161849874,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1161849874,IC_kwDOCGYnMM5FQGwS,9599,2022-06-21T14:49:12Z,2022-06-21T14:49:12Z,OWNER,"Since there are all sorts of existing options for `sqlite-utils insert` that won't work with this, maybe it would be better to have an entirely separate command - this for example:
sqlite-utils fast-insert data.db mytable data.csv ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162179354,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1162179354,IC_kwDOCGYnMM5FRXMa,9599,2022-06-21T18:44:03Z,2022-06-21T18:44:03Z,OWNER,The thing I like about that `--fast` option is that it could selectively use this alternative mechanism just for the files for which it can work (CSV and TSV files). I could also add a `--fast` option to `sqlite-utils memory` which could then kick in only for operations that involve just TSV and CSV files.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162223668,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1162223668,IC_kwDOCGYnMM5FRiA0,9599,2022-06-21T19:19:22Z,2022-06-21T19:22:15Z,OWNER,"Built a prototype of `--fast` for the `sqlite-utils memory` command:
```
% time sqlite-utils memory taxi.csv 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' --fast
passenger_count  COUNT(*)  AVG(total_amount)
---------------  --------  -----------------
                 128020    32.2371511482553
0                42228     17.0214016766151
1                1533197   17.6418833067999
2                286461    18.0975870711456
3                72852     17.9153958710923
4                25510     18.452774990196
5                50291     17.2709248175672
6                32623     17.6002964166367
7                2         87.17
8                2         95.705
9                1         113.6
sqlite-utils memory taxi.csv --fast 12.71s user 0.48s system 104% cpu 12.627 total
```
Takes 13s - about the same time as calling `sqlite3 :memory: ...` directly as seen in https://til.simonwillison.net/sqlite/one-line-csv-operations
Without the `--fast` option that takes several minutes (262s, or roughly 4m20s)!
Here's the prototype so far:
```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..1c83ef6 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -14,6 +14,8 @@ import io
 import itertools
 import json
 import os
+import shutil
+import subprocess
 import sys
 import csv as csv_std
 import tabulate
@@ -1669,6 +1671,7 @@ def query(
     is_flag=True,
     help=""Analyze resulting tables and output results"",
 )
+@click.option(""--fast"", is_flag=True, help=""Fast mode, only works with CSV and TSV"")
 @load_extension_option
 def memory(
     paths,
@@ -1692,6 +1695,7 @@ def memory(
     save,
     analyze,
     load_extension,
+    fast,
 ):
     """"""Execute SQL query against an in-memory database, optionally populated by imported data
 
@@ -1719,6 +1723,22 @@ def memory(
     \b
         sqlite-utils memory animals.csv --schema
     """"""
+    if fast:
+        if (
+            attach
+            or flatten
+            or param
+            or encoding
+            or no_detect_types
+            or analyze
+            or load_extension
+        ):
+            raise click.ClickException(
+                ""--fast mode does not support any of the following options: --attach, --flatten, --param, --encoding, --no-detect-types, --analyze, --load-extension""
+            )
+        # TODO: Figure out and pass other supported options
+        memory_fast(paths, sql)
+        return
     db = sqlite_utils.Database(memory=True)
     # If --dump or --save or --analyze used but no paths detected, assume SQL query is a path:
     if (dump or save or schema or analyze) and not paths:
@@ -1791,6 +1811,33 @@ def memory(
     )
 
 
+def memory_fast(paths, sql):
+    if not shutil.which(""sqlite3""):
+        raise click.ClickException(""sqlite3 not found in PATH"")
+    args = [""sqlite3"", "":memory:"", ""-cmd"", "".mode csv""]
+    table_names = []
+
+    def name(path):
+        base_name = pathlib.Path(path).stem or ""t""
+        table_name = base_name
+        prefix = 1
+        while table_name in table_names:
+            prefix += 1
+            table_name = ""{}_{}"".format(base_name, prefix)
+        return table_name
+
+    for path in paths:
+        table_name = name(path)
+        table_names.append(table_name)
+        args.extend(
+            [""-cmd"", "".import {} {}"".format(pathlib.Path(path).resolve(), table_name)]
+        )
+
+    args.extend([""-cmd"", "".mode column""])
+    args.append(sql)
+    subprocess.run(args)
+
+
 def _execute_query(
     db, sql, param, raw, table, csv, tsv, no_headers, fmt, nl, arrays, json_cols
 ):
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162231111,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1162231111,IC_kwDOCGYnMM5FRj1H,9599,2022-06-21T19:25:44Z,2022-06-21T19:25:44Z,OWNER,Pushed that prototype to a branch.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1240882245,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1240882245,IC_kwDOCGYnMM5J9lxF,9599,2022-09-08T15:33:11Z,2022-09-08T15:33:11Z,OWNER,"The more I think about this the more I like it - particularly for `sqlite-utils fast-insert` where differences in features aren't a problem.
I used a variant of this trick with parquet files here: https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1246977989,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1246977989,IC_kwDOCGYnMM5KU1_F,9599,2022-09-14T15:57:09Z,2022-09-14T15:57:09Z,OWNER,"Should consider how this could best handle creating columns that are integer and float as opposed to just text.
https://discord.com/channels/823971286308356157/823971286941302908/1019630014544748584 is a relevant discussion on Discord. Even if you create the schema in advance with the correct column types, this import mechanism can put empty strings in blank float/integer columns when ideally you would want to have nulls.
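One possible post-import cleanup is to rewrite those empty strings as nulls with an `UPDATE`. A minimal sketch using Python's standard `sqlite3` module. The `blank_to_null` helper and the `products` table are hypothetical, made up for illustration:

```python
import sqlite3


def blank_to_null(conn, table, columns):
    # Replace empty strings with NULL in each named column.
    # Table and column names are interpolated, so pass trusted names only.
    for column in columns:
        conn.execute(
            f'UPDATE [{table}] SET [{column}] = NULL WHERE [{column}] = ?',
            ('',),
        )
    conn.commit()


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE products (name TEXT, price REAL, stock INTEGER)')
# Simulate what a CSV import leaves behind for blank numeric cells
conn.executemany(
    'INSERT INTO products VALUES (?, ?, ?)',
    [('apple', '1.5', '3'), ('pear', '', '')],
)
blank_to_null(conn, 'products', ['price', 'stock'])
rows = conn.execute(
    'SELECT name, price, stock FROM products ORDER BY name'
).fetchall()
print(rows)
```

This runs as a separate pass after the import, so it does not change how `.import` itself behaves.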
Related feature idea for `sqlite-utils transform`:
- #488
Not sure how best to handle this for `sqlite3 .import` imports.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",944846776,
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1246978641,https://api.github.com/repos/simonw/sqlite-utils/issues/297,1246978641,IC_kwDOCGYnMM5KU2JR,9599,2022-09-14T15:57:41Z,2022-09-14T15:57:41Z,OWNER,"One solution suggested on Discord:
```
wget https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv
CREATE=`curl -s -L https://gist.githubusercontent.com/CharlesNepote/80fb813a416ad445fdd6e4738b4c8156/raw/032af70de631ff1c4dd09d55360f242949dcc24f/create.sql`
INDEX=`curl -s -L https://gist.githubusercontent.com/CharlesNepote/80fb813a416ad445fdd6e4738b4c8156/raw/032af70de631ff1c4dd09d55360f242949dcc24f/index.sql`
time sqlite3 products_new.db <