
issue_comments


39 rows where "updated_at" is on date 2022-06-14


id ▼ html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
1155310521 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155310521 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3KO5 simonw 9599 2022-06-14T14:58:50Z 2022-06-14T14:58:50Z OWNER Interesting challenge in writing tests for this: if you give `csv.Sniffer` a short example with an invalid row in it sometimes it picks the wrong delimiter!

    id,name\r\n1,Cleo,oops

It decided the delimiter there was `e`. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV files with too many values in a row cause errors 1250629388
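That failure mode can be reproduced with the standard library alone - a minimal sketch, not code from sqlite-utils itself:

```python
import csv

# A short sample where one row has an extra value - with this little
# data csv.Sniffer's frequency analysis can settle on a bogus delimiter
sample = "id,name\r\n1,Cleo,oops"
dialect = csv.Sniffer().sniff(sample)
# the guessed delimiter is a single character, but not necessarily ","
print(repr(dialect.delimiter))
```
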
1155317293 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155317293 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3L4t simonw 9599 2022-06-14T15:04:01Z 2022-06-14T15:04:01Z OWNER I think that's unavoidable: it looks like `csv.Sniffer` only works if you feed it a CSV file with an equal number of values in each row, which is understandable. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV files with too many values in a row cause errors 1250629388  
1155350755 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155350755 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3UDj simonw 9599 2022-06-14T15:25:18Z 2022-06-14T15:25:18Z OWNER That broke `mypy`: `sqlite_utils/utils.py:229: error: Incompatible types in assignment (expression has type "Iterable[Dict[Any, Any]]", variable has type "DictReader[str]")` {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV files with too many values in a row cause errors 1250629388  
1155358637 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3V-t simonw 9599 2022-06-14T15:31:34Z 2022-06-14T15:31:34Z OWNER Getting this past `mypy` is really hard!

```
% mypy sqlite_utils
sqlite_utils/utils.py:189: error: No overload variant of "pop" of "MutableMapping" matches argument type "None"
sqlite_utils/utils.py:189: note: Possible overload variants:
sqlite_utils/utils.py:189: note:     def pop(self, key: str) -> str
sqlite_utils/utils.py:189: note:     def [_T] pop(self, key: str, default: Union[str, _T] = ...) -> Union[str, _T]
```

That's because of this line:

    row.pop(key=None)

Which is legit here - we have a dictionary where one of the keys is `None` and we want to remove that key. But the baked in type is apparently `def pop(self, key: str) -> str`. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV files with too many values in a row cause errors 1250629388
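The situation being described comes straight from the standard library: `csv.DictReader` files any extra values under a `None` key (its default `restkey`), which is why the code has to pop that key even though the stubs type the keys as `str`. A minimal reproduction:

```python
import csv
import io

# one header row with two columns, one data row with three values
reader = csv.DictReader(io.StringIO("id,name\r\n1,Cleo,oops\r\n"))
row = next(reader)
# the extra value lands in a list under the None key (the default restkey)
assert row[None] == ["oops"]
row.pop(None)  # the call the typeshed stubs object to
assert dict(row) == {"id": "1", "name": "Cleo"}
```
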
1155364367 https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1155364367 https://api.github.com/repos/simonw/sqlite-utils/issues/412 IC_kwDOCGYnMM5E3XYP simonw 9599 2022-06-14T15:36:28Z 2022-06-14T15:36:28Z OWNER Here's as far as I got with my initial prototype, in `sqlite_utils/pandas.py`:

```python
from .db import Database as _Database, Table as _Table, View as _View
import pandas as pd
from typing import (
    Iterable,
    Union,
    Optional,
)


class Database(_Database):
    def query(
        self, sql: str, params: Optional[Union[Iterable, dict]] = None
    ) -> pd.DataFrame:
        return pd.DataFrame(super().query(sql, params))

    def table(self, table_name: str, **kwargs) -> Union["Table", "View"]:
        "Return a table object, optionally configured with default options."
        klass = View if table_name in self.view_names() else Table
        return klass(self, table_name, **kwargs)


class PandasQueryable:
    def rows_where(
        self,
        where: str = None,
        where_args: Optional[Union[Iterable, dict]] = None,
        order_by: str = None,
        select: str = "*",
        limit: int = None,
        offset: int = None,
    ) -> pd.DataFrame:
        return pd.DataFrame(
            super().rows_where(
                where,
                where_args,
                order_by=order_by,
                select=select,
                limit=limit,
                offset=offset,
            )
        )


class Table(PandasQueryable, _Table):
    pass


class View(PandasQueryable, _View):
    pass
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Optional Pandas integration 1160182768
1155389614 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155389614 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3diu simonw 9599 2022-06-14T15:54:03Z 2022-06-14T15:54:03Z OWNER Filed an issue against `python/typeshed`:

- https://github.com/python/typeshed/issues/8075

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV files with too many values in a row cause errors 1250629388
1155421299 https://github.com/simonw/sqlite-utils/issues/441#issuecomment-1155421299 https://api.github.com/repos/simonw/sqlite-utils/issues/441 IC_kwDOCGYnMM5E3lRz simonw 9599 2022-06-14T16:23:52Z 2022-06-14T16:23:52Z OWNER Actually I have a thought for something that could help here: I could add a mechanism for inserting additional where filters and parameters into that `.search()` method. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Combining `rows_where()` and `search()` to limit which rows are searched 1257724585  
1155515426 https://github.com/simonw/sqlite-utils/issues/441#issuecomment-1155515426 https://api.github.com/repos/simonw/sqlite-utils/issues/441 IC_kwDOCGYnMM5E38Qi betatim 1448859 2022-06-14T17:53:43Z 2022-06-14T17:53:43Z NONE That would be handy (additional where filters) but I think the trick with the `with` statement is already an order of magnitude better than what I had thought of, so my problem is solved by it (plus I got to learn about `with` today!) {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Combining `rows_where()` and `search()` to limit which rows are searched 1257724585  
1155666672 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155666672 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E4hLw simonw 9599 2022-06-14T20:11:52Z 2022-06-14T20:11:52Z OWNER I'm going to rename `restkey` to `extras_key` for consistency with `ignore_extras`. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV files with too many values in a row cause errors 1250629388  
1155672522 https://github.com/simonw/sqlite-utils/issues/443#issuecomment-1155672522 https://api.github.com/repos/simonw/sqlite-utils/issues/443 IC_kwDOCGYnMM5E4inK simonw 9599 2022-06-14T20:18:58Z 2022-06-14T20:18:58Z OWNER New documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#reading-rows-from-a-file {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Make `utils.rows_from_file()` a documented API 1269998342  
1155672675 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155672675 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E4ipj simonw 9599 2022-06-14T20:19:07Z 2022-06-14T20:19:07Z OWNER Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#reading-rows-from-a-file {"total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 1, "eyes": 0} CSV files with too many values in a row cause errors 1250629388  
1155714131 https://github.com/simonw/sqlite-utils/issues/442#issuecomment-1155714131 https://api.github.com/repos/simonw/sqlite-utils/issues/442 IC_kwDOCGYnMM5E4sxT simonw 9599 2022-06-14T21:07:50Z 2022-06-14T21:07:50Z OWNER Here's the commit where I added that originally, including a test: https://github.com/simonw/sqlite-utils/commit/1a93b72ba710ea2271eaabc204685a27d2469374 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} `maximize_csv_field_size_limit()` utility function 1269886084  
1155748444 https://github.com/simonw/sqlite-utils/issues/442#issuecomment-1155748444 https://api.github.com/repos/simonw/sqlite-utils/issues/442 IC_kwDOCGYnMM5E41Jc simonw 9599 2022-06-14T21:55:15Z 2022-06-14T21:55:15Z OWNER Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#setting-the-maximum-csv-field-size-limit {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} `maximize_csv_field_size_limit()` utility function 1269886084  
1155749696 https://github.com/simonw/sqlite-utils/issues/433#issuecomment-1155749696 https://api.github.com/repos/simonw/sqlite-utils/issues/433 IC_kwDOCGYnMM5E41dA simonw 9599 2022-06-14T21:57:05Z 2022-06-14T21:57:05Z OWNER Marking this as help wanted because I can't figure out how to replicate it! {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CLI eats my cursor 1239034903  
1155750270 https://github.com/simonw/sqlite-utils/issues/441#issuecomment-1155750270 https://api.github.com/repos/simonw/sqlite-utils/issues/441 IC_kwDOCGYnMM5E41l- simonw 9599 2022-06-14T21:57:57Z 2022-06-14T21:57:57Z OWNER I added `where=` and `where_args=` parameters to that `.search()` method - updated documentation is here: https://sqlite-utils.datasette.io/en/latest/python-api.html#searching-with-table-search {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Combining `rows_where()` and `search()` to limit which rows are searched 1257724585  
1155753397 https://github.com/simonw/sqlite-utils/issues/431#issuecomment-1155753397 https://api.github.com/repos/simonw/sqlite-utils/issues/431 IC_kwDOCGYnMM5E42W1 simonw 9599 2022-06-14T22:01:38Z 2022-06-14T22:01:38Z OWNER Yeah, I think it would be neat if the library could support self-referential many-to-many in a nice way. I'm not sure about the `left_name/right_name` design though. Would it be possible to have this work as the user intends, by spotting that the other table name `"people"` matches the name of the current table?

```python
db["people"].insert({"name": "Mary"}, pk="name").m2m(
    "people", [{"name": "Michael"}, {"name": "Suzy"}], m2m_table="parent_child", pk="name"
)
```

The created table could look like this:

```sql
CREATE TABLE [parent_child] (
   [people_id_1] TEXT REFERENCES [people]([name]),
   [people_id_2] TEXT REFERENCES [people]([name]),
   PRIMARY KEY ([people_id_1], [people_id_2])
)
```

I've not thought very hard about this, so the design I'm proposing here might not work. Are there other reasons people might want the `left_name=` and `right_name=` parameters? If so then I'm much happier with those. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Allow making m2m relation of a table to itself 1227571375
1155756742 https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155756742 https://api.github.com/repos/simonw/sqlite-utils/issues/432 IC_kwDOCGYnMM5E43LG simonw 9599 2022-06-14T22:05:38Z 2022-06-14T22:05:49Z OWNER I don't like the idea of `table_names()` returning names of tables from connected databases as well, because it feels like it could lead to surprising behaviour - especially if those connected databases turn out to have table names that are duplicated in the main connected database.

It would be neat if functions like `.rows_where()` worked though. One thought would be to support something like this:

```python
rows = db["otherdb.tablename"].rows_where()
```

But... `.` is a valid character in a SQLite table name. So `"otherdb.tablename"` might ambiguously refer to a table called `tablename` in a connected database with the alias `otherdb`, OR a table in the current database with the name `otherdb.tablename`. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Support `rows_where()`, `delete_where()` etc for attached alias databases 1236693079
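The ambiguity is real - SQLite happily accepts a dot inside a quoted table name. A quick demonstration with plain `sqlite3`, not sqlite-utils:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# a single table in the main database whose name contains a dot
conn.execute('CREATE TABLE "otherdb.tablename" (id INTEGER)')
names = [r[0] for r in conn.execute("SELECT name FROM sqlite_master")]
# this is one table named "otherdb.tablename", not table "tablename"
# in an attached database aliased "otherdb"
assert names == ["otherdb.tablename"]
```
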
1155758664 https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155758664 https://api.github.com/repos/simonw/sqlite-utils/issues/432 IC_kwDOCGYnMM5E43pI simonw 9599 2022-06-14T22:07:50Z 2022-06-14T22:07:50Z OWNER Another potential fix: add an `alias=` parameter to `rows_where()` and other similar methods. Then you could do this:

```python
rows = db["tablename"].rows_where(alias="otherdb")
```

This feels wrong to me: `db["tablename"]` is the bit that is supposed to return a table object. Having part of what that table object is exist as a parameter to other methods is confusing. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Support `rows_where()`, `delete_where()` etc for attached alias databases 1236693079
1155759857 https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155759857 https://api.github.com/repos/simonw/sqlite-utils/issues/432 IC_kwDOCGYnMM5E437x simonw 9599 2022-06-14T22:09:07Z 2022-06-14T22:09:07Z OWNER Third option, and I think the one I like the best:

```python
rows = db.table("tablename", alias="otherdb").rows_where()
```

The `db.table(tablename)` method already exists as an alternative to `db[tablename]`: https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-table-configuration {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Support `rows_where()`, `delete_where()` etc for attached alias databases 1236693079
1155764064 https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155764064 https://api.github.com/repos/simonw/sqlite-utils/issues/432 IC_kwDOCGYnMM5E449g simonw 9599 2022-06-14T22:15:44Z 2022-06-14T22:15:44Z OWNER Implementing this would be a pretty big change - initial instinct is that I'd need to introduce a `self.alias` property to `Queryable` (the superclass of `Table` and `View`) and a new `self.name_with_alias` getter which returns `alias.tablename` if `alias` is set to a not-None value. Then I'd need to rewrite every piece of code like this:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L1161

To look like this instead:

```python
sql = "select {} from [{}]".format(select, self.name_with_alias)
```

But some parts would be harder - for example:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L1227-L1231

Would have to know to query `alias.sqlite_master` instead.

The cached table counts logic would need a bunch of changes too:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L644-L657

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Support `rows_where()`, `delete_where()` etc for attached alias databases 1236693079
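A minimal sketch of what that proposed getter might look like - hypothetical code, since `name_with_alias` is only being suggested here, not implemented:

```python
from typing import Optional


class Queryable:
    # hypothetical sketch of the proposed alias support
    def __init__(self, name: str, alias: Optional[str] = None):
        self.name = name
        self.alias = alias

    @property
    def name_with_alias(self) -> str:
        # "alias.tablename" when an alias is set, otherwise just the name
        if self.alias is not None:
            return "{}.{}".format(self.alias, self.name)
        return self.name


assert Queryable("dogs").name_with_alias == "dogs"
assert Queryable("dogs", alias="otherdb").name_with_alias == "otherdb.dogs"
```
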
1155764428 https://github.com/simonw/sqlite-utils/issues/432#issuecomment-1155764428 https://api.github.com/repos/simonw/sqlite-utils/issues/432 IC_kwDOCGYnMM5E45DM simonw 9599 2022-06-14T22:16:21Z 2022-06-14T22:16:21Z OWNER Initial idea of how the `.table()` method would change:

```diff
diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 7a06304..3ecb40b 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -474,11 +474,12 @@ class Database:
             self._tracer(sql, None)
         return self.conn.executescript(sql)
 
-    def table(self, table_name: str, **kwargs) -> Union["Table", "View"]:
+    def table(self, table_name: str, alias: Optional[str] = None, **kwargs) -> Union["Table", "View"]:
         """
         Return a table object, optionally configured with default options.
 
         :param table_name: Name of the table
+        :param alias: The database alias to use, if referring to a table in another connected database
         """
         klass = View if table_name in self.view_names() else Table
         return klass(self, table_name, **kwargs)
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Support `rows_where()`, `delete_where()` etc for attached alias databases 1236693079
1155767202 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155767202 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E45ui simonw 9599 2022-06-14T22:21:10Z 2022-06-14T22:21:10Z OWNER I can't figure out why that error is being swallowed like that. The most likely culprit was this code:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L1021-L1043

But I tried changing it like this:

```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..ed26fdd 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -1023,6 +1023,7 @@ def insert_upsert_implementation(
                 docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs
             )
         except Exception as e:
+            raise
             if (
                 isinstance(e, sqlite3.OperationalError)
                 and e.args
```

And your steps to reproduce still got to 49% and then failed silently. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688
1155767915 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155767915 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E455r simonw 9599 2022-06-14T22:22:27Z 2022-06-14T22:22:27Z OWNER I forgot to add equivalents of `extras_key=` and `ignore_extras=` to the CLI tool - will do that in a separate issue. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV files with too many values in a row cause errors 1250629388  
1155769216 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155769216 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E46OA simonw 9599 2022-06-14T22:24:49Z 2022-06-14T22:25:06Z OWNER I have a hunch that this crash may be caused by a CSV value which is too long, as addressed at the library level in:

- #440

But not yet addressed in the CLI tool, see:

- #444

Either way though, I really don't like that errors like this are swallowed! {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688
1155771462 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155771462 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E46xG simonw 9599 2022-06-14T22:28:38Z 2022-06-14T22:28:38Z OWNER Maybe this isn't a CSV field value problem - I tried this patch and didn't seem to hit the new breakpoints:

```diff
diff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py
index d2ccc5f..f1b823a 100644
--- a/sqlite_utils/utils.py
+++ b/sqlite_utils/utils.py
@@ -204,13 +204,17 @@ def _extra_key_strategy(
         # DictReader adds a 'None' key with extra row values
         if None not in row:
             yield row
-        elif ignore_extras:
+            continue
+        else:
+            breakpoint()
+        if ignore_extras:
             # ignoring row.pop(none) because of this issue:
             # https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637
             row.pop(None)  # type: ignore
             yield row
         elif not extras_key:
             extras = row.pop(None)  # type: ignore
+            breakpoint()
             raise RowError(
                 "Row {} contained these extra values: {}".format(row, extras)
             )
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688
1155772244 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155772244 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E469U simonw 9599 2022-06-14T22:30:03Z 2022-06-14T22:30:03Z OWNER Tried this:

```
% python -i $(which sqlite-utils) insert --csv --delimiter ";" --encoding "utf-16-le" test test.db csv
  [------------------------------------]    0%
  [#################-------------------]   49%  00:00:01Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1072, in main
    ctx.exit()
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 692, in exit
    raise Exit(code)
click.exceptions.Exit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/bin/sqlite-utils", line 33, in <module>
    sys.exit(load_entry_point('sqlite-utils', 'console_scripts', 'sqlite-utils')())
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1090, in main
    sys.exit(e.exit_code)
SystemExit: 0
>>>
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688
1155776023 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155776023 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E474X simonw 9599 2022-06-14T22:36:07Z 2022-06-14T22:36:07Z OWNER Wait! The arguments in that are the wrong way round. This is correct:

    sqlite-utils insert --csv --delimiter ";" --encoding "utf-16-le" test.db test csv

It still outputs the following:

    [------------------------------------]    0%
    [#################-------------------]   49%  00:00:02%

But it creates a `test.db` file that is 6.2MB. That database has these rows in it:

```
% sqlite-utils tables test.db --counts -t
table      count
-------  -------
test        3142
```

I converted that `csv` file to utf-8 like so:

    iconv -f UTF-16LE -t UTF-8 csv > utf8.csv

And it contains 3142 lines:

```
% wc -l utf8.csv
    3142 utf8.csv
```

So my hunch here is that the problem is actually that the progress bar doesn't know how to correctly measure files in `utf-16-le` encoding! {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688
1155781399 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155781399 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E49MX simonw 9599 2022-06-14T22:45:41Z 2022-06-14T22:45:41Z OWNER TIL how to use `iconv`: https://til.simonwillison.net/linux/iconv {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688  
1155782835 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155782835 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E49iz simonw 9599 2022-06-14T22:48:22Z 2022-06-14T22:49:53Z OWNER Here's the code that implements the progress bar in question: https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L918-L932 It calls `file_progress()` which looks like this: https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L159-L175 Which uses this: https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L148-L156 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688  
1155784284 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155784284 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E495c simonw 9599 2022-06-14T22:51:03Z 2022-06-14T22:52:13Z OWNER Yes, this is the problem. The progress bar length is set to the length in bytes of the file - `os.path.getsize(file.name)` - but it's then incremented by the length of each DECODED line in turn. So if the file is in `utf-16-le` (twice the size of `utf-8`) the progress bar will finish at 50%! {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688  
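That arithmetic is easy to check - a quick illustrative sketch, which also shows why no fixed bytes-per-character correction can work in general:

```python
# for pure-ASCII text, UTF-16-LE is exactly twice the size of UTF-8...
text = "id,name\r\n1,Cleo\r\n"
assert len(text.encode("utf-16-le")) == 2 * len(text.encode("utf-8"))
# ...so a progress bar sized in file bytes but advanced by decoded
# characters stops at 50%. The ratio breaks down for non-ASCII input:
assert len("é".encode("utf-8")) == 2        # already 2 bytes in UTF-8
assert len("é".encode("utf-16-le")) == 2    # same 2 bytes in UTF-16-LE
```
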
1155788944 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155788944 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E4_CQ simonw 9599 2022-06-14T23:00:24Z 2022-06-14T23:00:24Z OWNER The progress bar only works if the file-like object passed to it has a `fp.fileno()` that isn't 0 (for stdin) - that's how it detects that the file is something which it can measure the size of in order to show progress. If we know the file size in bytes AND we know the character encoding, can we change `UpdateWrapper` to update the number of bytes-per-character instead? I don't think so: I can't see a way of definitively saying "for this encoding the number of bytes per character is X" - and in fact I'm pretty sure that question doesn't even make sense since variable-length encodings exist. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688  
1155789101 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155789101 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E4_Et simonw 9599 2022-06-14T23:00:45Z 2022-06-14T23:00:45Z OWNER I'm going to mark this as "help wanted" and leave it open. I'm glad that it's not actually a bug where errors get swallowed. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Misleading progress bar against utf-16-le CSV input 1250495688  
1155791109 https://github.com/simonw/sqlite-utils/issues/434#issuecomment-1155791109 https://api.github.com/repos/simonw/sqlite-utils/issues/434 IC_kwDOCGYnMM5E4_kF simonw 9599 2022-06-14T23:04:40Z 2022-06-14T23:04:40Z OWNER Definitely a bug - thanks for the detailed write-up! You're right, the code at fault is here: https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/db.py#L2213-L2231 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} `detect_fts()` identifies the wrong table if tables have names that are subsets of each other 1243151184  
1155794149 https://github.com/simonw/sqlite-utils/issues/434#issuecomment-1155794149 https://api.github.com/repos/simonw/sqlite-utils/issues/434 IC_kwDOCGYnMM5E5ATl simonw 9599 2022-06-14T23:09:54Z 2022-06-14T23:09:54Z OWNER A test that demonstrates the problem:

```python
@pytest.mark.parametrize("reverse_order", (True, False))
def test_detect_fts_similar_tables(fresh_db, reverse_order):
    # https://github.com/simonw/sqlite-utils/issues/434
    table1, table2 = ("demo", "demo2")
    if reverse_order:
        table1, table2 = table2, table1

    fresh_db[table1].insert({"title": "Hello"}).enable_fts(
        ["title"], fts_version="FTS4"
    )
    fresh_db[table2].insert({"title": "Hello"}).enable_fts(
        ["title"], fts_version="FTS4"
    )
    assert fresh_db[table1].detect_fts() == "{}_fts".format(table1)
    assert fresh_db[table2].detect_fts() == "{}_fts".format(table2)
```

The order matters - so this test currently passes in one direction and fails in the other:

```
>       assert fresh_db[table2].detect_fts() == "{}_fts".format(table2)
E       AssertionError: assert 'demo2_fts' == 'demo_fts'
E         - demo_fts
E         + demo2_fts
E         ?     +

tests/test_introspect.py:53: AssertionError
=================== short test summary info ===================
FAILED tests/test_introspect.py::test_detect_fts_similar_tables[True] - AssertionError: assert 'demo2_fts' == 'demo_fts'
========= 1 failed, 1 passed, 855 deselected in 1.00s =========
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} `detect_fts()` identifies the wrong table if tables have names that are subsets of each other 1243151184
1155801812 https://github.com/simonw/sqlite-utils/issues/434#issuecomment-1155801812 https://api.github.com/repos/simonw/sqlite-utils/issues/434 IC_kwDOCGYnMM5E5CLU simonw 9599 2022-06-14T23:23:32Z 2022-06-14T23:23:32Z OWNER Since table names can be quoted like this:

```sql
CREATE VIRTUAL TABLE "searchable_fts"
    USING FTS4 (text1, text2, [name with . and spaces], content="searchable")
```

OR like this:

```sql
CREATE VIRTUAL TABLE "searchable_fts"
    USING FTS4 (text1, text2, [name with . and spaces], content=[searchable])
```

This fix looks to be correct to me (copying from the updated `test_with_trace()` test):

```python
(
    "SELECT name FROM sqlite_master\n"
    "    WHERE rootpage = 0\n"
    "    AND (\n"
    "        sql LIKE :like\n"
    "        OR sql LIKE :like2\n"
    "        OR (\n"
    "            tbl_name = :table\n"
    "            AND sql LIKE '%VIRTUAL TABLE%USING FTS%'\n"
    "        )\n"
    "    )",
    {
        "like": "%VIRTUAL TABLE%USING FTS%content=[dogs]%",
        "like2": '%VIRTUAL TABLE%USING FTS%content="dogs"%',
        "table": "dogs",
    },
)
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} `detect_fts()` identifies the wrong table if tables have names that are subsets of each other 1243151184
1155803262 https://github.com/simonw/sqlite-utils/issues/430#issuecomment-1155803262 https://api.github.com/repos/simonw/sqlite-utils/issues/430 IC_kwDOCGYnMM5E5Ch- simonw 9599 2022-06-14T23:26:11Z 2022-06-14T23:26:11Z OWNER It looks like `PRAGMA temp_store` was the right option to use here: https://www.sqlite.org/pragma.html#pragma_temp_store `temp_store_directory` is listed as deprecated here: https://www.sqlite.org/pragma.html#pragma_temp_store_directory I'm going to turn this into a help-wanted documentation issue. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} Document how to use `PRAGMA temp_store` to avoid errors when running VACUUM against huge databases 1224112817  
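For that documentation, a minimal sketch of what using the pragma might look like via plain `sqlite3` - illustrative only, and in real use the connection would point at the huge database file rather than `:memory:`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# temp_store = 2 (MEMORY) keeps the temporary structures VACUUM builds
# in RAM instead of spilling them into the temp directory on disk
conn.execute("PRAGMA temp_store = 2")
assert conn.execute("PRAGMA temp_store").fetchone()[0] == 2
conn.execute("VACUUM")
```
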
1155804459 https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804459 https://api.github.com/repos/simonw/sqlite-utils/issues/444 IC_kwDOCGYnMM5E5C0r simonw 9599 2022-06-14T23:28:18Z 2022-06-14T23:28:18Z OWNER I think these become part of the `_import_options` list which is used in a few places: https://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L765-L800 {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool 1271426387  
1155804591 https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804591 https://api.github.com/repos/simonw/sqlite-utils/issues/444 IC_kwDOCGYnMM5E5C2v simonw 9599 2022-06-14T23:28:36Z 2022-06-14T23:28:36Z OWNER I'm going with `--extras-key` and `--ignore-extras` as the two new options. {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool 1271426387  
1155815186 https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155815186 https://api.github.com/repos/simonw/sqlite-utils/issues/444 IC_kwDOCGYnMM5E5FcS simonw 9599 2022-06-14T23:48:16Z 2022-06-14T23:48:16Z OWNER This is tricky to implement because of this code:

https://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L938-L945

It's reconstructing each document using the known headers here:

    docs = (dict(zip(headers, row)) for row in reader)

So my first attempt at this - the diff here - did not have the desired result:

```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..00b920b 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -6,7 +6,7 @@ import hashlib
 import pathlib
 import sqlite_utils
 from sqlite_utils.db import AlterError, BadMultiValues, DescIndex
-from sqlite_utils.utils import maximize_csv_field_size_limit
+from sqlite_utils.utils import maximize_csv_field_size_limit, _extra_key_strategy
 from sqlite_utils import recipes
 import textwrap
 import inspect
@@ -797,6 +797,15 @@ _import_options = (
         "--encoding",
         help="Character encoding for input, defaults to utf-8",
     ),
+    click.option(
+        "--ignore-extras",
+        is_flag=True,
+        help="If a CSV line has more than the expected number of values, ignore the extras",
+    ),
+    click.option(
+        "--extras-key",
+        help="If a CSV line has more than the expected number of values put them in a list in this column",
+    ),
 )
@@ -885,6 +894,8 @@ def insert_upsert_implementation(
     sniff,
     no_headers,
     encoding,
+    ignore_extras,
+    extras_key,
     batch_size,
     alter,
     upsert,
@@ -909,6 +920,10 @@ def insert_upsert_implementation(
         raise click.ClickException("--flatten cannot be used with --csv or --tsv")
     if encoding and not (csv or tsv):
         raise click.ClickException("--encoding must be used with --csv or --tsv")
+    if ignore_extras and extras_key:
+        raise click.ClickException(
+            "--ignore-extras and --extras-key cannot be used together"
+        )
     if pk and len(pk) == 1:
         pk = pk[0]

     encodin…
```

{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool 1271426387

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 5034.571ms · About: simonw/datasette-graphql