{"id": 783778672, "node_id": "MDU6SXNzdWU3ODM3Nzg2NzI=", "number": 220, "title": "Better error message for *_fts methods against views", "user": {"value": 649467, "label": "mhalle"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-01-11T23:24:00Z", "updated_at": "2021-02-22T20:44:51Z", "closed_at": "2021-02-14T22:34:26Z", "author_association": "NONE", "pull_request": null, "body": "enable_fts and its related methods only work on tables, not views. \r\n\r\nCould those methods and possibly others move up to the Queryable superclass?\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/220/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 792297010, "node_id": "MDExOlB1bGxSZXF1ZXN0NTYwMjA0MzA2", "number": 224, "title": "Add fts offset docs.", "user": {"value": 37962604, "label": "polyrand"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-01-22T20:50:58Z", "updated_at": "2021-02-14T19:31:06Z", "closed_at": "2021-02-14T19:31:06Z", "author_association": "NONE", "pull_request": "simonw/sqlite-utils/pulls/224", "body": "The limit can be passed as a string to the query builder to have an offset. I have tested it using the shorthand `limit=f\"15, 30\"`, the standard syntax should work too.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/224/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 797159961, "node_id": "MDExOlB1bGxSZXF1ZXN0NTY0MjE1MDEx", "number": 225, "title": "fix for problem in Table.insert_all on search for columns per chunk of rows", "user": {"value": 261237, "label": "nieuwenhoven"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-01-29T20:16:07Z", "updated_at": "2021-02-14T21:04:13Z", "closed_at": "2021-02-14T21:04:13Z", "author_association": "NONE", "pull_request": "simonw/sqlite-utils/pulls/225", "body": "Hi,\r\n\r\nI ran into a problem when trying to create a database from my Apple Healthkit data using [healthkit-to-sqlite](https://github.com/dogsheep/healthkit-to-sqlite). The program crashed because of an invalid insert statement that was generated for table `rDistanceCycling`. \r\n\r\nThe actual problem turned out to be in [sqlite-utils](https://github.com/simonw/sqlite-utils). `Table.insert_all` processes the data to be inserted in chunks of rows and checks for every chunk which columns are used, and it will collect all column names in the variable `all_columns`. The collection of columns is done using a nested list comprehension that is not completely correct. 
\r\n\r\nI'm using a Windows machine and had to make a few adjustments to the tests to be able to run them, because they had a POSIX dependency.\r\n\r\nThanks, kind regards,\r\n\r\nFrans\r\n\r\n```\r\n# this is a (condensed) chunk of data from my Apple HealthKit export that caused the problem.\r\n# the last 3 items in the chunk have additional keys: metadata_HKMetadataKeySyncVersion and metadata_HKMetadataKeySyncIdentifier\r\n\r\nchunk = [{'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '7.0.1',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',\r\n 'unit': 'km', 'creationDate': '2020-10-10 12:29:09 +0100', 'startDate': '2020-10-10 12:29:06 +0100',\r\n 'endDate': '2020-10-10 12:29:07 +0100', 'value': '0.00518016'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '7.0.1',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',\r\n 'unit': 'km', 'creationDate': '2020-10-10 12:29:10 +0100', 'startDate': '2020-10-10 12:29:07 +0100',\r\n 'endDate': '2020-10-10 12:29:08 +0100', 'value': '0.00544049'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '6.2.6',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',\r\n 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:40:50 +0100',\r\n 'endDate': '2020-07-15 16:42:49 +0100', 'value': '0.952092', 'metadata_HKMetadataKeySyncVersion': '1',\r\n 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520450.99823:616520569.99360:119'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '6.2.6',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',\r\n 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:42:49 +0100',\r\n 'endDate': '2020-07-15 16:44:51 +0100', 'value': '0.848983', 'metadata_HKMetadataKeySyncVersion': '1',\r\n 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520569.99360:616520691.98826:119'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '6.2.6',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',\r\n 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:44:51 +0100',\r\n 'endDate': '2020-07-15 16:46:50 +0100', 'value': '0.834403', 'metadata_HKMetadataKeySyncVersion': '1',\r\n 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520691.98826:616520810.98305:119'}]\r\n\r\n\r\n\r\n# PROBLEM: the comprehension's `column not in all_columns` check never sees the\r\n# columns it has already collected, so a new column that occurs in more than\r\n# one record is appended once per record.\r\ndef all_columns_old():\r\n all_columns = [col for col in chunk[0]]\r\n all_columns += [column for record in chunk\r\n for column in record if column not in all_columns]\r\n return all_columns\r\n\r\n\r\n# FIX: extend all_columns one record at a time, so the membership check also\r\n# covers columns collected from earlier records in the chunk.\r\ndef all_columns_new():\r\n all_columns = [col for col in chunk[0]]\r\n for record in chunk:\r\n all_columns += [column for column in record if column not in all_columns]\r\n return all_columns\r\n\r\n\r\n\r\nif __name__ == '__main__':\r\n from pprint import pprint\r\n\r\n print('problem: ')\r\n pprint(all_columns_old())\r\n print('\\nfix: ')\r\n pprint(all_columns_new())\r\n\r\n```\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": 
\"https://api.github.com/repos/simonw/sqlite-utils/issues/225/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 807174161, "node_id": "MDU6SXNzdWU4MDcxNzQxNjE=", "number": 227, "title": "Error reading csv files with large column data", "user": {"value": 295329, "label": "camallen"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-02-12T11:51:47Z", "updated_at": "2021-02-16T11:48:03Z", "closed_at": "2021-02-14T21:17:19Z", "author_association": "NONE", "pull_request": null, "body": "*Feel free to close this issue - I mostly added it for reference for future folks that run into this :)*\r\n\r\nI have a CSV file with one column that has very long strings. When i try to import this file via the `insert` command I get the following error: \r\n```\r\nsqlite-utils insert database.db table_name file_with_large_column.csv\r\n\r\nTraceback (most recent call last):\r\n File \"/usr/local/bin/sqlite-utils\", line 10, in \r\n sys.exit(cli())\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 774, in insert\r\n default=default,\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 705, in insert_upsert_implementation\r\n docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py\", line 1852, in insert_all\r\n first_record = next(records)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 703, in \r\n docs = (decode_base64_values(doc) for doc in docs)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 681, in \r\n docs = (dict(zip(headers, row)) for row in reader)\r\n_csv.Error: field larger than field limit (131072)\r\n```\r\nBuilt with the docker image `datasetteproject/datasette:0.54` with the following versions:\r\n```\r\n# sqlite-utils --version\r\nsqlite-utils, version 3.4.1\r\n\r\n# datasette --version\r\ndatasette, version 0.54\r\n```\r\nIt appears this is a [known issue](https://stackoverflow.com/a/54517228/2761423) reading in csv files in python and [doesn't look to be modifiable](https://github.com/python/cpython/blob/ea46579067fd2d4e164d6605719ffec690c4d621/Modules/_csv.c#L1685) through system / env vars (i may be very wrong on this).\r\n\r\nNoting that using sqlite3 `import` command work without error (not using the python csv reader)\r\n```\r\nsqlite3 database.db\r\nsqlite> .mode csv\r\nsqlite> .import file_with_large_column.csv table_name\r\n```\r\nSadly I couldn't see an easy way around this while using the cli as it appears this value needs to be changed in python code. FWIW I've switched to using https://datasette.io/tools/csvs-to-sqlite for importing csv data and it's working well. 
\r\n\r\nFinally, I'm loving https://datasette.io/ - thank you very much for an amazing tool and data ecosystem \ud83d\ude47\u200d\u2640\ufe0f ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/227/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 807437089, "node_id": "MDU6SXNzdWU4MDc0MzcwODk=", "number": 228, "title": "--no-headers option for CSV and TSV", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 10, "created_at": "2021-02-12T17:56:51Z", "updated_at": "2021-12-26T07:01:31Z", "closed_at": "2021-02-14T22:25:17Z", "author_association": "OWNER", "pull_request": null, "body": "https://bl.iro.bl.uk/work/ns/3037474a-761c-456d-a00c-9ef3c6773f4c has a fascinating CSV file that doesn't have a header row - it starts like this:\r\n\r\n```csv\r\nComputation and measurement of turbulent flow through idealized turbine blade passages,,\"Loizou, Panos A.\",https://isni.org/isni/0000000136122593,,University of Manchester,https://isni.org/isni/0000000121662407,1989,Thesis (Ph.D.),,Physical Sciences,,,https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232781,\r\n\"Prolactin and growth hormone secretion in normal, hyperprolactinaemic and acromegalic man\",,\"Prescott, R. W. G.\",https://isni.org/isni/0000000134992122,,University of Newcastle upon Tyne,https://isni.org/isni/0000000104627212,1983,Thesis (Ph.D.),,Biological Sciences,,,https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232784,\r\n```\r\n\r\nIt would be useful if `sqlite-utils insert ... --csv` had a mechanism for importing files like this one.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/228/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 807817197, "node_id": "MDU6SXNzdWU4MDc4MTcxOTc=", "number": 229, "title": "Hitting `_csv.Error: field larger than field limit (131072)`", "user": {"value": 631242, "label": "frosencrantz"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-02-13T19:52:44Z", "updated_at": "2021-02-14T21:33:33Z", "closed_at": "2021-02-14T21:33:33Z", "author_association": "NONE", "pull_request": null, "body": "I have a CSV file where one of the fields is so large that it throws an exception with this error and stops loading:\r\n```\r\n_csv.Error: field larger than field limit (131072)\r\n```\r\n\r\nThe stack trace occurs here: https://github.com/simonw/sqlite-utils/blob/3.1/sqlite_utils/cli.py#L633\r\n\r\n\r\nThere is a way to handle this that helps:\r\nhttps://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072\r\n\r\nOne issue I had with this problem was that sqlite-utils only provides limited context as to where the problem line is.\r\nThere is the progress bar, but that is by percent rather than by line number. 
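\r\n\r\nIn the meantime, a rough standard-library sketch for finding which row blows past the default 131072-byte limit (`data.csv` is a stand-in filename):\r\n```\r\nimport csv\r\nimport sys\r\n\r\ncsv.field_size_limit(sys.maxsize)  # may need an OverflowError fallback on some builds\r\n\r\nwith open('data.csv', newline='') as f:\r\n    # enumerate counts rows; multi-line quoted fields mean this can differ\r\n    # from the physical line number.\r\n    for rownum, row in enumerate(csv.reader(f), start=1):\r\n        oversized = [len(field) for field in row if len(field) > 131072]\r\n        if oversized:\r\n            print(rownum, oversized)\r\n```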
It would have been helpful if it could have provided a line number.\r\n\r\nAlso, it would have been useful if it had allowed the loading to continue with later lines.\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/229/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 808008305, "node_id": "MDU6SXNzdWU4MDgwMDgzMDU=", "number": 230, "title": "--sniff option for sniffing delimiters", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 8, "created_at": "2021-02-14T17:43:54Z", "updated_at": "2021-02-14T21:15:33Z", "closed_at": "2021-02-14T19:24:32Z", "author_association": "OWNER", "pull_request": null, "body": "> I just spotted that `csv.Sniffer` in the Python standard library has a `.has_header(sample)` method which detects if the first row appears to be a header or not, which is interesting. https://docs.python.org/3/library/csv.html#csv.Sniffer\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778812050_", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/230/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 808028757, "node_id": "MDU6SXNzdWU4MDgwMjg3NTc=", "number": 231, "title": "limit=X, offset=Y parameters for more Python methods", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-02-14T19:31:23Z", "updated_at": "2021-02-14T20:03:08Z", "closed_at": "2021-02-14T20:03:08Z", "author_association": "OWNER", "pull_request": null, "body": "> I'm going to add a `offset=` parameter to support this case. 
Thanks for the suggestion!\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/224#issuecomment-778828495_", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/231/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 808036774, "node_id": "MDU6SXNzdWU4MDgwMzY3NzQ=", "number": 232, "title": "Run tests against Windows in GitHub Actions", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-02-14T20:09:45Z", "updated_at": "2021-02-14T20:39:55Z", "closed_at": "2021-02-14T20:39:55Z", "author_association": "OWNER", "pull_request": null, "body": "> I'm going to try and get the test suite to run in Windows on GitHub Actions.\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/225#issuecomment-778834504_", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/232/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 808037010, "node_id": "MDExOlB1bGxSZXF1ZXN0NTczMTQ3MTY4", "number": 233, "title": "Run tests against Ubuntu, macOS and Windows", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-02-14T20:11:02Z", "updated_at": "2021-02-14T20:39:54Z", "closed_at": "2021-02-14T20:39:54Z", "author_association": "OWNER", "pull_request": "simonw/sqlite-utils/pulls/233", "body": "Refs #232", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/233/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 808046597, "node_id": "MDU6SXNzdWU4MDgwNDY1OTc=", "number": 234, "title": ".insert_all() fails if subsequent chunks contain additional columns", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-02-14T21:01:51Z", "updated_at": "2021-02-14T21:03:40Z", "closed_at": "2021-02-14T21:03:40Z", "author_association": "OWNER", "pull_request": null, "body": "Reported by @nieuwenhoven in #225 along with a proposed fix.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/234/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"}
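A short illustration of the `csv.Sniffer` API referenced in #230 - both calls are in the Python standard library; the filename and sample size are stand-ins:

```
import csv

# Sniff the dialect (delimiter, quoting) and header-ness from a file prefix.
with open('data.csv', newline='') as f:
    sample = f.read(4096)
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample)          # detected delimiter, quote char, etc.
    has_header = sniffer.has_header(sample)  # True if row 1 looks like a header
    f.seek(0)
    rows = list(csv.reader(f, dialect))
```

And a minimal sketch of the failure mode in #234, assuming only the documented `insert_all(batch_size=..., alter=...)` API; on the affected versions the extra column in the second chunk produced an invalid insert statement:

```
import sqlite_utils

db = sqlite_utils.Database(memory=True)
rows = [{'id': i} for i in range(100)] + [
    {'id': i, 'extra': 'x'} for i in range(100, 200)  # later rows add a column
]
# With batch_size=100 the second chunk introduces a column the first chunk
# never saw - the case issue #234 covers.
db['demo'].insert_all(rows, batch_size=100, alter=True)
print(db['demo'].columns_dict)
```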