{"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864328927", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864328927, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDMyODkyNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T00:25:08Z", "updated_at": "2021-06-19T00:25:17Z", "author_association": "OWNER", "body": "I tried writing this function with type hints, but eventually gave up:\r\n```python\r\ndef rows_from_file(\r\n fp: BinaryIO,\r\n format: Optional[Format] = None,\r\n dialect: Optional[Type[csv.Dialect]] = None,\r\n encoding: Optional[str] = None,\r\n) -> Generator[dict, None, None]:\r\n if format == Format.JSON:\r\n decoded = json.load(fp)\r\n if isinstance(decoded, dict):\r\n decoded = [decoded]\r\n if not isinstance(decoded, list):\r\n raise RowsFromFileBadJSON(\"JSON must be a list or a dictionary\")\r\n yield from decoded\r\n elif format == Format.CSV:\r\n decoded_fp = io.TextIOWrapper(fp, encoding=encoding or \"utf-8-sig\")\r\n yield from csv.DictReader(decoded_fp)\r\n elif format == Format.TSV:\r\n yield from rows_from_file(\r\n fp, format=Format.CSV, dialect=csv.excel_tab, encoding=encoding\r\n )\r\n elif format is None:\r\n # Detect the format, then call this recursively\r\n buffered = io.BufferedReader(fp, buffer_size=4096)\r\n first_bytes = buffered.peek(2048).strip()\r\n if first_bytes[0] in (b\"[\", b\"{\"):\r\n # TODO: Detect newline-JSON\r\n yield from rows_from_file(fp, format=Format.JSON)\r\n else:\r\n dialect = csv.Sniffer().sniff(first_bytes.decode(encoding, \"ignore\"))\r\n yield from rows_from_file(\r\n fp, format=Format.CSV, dialect=dialect, encoding=encoding\r\n )\r\n else:\r\n raise RowsFromFileError(\"Bad format\")\r\n```\r\nmypy said:\r\n```\r\nsqlite_utils/utils.py:157: error: Argument 1 to \"BufferedReader\" has incompatible type \"BinaryIO\"; expected \"RawIOBase\"\r\nsqlite_utils/utils.py:163: error: Argument 1 to \"decode\" of \"bytes\" has incompatible type \"Optional[str]\"; expected \"str\"\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864330508", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864330508, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDMzMDUwOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T00:34:24Z", "updated_at": "2021-06-19T00:34:24Z", "author_association": "OWNER", "body": "Got this working:\r\n\r\n % curl 'https://api.github.com/repos/simonw/datasette/issues' | sqlite-utils memory - 'select id from stdin' ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864348954", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/282", "id": 864348954, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDM0ODk1NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T03:34:42Z", "updated_at": "2021-06-19T03:35:46Z", "author_association": "OWNER", "body": "I built some prototype code here for something which looks at every row in a CSV import and records the likely types: https://gist.github.com/simonw/465f9356f175d1cf86957947dff501d4\r\n\r\nThis could be used by the command-line tools to figure out what `table.transform(types=...)` method to use at the end.\r\n\r\nThis is a different approach to the pure SQL version I tried building in https://github.com/simonw/sqlite-utils/issues/179 - I think this is a better approach though, it's less prone to weird idiosyncrasies of SQLite types, and it's also easy for us to add on to the existing CSV import code in a way that won't require scanning the data twice.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 925305186, "label": "Automatic type detection for CSV data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/179#issuecomment-864349066", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/179", "id": 864349066, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDM0OTA2Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T03:36:04Z", "updated_at": "2021-06-19T03:36:04Z", "author_association": "OWNER", "body": "This work is going to happen in #282.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 709577625, "label": "sqlite-utils transform/insert --detect-types"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864349123", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/282", "id": 864349123, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDM0OTEyMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T03:36:54Z", "updated_at": "2021-06-19T03:36:54Z", "author_association": "OWNER", "body": "I may change the default for `sqlite-utils insert` to detect types if I release `sqlite-utils` 4.0, as a backwards-incompatible change.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 925305186, "label": "Automatic type detection for CSV data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864350407", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/282", "id": 864350407, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDM1MDQwNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T03:52:20Z", "updated_at": "2021-06-19T03:52:20Z", "author_association": "OWNER", "body": "I'll have an environment variable for `--detect-types` so users who really want that as the default option can turn it on.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 925305186, "label": "Automatic type detection for CSV data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/282#issuecomment-864354627", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/282", "id": 864354627, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDM1NDYyNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T04:42:03Z", "updated_at": "2021-06-19T04:42:03Z", "author_association": "OWNER", "body": "Demo:\r\n\r\n curl -s 'https://api.github.com/users/simonw/repos?per_page=100' | \\\r\n sqlite-utils memory - 'select sum(size), sum(stargazers_count) from stdin limit 1'\r\n [{\"sum(size)\": 2042547, \"sum(stargazers_count)\": 6769}]\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 925305186, "label": "Automatic type detection for CSV data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864358680", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/284", "id": 864358680, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDM1ODY4MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T05:27:13Z", "updated_at": "2021-06-19T05:27:13Z", "author_association": "OWNER", "body": "How easy is it to detect a `rowid` table? Is it as simple as `.pks` returning `None`? If so the documentation should mention that.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 925320167, "label": ".transform(types=) turns rowid into a concrete column"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864358951", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/284", "id": 864358951, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDM1ODk1MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T05:30:00Z", "updated_at": "2021-06-19T05:30:00Z", "author_association": "OWNER", "body": "If this can be fixed it will be in the `transform_sql()` method.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 925320167, "label": ".transform(types=) turns rowid into a concrete column"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/283#issuecomment-864416086", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/283", "id": 864416086, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDQxNjA4Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T14:49:06Z", "updated_at": "2021-06-19T14:49:13Z", "author_association": "OWNER", "body": "Once again, this is difficult because of the use of a generator here - `rows_from_file()` only yields rows, so there is no obvious mechanism for it to communicate back to the wrapping code that the detected format was CSV or TSV as opposed to JSON.\r\n\r\nI'm going to change `rows_from_file()` to return a `(generator, detected_format)` tuple.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 925319214, "label": "memory: Shouldn't detect types for JSON"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/284#issuecomment-864416785", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/284", "id": 864416785, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDQxNjc4NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T14:54:41Z", "updated_at": "2021-06-19T14:54:41Z", "author_association": "OWNER", "body": "```pycon\r\n>>> db = sqlite_utils.Database(memory=True)\r\n>>> db[\"rowid_table\"].insert({\"name\": \"Cleo\"})\r\n