{"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501508302", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501508302, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTUwODMwMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T00:57:52Z", "updated_at": "2019-06-13T00:57:52Z", "author_association": "OWNER", "body": "Two challenges here:\r\n\r\n1. We need a way to specify which tables should be used - e.g. \"put records from the `\"user\"` key in a `users` table, put multiple records from the `\"labels\"` key in a table called `labels`\" (we can pick an automatic name for the m2m table, though it might be nice to have an option to customize it)\r\n\r\n2. Should we deal with nested objects? Consider https://api.github.com/repos/simonw/datasette/pulls for example:\r\n\r\n\"Mozilla_Firefox\"\r\n\r\nHere we have `head.user` as a user, `head.repo` as a repo, and `head.repo.owner` as another user.\r\n\r\nIdeally our mechanism for specifying which table things should be pulled out into would handle this, but it's getting a bit complicated.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501536495", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501536495, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTUzNjQ5NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T03:40:21Z", "updated_at": "2019-06-13T03:40:21Z", "author_association": "OWNER", "body": "I think I can do something here with a very simple `head.repo.owner` path syntax. 
Normally this kind of syntax would have to take the difference between dictionaries and lists into account but I don't think that matters here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501537812", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501537812, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTUzNzgxMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T03:49:37Z", "updated_at": "2019-06-13T03:50:39Z", "author_association": "OWNER", "body": "There's an interesting difference here between nested objects with a primary-key style ID and nested objects without.\r\n\r\nIf a nested object does not have a primary key, we could still shift it out to another table but it would need to be in a context where it has an automatic foreign key back to our current record.\r\n\r\nA good example of something where that would be useful is the `outageDevices` key in https://github.com/simonw/pge-outages/blob/d890d09ff6e2997948028528e06c82e1efe30365/pge-outages.json#L13-L25 \r\n\r\n```json\r\n {\r\n \"outageNumber\": \"407367\",\r\n \"outageStartTime\": \"1560355216\",\r\n \"crewCurrentStatus\": \"PG&E repair crew is on-site working to restore power.\",\r\n \"currentEtor\": \"1560376800\",\r\n \"cause\": \"Our preliminary determination is that your outage was caused by scheduled maintenance work.\",\r\n \"estCustAffected\": \"3\",\r\n \"lastUpdateTime\": \"1560355709\",\r\n \"hazardFlag\": \"0\",\r\n \"latitude\": \"37.35629\",\r\n \"longitude\": \"-119.70469\",\r\n \"outageDevices\": [\r\n {\r\n \"latitude\": \"37.35409\",\r\n \"longitude\": \"-119.70575\"\r\n },\r\n {\r\n \"latitude\": \"37.35463\",\r\n 
\"longitude\": \"-119.70525\"\r\n },\r\n {\r\n \"latitude\": \"37.35562\",\r\n \"longitude\": \"-119.70467\"\r\n }\r\n ],\r\n \"regionName\": \"Ahwahnee\"\r\n }\r\n```\r\n\r\nThese could either be inserted into an `outageDevices` table that uses `rowid`... or we could have a mechanism where we automatically derive a primary key for them based on a hash of their data, hence avoiding creating duplicates even though we don't have a provided primary key.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501538100", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501538100, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTUzODEwMA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T03:51:27Z", "updated_at": "2019-06-13T03:51:27Z", "author_association": "OWNER", "body": "I like the term \"extract\" for what we are doing here, partly because that's the terminology I used in `csvs-to-sqlite`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501539452", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501539452, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTUzOTQ1Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T03:59:32Z", "updated_at": "2019-06-13T03:59:32Z", "author_association": "OWNER", "body": "Another complexity from the 
https://api.github.com/repos/simonw/datasette/pulls example:\r\n\r\n\"Mozilla_Firefox\"\r\n\r\nWe don't actually want `head` and `base` to be pulled out into a separate table. Our ideal table design would probably look something like this:\r\n\r\n- `url`: ...\r\n- `id`: `285698310`\r\n- ...\r\n- `user_id`: `9599` -> refs `users`\r\n- `head_label`: `simonw:travis-38dev`\r\n- `head_ref`: `travis-38dev`\r\n- `head_sha`: `f274f9004302c5ca75ce89d0abfd648457957e31`\r\n- `head_user_id`: `9599` -> refs `users`\r\n- `head_repo_id`: `107914493` -> refs `repos`\r\n- `base_label`: `simonw:master`\r\n- `base_ref`: `master`\r\n- `base_sha`: `5e8fbf7f6fbc0b63d0479da3806dd9ccd6aaa945`\r\n- `base_user_id`: `9599` -> refs `users`\r\n- `base_repo_id`: `107914493` -> refs `repos`\r\n\r\nSo the nested `head` and `base` sections here, instead of being extracted into another table, were flattened into their own columns.\r\n\r\nSo perhaps we need a flatten-nested-into-columns mechanism which can be used in conjunction with an extract-to-tables mechanism.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501541902", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501541902, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTU0MTkwMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T04:15:22Z", "updated_at": "2019-06-13T16:55:42Z", "author_association": "OWNER", "body": "So maybe something like this:\r\n```\r\ncurl https://api.github.com/repos/simonw/datasette/pulls?state=all | \\\r\n sqlite-utils insert git.db pulls - \\\r\n --flatten=base \\\r\n --flatten=head \\\r\n --extract=user:users:id \\\r\n 
--extract=head_repo.license:licenses:key \\\r\n --extract=head_repo.owner:users \\\r\n --extract=head_repo \\\r\n --extract=base_repo.license:licenses:key \\\r\n --extract=base_repo.owner:users \\\r\n --extract=base_repo\r\n```\r\nIs the order of those nested `--extract` lines significant I wonder? It would be nice if the order didn't matter and the code figured out the right execution plan on its own.", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 1, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501542025", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501542025, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTU0MjAyNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T04:16:10Z", "updated_at": "2019-06-13T04:16:42Z", "author_association": "OWNER", "body": "So for `--extract` the format is `path-to-property:table-to-extract-to:primary-key`\r\n\r\nIf we find an array (as opposed to a direct nested object) at the end of the dotted path we do a m2m table.\r\n\r\nAnd if `primary-key` is omitted maybe we do the rowid thing with a foreign key back to ourselves.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501543688", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 501543688, "node_id": "MDEyOklzc3VlQ29tbWVudDUwMTU0MzY4OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-13T04:26:15Z", 
"updated_at": "2019-06-13T04:26:15Z", "author_association": "OWNER", "body": "I may ignore `--flatten` for the moment - users can do their own flattening using `jq` if they need that.\r\n\r\n```\r\ncurl https://api.github.com/repos/simonw/datasette/pulls?state=all | jq \"\r\n [.[] | . + {\r\n base_label: .base.label,\r\n base_ref: .base.ref,\r\n base_sha: .base.sha,\r\n base_user: .base.user,\r\n base_repo: .base.repo,\r\n head_label: .head.label,\r\n head_ref: .head.ref,\r\n head_sha: .head.sha,\r\n head_user: .head.user,\r\n head_repo: .head.repo\r\n } | del(.base, .head, ._links)]\r\n\"\r\n```\r\nOutput: https://gist.github.com/simonw/2703ed43fcfe96eb8cfeee7b558b61e1", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-507051670", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 507051670, "node_id": "MDEyOklzc3VlQ29tbWVudDUwNzA1MTY3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-06-30T17:04:09Z", "updated_at": "2019-06-30T17:04:09Z", "author_association": "OWNER", "body": "I think the implementation of this will benefit from #23 (syntactic sugar for creating m2m records)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-696566750", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 696566750, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NjU2Njc1MA==", "user": 
{"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T07:55:00Z", "updated_at": "2020-09-22T07:55:00Z", "author_association": "OWNER", "body": "Problem: `extract` means something else now, see #47 and the upcoming work in #42.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-964205475", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 964205475, "node_id": "IC_kwDOCGYnMM45eJuj", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2021-11-09T14:31:29Z", "updated_at": "2021-11-09T14:31:29Z", "author_association": "CONTRIBUTOR", "body": "i was just reaching for a tool to do this this morning", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1032120014", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 1032120014, "node_id": "IC_kwDOCGYnMM49hObO", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-02-08T01:32:34Z", "updated_at": "2022-02-08T01:32:34Z", "author_association": "CONTRIBUTOR", "body": "if you are curious about prior art, https://github.com/jsnell/json-to-multicsv is really good!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, 
"performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1141711418", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 1141711418, "node_id": "IC_kwDOCGYnMM5EDSI6", "user": {"value": 19304, "label": "nileshtrivedi"}, "created_at": "2022-05-31T06:21:15Z", "updated_at": "2022-05-31T06:21:15Z", "author_association": "NONE", "body": "I ran into this. My use case has a JSON file with array of `book` objects with a key called `reviews` which is also an array of objects. My JSON is human-edited and does not specify IDs for either books or reviews. Because sqlite-utils does not support inserting nested objects, I instead have to maintain two separate CSV files with `id` column in `books.csv` and `book_id` column in reviews.csv.\r\n\r\nI think the right way to declare the relationship while inserting a JSON might be to describe the relationship:\r\n\r\n`sqlite-utils insert data.db books mydata.json --hasmany reviews --hasone author --manytomany tags`\r\n\r\nThis is relying on the assumption that foreign keys can point to `rowid` primary key.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1170595021", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 1170595021, "node_id": "IC_kwDOCGYnMM5FxdzN", "user": {"value": 60892516, "label": "izzues"}, "created_at": "2022-06-29T23:35:29Z", "updated_at": "2022-06-29T23:35:29Z", "author_association": "NONE", "body": "Have you seen [MakeTypes](https://github.com/jvilk/MakeTypes)? 
Not the exact same thing but it may be relevant.\r\n\r\nAnd it's inspired by the paper [\"Types from Data: Making Structured Data First-Class Citizens in F#\"](https://dl.acm.org/citation.cfm?id=2908115).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null}