Joining ~40 normalized tables for 200 columns to display a detailed user profile. Can this be optimized?


A profile can have many profile types. Each profile type can have many fields, which are all tied to lookup tables. Some fields can have multiple selections.



profile -> profile_date -> one relationship status, one zodiac, many languages, many countries traveled, many cuisines, many activities, ...

profile -> profile_network -> one company, one school, ...

profile -> ...


A lookup table for the fields looks like this:



id type
1 ♈ ARIES
2 ♉ TAURUS
3 ♊ GEMINI
4 ♋ CANCER
5 ♌ LEO
6 ♍ VIRGO
7 ♎ LIBRA
8 ♏ SCORPIO
9 ♐ SAGITTARIUS
10 ♑ CAPRICORN
11 ♒ AQUARIUS
12 ♓ PISCES


If the field is a single selection, we join the lookup table and we're done.



-> profile_date (zodiac_type_id) -> zodiac_type.id


If the field is a multiple selection, there's a join table in between for the many-to-many.



-> profile_date -> (profile_date_id) profile_date_languages (language_type_id) -> language_type.id
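In SQL, those two access patterns might look like the following sketch (table and column names follow the arrows above; the real schema may differ):

```sql
-- Single selection: one join to the lookup table.
SELECT p.id, z.type AS zodiac
FROM   profile p
JOIN   profile_date pd ON pd.profile_id = p.id
JOIN   zodiac_type  z  ON z.id = pd.zodiac_type_id;

-- Multiple selection: a join table sits between profile_date
-- and the lookup table for the many-to-many relationship.
SELECT p.id, l.type AS language
FROM   profile p
JOIN   profile_date           pd  ON pd.profile_id = p.id
JOIN   profile_date_languages pdl ON pdl.profile_date_id = pd.id
JOIN   language_type          l   ON l.id = pdl.language_type_id;
```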


In addition, I am joining other things like this to get all of their photos:



profile -> personas -> photos


In addition, the query also filters for things like persons who like each other, etc.



The query starts performing poorly with just a low number of profiles, fewer than 10. If a single person has an obnoxious profile where they check all the multiple selections (e.g. they speak every language and have been to every country), fetching that mega profile grinds the database to crippled performance for everyone.



That's the idea. You can see more of the fields here:



Sample query:
https://pastebin.com/wug86Wrr



Sample query plan:
http://www.sharecsv.com/s/a6333a7f610e049dd81ebcfc19a4c02f/temp_file_limit_exceeded.csv



Users will always filter by at least profile.birth_date (age) and profile.radius (geolocation). Paid users can filter by any field. :)



PostgreSQL 9.6



Some ideas I had were:



  1. Optimize this query. (I am not sure what can be done; I am mostly wondering whether I'm doing something really stupid/unreasonable.)


  2. Right now I am joining many tables per card, so the initial load is slower, but the data is all there if the user decides to explore, and things look instant to them. I could instead fetch a minimum preview for the cards (joining few tables per card), then join the full profile (one card filtered by id, with the rest of the joins) only when the user clicks the preview to see the detail view. I feel this is not the best user experience, though, because the user now has to wait for a load every time they click a profile, which is more noticeable. I am also not sure this would even help: users can filter by these fields too, so I believe I would need to join them beforehand anyway.


  3. Use Elasticsearch. (I feel like I wouldn't need this until there are many more profiles?) I've done something similar with Algolia and it handled it like a champ, but I imagine this can be done with PostgreSQL alone.










  • You have 20+ joins to tables that have 1-to-many relationships with your main (profile) table, and your WHERE clause seems not to be limiting the result very much. No wonder you get very low performance from this query. – ypercubeᵀᴹ, yesterday





  • Note that SQL generates a cartesian product of all child records for 1-to-n relations (i.e. all possible combinations of all child records). So, if child1 has 10 records, child2 has 4 records and child3 has 5 records for one main record, you will get 10 * 4 * 5 = 200 resulting records just for this single main record! And you are joining 40 tables. My goodness! – Olivier Jacot-Descombes, yesterday






  • You cannot have reasonable performance without (implicitly or explicitly) declared constraints & indexes. Regardless of whether you want the surrogates, you should be declaring the other constraints--CKs, UNIQUEs, NOT NULLs & FKs--and associated & other relevant indexes where your DBMS doesn't. Depending on your DBMS, you are likely to get the same performance whether or not you keep the surrogates. PS: That lookup table should be id-symbol-name, with three CKs. And as @SQLRaptor says, the symbols & names exist to be unique ids--they are surrogate ids--you don't need to add a surrogate id. – philipxy, yesterday






  • If you have a main person record p speaking languages l1, l2 who visited countries c1, c2, c3 and you join them, the result will be (p, l1, c1), (p, l1, c2), (p, l1, c3), (p, l2, c1), (p, l2, c2), (p, l2, c3). Since SQL does not know which language is related to which country, it simply returns every possible combination of them. If you join this to lookup tables, the number of records does not increase, since now the returned records are on the n-side of the relation. Make a small experiment and you will see. – Olivier Jacot-Descombes, 16 hours ago















postgresql database-design






edited yesterday by David Browne - Microsoft
asked yesterday by atkayla

2 Answers
Option 4: correct your schema design and stop using 'IDs' or any other surrogate key. The zodiac table is NOT a lookup table; it's a relation that holds all zodiac signs, and the way you tell zodiac signs apart is their name. Therefore, the key should be:



CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);


Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all.
Do the same for all the other one-to-many relations (relationship status, company, school, etc.) and you just saved yourself most of the joins.
For the many-to-many relations, like languages and countries, do the same, and you will need only one join instead of two for each relation.
Give it a try and see how all your queries magically shrink by 90% and perform orders of magnitude better. You will also explicitly maintain uniqueness and prevent logical duplicates, and your filters will perform orders of magnitude better, since the optimizer can correctly evaluate the distribution of the actual values rather than some random ID from another table.
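A minimal sketch of the suggested change (the answer only defines the Zodiacs table itself; the other table and column names here are illustrative, not the asker's actual schema):

```sql
-- Lookup relations keyed by the natural value.
CREATE TABLE Zodiacs   (Zodiac   VARCHAR(11) PRIMARY KEY);
CREATE TABLE Languages (Language VARCHAR(50) PRIMARY KEY);

-- The profile stores the value directly; displaying it needs no join.
CREATE TABLE profile_date (
    profile_id INT PRIMARY KEY,
    zodiac     VARCHAR(11) REFERENCES Zodiacs (Zodiac)
);

-- Many-to-many now needs only the join table, with no extra
-- hop to a lookup table to resolve an id into a name.
CREATE TABLE profile_date_languages (
    profile_id INT         REFERENCES profile_date (profile_id),
    language   VARCHAR(50) REFERENCES Languages (Language),
    PRIMARY KEY (profile_id, language)
);
```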



BTW, there are no fields in a SQL database; there are only columns and rows.
These are the counterparts of attributes and tuples in the relational model.



HTH






  • Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins-to-one, and remove one join from each of the joins-to-many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own on the topic of natural key vs. surrogate key. The consensus seems to be to use the surrogate key id, but seeing this now before me, it makes no sense to do that ridiculous number of extra joins. – atkayla, yesterday











  • I was wondering if you could elaborate on why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand: stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed the pros. – atkayla, yesterday






  • "The recommendations that people make" are based on myths that go back 50 years, to the days when magnetic tapes required physical address pointers. It has become so widespread that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only; it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg – SQLRaptor, yesterday











  • I really enjoyed your talk and learned a lot! I love how it covers this exact topic in great detail, Pro Artificial vs. Pro Natural, the 99.5% vs. the 0.5%, haha. With those Pro Artificial (id, type) lookups, what is recommended as the "de facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back on whether I think it was a good decision, although I'm sure it will be, due to the obvious elimination of so many joins. Just need to see for myself! – atkayla, yesterday





  • +1 for the clarification! :-) – Vérace, yesterday
































Natural key or surrogate key? Each has advantages; it depends on the use case. The decision should be based on reason, not on ideology.



As natural keys get longer, surrogate keys start to look better. Size matters, especially if resources are limited and tables grow big. Small tables are fast anyway (unless you seriously mess up).



If you have limited RAM (you mentioned 3.75 GB in a comment) and many rows in the main table, sheer size starts to weigh in. Storing a varchar(50) a million times means up to 50+ MB on disk (and a bit more in RAM). With 100 such columns, rows grow huge and performance suffers. Replacing each with a 1-, 2- or 4-byte surrogate key makes that 1 - 4 MB per column instead.



The zodiac signs in your example table are short, descriptive and immutable. Perfect candidates for a natural key. Not much to gain with a surrogate key. The immutable nature of zodiac signs and the absence of any additional attributes (columns) also make it a perfect candidate for an enum type, which is simple to handle, just as fast and only occupies 4 bytes on disk.
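The enum variant might look like this sketch (the table name is illustrative; the full comparison of all three variants is in the linked demo):

```sql
-- Zodiac as an enum: 4 bytes on disk, no lookup table, no join.
CREATE TYPE zodiac AS ENUM (
    'ARIES', 'TAURUS', 'GEMINI', 'CANCER', 'LEO', 'VIRGO',
    'LIBRA', 'SCORPIO', 'SAGITTARIUS', 'CAPRICORN', 'AQUARIUS', 'PISCES');

CREATE TABLE profile_date (
    profile_id int PRIMARY KEY,
    zodiac     zodiac  -- stored inline; sorts in declaration order
);
```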



I prepared a quick demo to compare:



Comparing 3 variants for your simple "zodiac" case:



  • Surrogate keys

  • Natural keys

  • Enums

db<>fiddle here



Enums seem like the best choice in this particular case. I am not a big fan of enums in general, as they make it harder to add/remove rows and impossible to add more columns to the lookup values.



Most probably there are other problems in your query, but it's too big to dig into in a question; that would be an extended code audit.



About the 1-byte "char" type I used in the demo:



  • How to store one-byte integer in PostgreSQL?





  • It is true that natural keys MIGHT take up more storage, but that is only one aspect of their overall impact on performance, and looking at any one aspect alone may lead to wrong conclusions. For example, regarding memory consumption: when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory than loading pages from just one table. – SQLRaptor, 12 hours ago










  • Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.

    – Erwin Brandstetter
    12 hours ago










2 Answers

Option 4: correct your schema design and stop using IDs or any other surrogate key. The zodiac table is NOT a lookup table; it's a relation that holds all zodiac signs, and the way you tell zodiac signs apart is by their name. Therefore, the key should be:



CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);


Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all.
Do the same for all the other one-to-many relations (relationship status, company, school, etc.) and you have just saved yourself most of the joins.
For the many-to-many relations, like languages and countries, do the same, and you will need only one join instead of two per relation.
Give it a try and see how all your queries shrink by 90% and perform orders of magnitude better. You will also explicitly maintain uniqueness and prevent logical duplicates, and filters will perform orders of magnitude better, since the optimizer can correctly evaluate the distribution of the actual values rather than some random ID from another table.
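A sketch of what this looks like in both cases, following the naming style of the Zodiacs example (the Profiles and Languages tables are illustrative, not from the original schema):

```sql
-- One-to-many: the natural key goes straight into the profile table.
CREATE TABLE Zodiacs  (Zodiac VARCHAR(11) PRIMARY KEY);
CREATE TABLE Profiles (
    ProfileID INT PRIMARY KEY,
    Zodiac    VARCHAR(11) REFERENCES Zodiacs (Zodiac)  -- no join needed to display it
);

-- Many-to-many: the bridge table carries the natural key directly, so
-- listing a profile's languages takes one join instead of two.
CREATE TABLE Languages (Language VARCHAR(30) PRIMARY KEY);
CREATE TABLE ProfileLanguages (
    ProfileID INT         REFERENCES Profiles (ProfileID),
    Language  VARCHAR(30) REFERENCES Languages (Language),
    PRIMARY KEY (ProfileID, Language)
);
```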



BTW, there are no fields in a SQL database; there are only columns and rows.
These are the counterparts of attributes and tuples in the relational model.



HTH






  • Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.

    – atkayla
    yesterday












  • I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.

    – atkayla
    yesterday












    "The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg

    – SQLRaptor
    yesterday












  • I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail and Pro Artificial vs. Pro Natural, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artificial (id, type) lookups, which are recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be, due to the obvious elimination of so many joins. Just need to see for myself!

    – atkayla
    yesterday











    +1 for the clarification! :-)

    – Vérace
    yesterday















Thanks for contributing an answer to Database Administrators Stack Exchange!

