Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


python - What DB should I use to store a queryable list, and how? Linked tables seem very excessive and slow.

I am working on a project that will have millions of primary keys (user IDs). Every user will have a list of attributes of unknown length (between 1 and 100, but fewer than 5 in nearly all cases, drawn from ~1000 possible attributes in total). This list needs to be queryable so that I can find users with the same attributes.

How would I accomplish this? One method seems to be to create a separate table for every user, but this means I would have a LOT of tables, which seems wrong. Another method seems to be storing the list as a blob, but this would make querying difficult.
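For comparison, the linked-table layout mentioned in the title can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table, column, and function names are hypothetical, and attributes are assumed to be stored as integer IDs. It shows one link table with a reverse index instead of one table per user or a blob.

```python
import sqlite3


def build_db(conn):
    """Create a users table and a link table (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY);
        CREATE TABLE user_attributes (
            user_id      INTEGER NOT NULL REFERENCES users(user_id),
            attribute_id INTEGER NOT NULL,
            PRIMARY KEY (user_id, attribute_id)
        ) WITHOUT ROWID;
        -- Reverse index so "who has attribute X" does not scan the table.
        CREATE INDEX idx_attr_user ON user_attributes (attribute_id, user_id);
        """
    )


def set_attributes(conn, user_id, attribute_ids):
    """Add attributes to a user; existing pairs are left untouched."""
    conn.execute("INSERT OR IGNORE INTO users (user_id) VALUES (?)", (user_id,))
    conn.executemany(
        "INSERT OR IGNORE INTO user_attributes (user_id, attribute_id) VALUES (?, ?)",
        [(user_id, a) for a in attribute_ids],
    )


def users_with_all(conn, attribute_ids):
    """Return the IDs of users that have every attribute in attribute_ids."""
    wanted = list(set(attribute_ids))
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"SELECT user_id FROM user_attributes "
        f"WHERE attribute_id IN ({placeholders}) "
        f"GROUP BY user_id HAVING COUNT(*) = ? ORDER BY user_id",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]
```

Inserting many rows inside a single transaction, rather than committing per row, is the usual way to work around a low commits-per-second limit in SQLite.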

I would prefer to use Python, but if my rather lightweight server struggles I may switch to C++. The only SQL DB I have used so far is SQLite, but it may not be ideal as it cannot handle enough commits per second (though I can queue writes).

What DB should I use and how should I do this properly?



1 Answer


For this scenario, I think graph databases (GDBs) are a good option.

You can define your attributes as nodes. One well-known and powerful graph database is Neo4j. Neo4j doesn't have tables; it uses Cypher, a graph query language, to handle its queries.

From the Neo4j website:

Unlike traditional databases, which arrange data in rows, columns and tables, Neo4j has a flexible structure defined by stored relationships between data records. With Neo4j, each data record, or node, stores direct pointers to all the nodes it’s connected to. Because Neo4j is designed around this simple, yet powerful optimization, it performs queries with complex connections orders of magnitude faster, and with more depth, than other databases.

From the Neo4j website, about Cypher:

With Neo4j, connections between data are stored – not computed at query time. Cypher is a powerful, graph-optimized query language that understands, and takes advantage of, these stored connections. When trying to find patterns or insights within data, Cypher queries are often much simpler and easier to write than massive SQL JOINs. Since Neo4j doesn’t have tables, there are no JOINs to worry about.

You can find some comparisons with SQL on the main page of their website: https://neo4j.com/

Check these links too if you want to use Neo4j from Python:

  1. https://neo4j.com/developer/python/
  2. https://pypi.org/project/neo4j-driver/
  3. https://towardsdatascience.com/neo4j-cypher-python-7a919a372be7
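As a rough sketch of how this could look from Python: users and attributes are both stored as nodes and linked by a relationship, and a Cypher pattern finds other users sharing attributes with a given user. The `User` and `Attribute` labels, the `HAS` relationship type, the property names, and the helper functions are all assumptions made for illustration; `session` is expected to be a session object from the official Neo4j Python driver.

```python
# Hypothetical graph model: (:User {id})-[:HAS]->(:Attribute {name})

ADD_ATTRIBUTE = """
MERGE (u:User {id: $user_id})
MERGE (a:Attribute {name: $attribute})
MERGE (u)-[:HAS]->(a)
"""

USERS_SHARING = """
MATCH (u:User {id: $user_id})-[:HAS]->(a:Attribute)<-[:HAS]-(other:User)
RETURN other.id AS user_id, count(a) AS shared
ORDER BY shared DESC
LIMIT $limit
"""


def add_attribute(session, user_id, attribute):
    """Link a user to an attribute, creating either node if it is missing."""
    session.run(ADD_ATTRIBUTE, user_id=user_id, attribute=attribute)


def users_sharing_attributes(session, user_id, limit=10):
    """Return (other_user_id, shared_count) pairs, most shared first."""
    result = session.run(USERS_SHARING, user_id=user_id, limit=limit)
    return [(record["user_id"], record["shared"]) for record in result]


def demo(uri, user, password):
    """Example wiring; needs the neo4j package and a running server."""
    from neo4j import GraphDatabase  # imported lazily, only needed here

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            add_attribute(session, 1, "likes_python")
            add_attribute(session, 2, "likes_python")
            return users_sharing_attributes(session, 1)
    finally:
        driver.close()
```

Calling `demo("bolt://localhost:7687", "neo4j", "<password>")` assumes a local Neo4j instance at that address; uniqueness constraints on `User.id` and `Attribute.name` would normally be added so that `MERGE` stays fast at millions of users.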
