Databases and interfacing with them - DB Help Thread

Discussion in 'Technical Board' started by rei, Aug 16, 2011.

  1. rei

    Emotionally Jaded

    Reputation:
    16
    Joined:
    Oct 19, 2009
    Messages:
    1,273
    Location:
    Guelph, ON
    I'm honestly not sure how much use this will get - it's really designed for the tech section, and it's meant as sort of both a "help me with this" thread and a general question-and-answer thread about database tech.
     
  2. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    We'll see if this flies.
     
  3. scootah

    New mod

    Reputation:
    12
    Joined:
    Oct 21, 2009
    Messages:
    1,750
    I'm an admin/ops guy - I know my way around MSSQL pretty well, and can manage PostgreSQL/MySQL pretty comfortably. One of the servers I'm running at the moment hosts an Oracle instance that does some data processing - basically, data comes in as part of a workflow, a bunch of transforms happen, and data gets spat out at the other end. There's no need for any rollback, undo/redo, or any kind of backup - ditching Oracle for these functions and rebuilding them as a processing solution in C is on our developers' list - but for the time being, we need to support Oracle.

    Currently, the Oracle bit is performing like a bitch. On one of our dev machines, a task takes about 22 minutes. In our production environment, we're killing it at 2.5 hours. We can see that I/O is part of the bottleneck (although it doesn't explain all of the gap) - but we're trying to tune to reduce that. I can see that the undo tablespace and redo logs are getting absolutely hammered: well over 50% of the disk activity during the process hits the undo tablespace or the redo log files. And as mentioned, rollback/redo/backups/etc. are completely irrelevant - if something goes sideways, the app or QA catches the bad data and we re-run the processing step entirely, and the app rebuilds the database entirely if it's broken - so logging can be entirely disabled for our purposes. What's the best way to implement that?
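
    The closest thing I've turned up so far is NOLOGGING plus direct-path loads - a rough sketch, with made-up user/table names (as far as I can tell there's no supported switch to turn Oracle's logging off outright):

    # Direct-path loads into NOLOGGING tables skip most redo and generate
    # next to no undo for the table data. Indexes still log; mark them
    # NOLOGGING too, or drop and rebuild them after the load.
    sqlplus -s etl_user/etl_pass <<'SQL'
    ALTER TABLE stage_data NOLOGGING;

    -- APPEND forces a direct-path insert: blocks are written above the
    -- high-water mark, bypassing most undo/redo generation
    INSERT /*+ APPEND */ INTO stage_data
    SELECT * FROM incoming_data;
    COMMIT;
    EXIT;
    SQL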
     
  4. joule_thief

    Emotionally Jaded

    Reputation:
    62
    Joined:
    Mar 1, 2010
    Messages:
    604
    Location:
    Austin, TX
    I need to learn MySQL for work soon, but I'm just as interested in learning about databases in general. My initial use would be for an appliance that uses MySQL for reporting and I would also be doing database maintenance.

    What resources would be recommended for learning MySQL quickly? I tend to learn more quickly with hands-on, example-based training, so that style would be preferred.

    In the long term, what other databases would it be to my advantage to learn?

    Thanks!
     
  5. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    You have two problems you need to solve: database development, and building an appliance.

    The second is much harder than the first.

    DBs are easy to learn. Start with the general SQL language that is, for the most part, shared by all relational databases. This is the core of what you need to learn in order to develop simple DB-based applications.

    http://shop.oreilly.com/product/9780596526849.do

    That's a great book to figure out basic stuff.
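
    If you want something hands-on while the book's in the mail, the core of the language fits on a page. Here's a sketch you can paste into the mysql client (the table and data are invented):

    mysql -u root -p <<'SQL'
    CREATE DATABASE IF NOT EXISTS practice;
    USE practice;

    -- one table, a few rows, then the SELECT patterns you'll use constantly
    CREATE TABLE appliances (
      id      INT AUTO_INCREMENT PRIMARY KEY,
      model   VARCHAR(32) NOT NULL,
      status  VARCHAR(16) NOT NULL,
      temp_c  DECIMAL(5,2)
    );

    INSERT INTO appliances (model, status, temp_c) VALUES
      ('rack-1', 'ok',       41.5),
      ('rack-2', 'degraded', 63.0),
      ('rack-3', 'ok',       44.2);

    SELECT model, temp_c FROM appliances WHERE status = 'ok';  -- filter
    SELECT status, COUNT(*), AVG(temp_c)                       -- report-style
    FROM appliances GROUP BY status;
    SQL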

    Now, if you are doing complex reporting, you might be touching on DB warehousing, which is fucking terrifying (to me, anyway). It all depends on your specific use cases, but if you're just doing simple shit, then you shouldn't have any problems. Pro-tip though, define all of your reports up front and reverse engineer them into the data model / sql queries you'll need, or you'll get lost pretty easily.

    Next, peruse the MySQL online docs, specifically the features and administration sections. Don't focus on learning the low-level shit and how to implement it; skim to get an idea of the concepts. "Oh... there are replication options. Oh, partitioned tables, what are they?" Focus on finding the "gotchas"... the docs are pretty good about explaining situations where things WON'T work as expected, so pay attention, and steer clear of those situations.

    Then Google for "MySQL DBA best practices" to get an idea of the type of ongoing, periodic maintenance you need to do to keep a DB running, as well as the type of monitoring you need to put in place to alert you when something isn't working. This is where you're getting into "building an appliance".

    Of course, I'm assuming you really are building an appliance, in the true sense of the word...


    The quickest way you can learn a DB is to have a specific project... define the requirements up front, and get a good understanding of what you'll need the DB to do. Even a rough idea of the data model will help. Then keep that shit in mind while you're investigating DB's, so you can make note of when something will apply to what you're doing.

    As to what DBs to learn after MySQL, I'd say PostgreSQL... it's so much better than MySQL, and much closer to commercial offerings like Oracle. Learn that, and your skills will transfer quite well.

    MySQL isn't a good database; it's just EVERYWHERE. Most people don't realize that MySQL isn't even fully ACID compliant - only the default storage engine (InnoDB) is; the DBMS as a whole isn't. (MySQL has two main parts: the DB management system, or "head", and the storage engine underneath, of which there are 10 or so main options... InnoDB is the default.) It's nice and fast and simple for a few use cases, but I wouldn't build a bank around it.
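
    A quick way to see the split for yourself (a sketch; the table names are made up) - the same server mixes engines per table, and only the InnoDB one honours a rollback:

    mysql -u root -p test <<'SQL'
    CREATE TABLE t_innodb (id INT PRIMARY KEY) ENGINE=InnoDB;
    CREATE TABLE t_myisam (id INT PRIMARY KEY) ENGINE=MyISAM;

    START TRANSACTION;
    INSERT INTO t_innodb VALUES (1);
    INSERT INTO t_myisam VALUES (1);
    ROLLBACK;

    SELECT COUNT(*) FROM t_innodb;  -- 0: the insert was rolled back
    SELECT COUNT(*) FROM t_myisam;  -- 1: MyISAM can't roll back
    SQL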
     
  6. joule_thief

    Emotionally Jaded

    Reputation:
    62
    Joined:
    Mar 1, 2010
    Messages:
    604
    Location:
    Austin, TX
    Thank you, I appreciate the response. Now to get to work.
     
  7. scootah

    New mod

    Reputation:
    12
    Joined:
    Oct 21, 2009
    Messages:
    1,750
    If you can code a bit, learning to write SQL queries is honestly pretty easy - it's just about learning the language that query structures use, and then being logical. Learning schema design, database design, and management/optimization is much more complicated.

    If all you need to do is write some queries to get data out, and performance isn't a particular issue, find any SQL query tutorial - they're all basically applicable - and when some syntax doesn't work for whatever reason, check a tutorial specific to the database you're using.

    On a decently spec'd appliance, without huge data sets, it's hard to go wrong at the start. When your dataset starts to grow, or if you need to work the database harder, your schema and query design becomes crucial to performance. If you have to design the data structure as well as manage the reports, really look into that area first. Normalization, and designing your DB ahead of time to suit your reporting and management tasks, saves a lot of pain.
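
    To make the normalization point concrete, here's a sketch with invented tables - split the repeated device info out of the readings, and index the column your reports filter on:

    mysql -u root -p reports <<'SQL'
    CREATE TABLE devices (
      device_id INT AUTO_INCREMENT PRIMARY KEY,
      name      VARCHAR(64) NOT NULL UNIQUE,
      site      VARCHAR(64) NOT NULL
    );

    CREATE TABLE readings (
      device_id INT NOT NULL,
      taken_at  DATETIME NOT NULL,
      value     DECIMAL(8,2) NOT NULL,
      KEY idx_taken_at (taken_at),                -- the report's range filter
      FOREIGN KEY (device_id) REFERENCES devices (device_id)
    );

    -- a typical report: per-site averages over a window
    SELECT d.site, AVG(r.value)
    FROM readings r JOIN devices d USING (device_id)
    WHERE r.taken_at >= '2011-08-01'
    GROUP BY d.site;
    SQL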

    Also, I completely agree with Nett that MySQL isn't a good database. I'd go further and say it's the option you should only use if you don't have a choice, or if you don't really care because your DB isn't going to do any real work. Personally, I'd look into swapping to PostgreSQL or Microsoft SQL Server Express Edition. The tools available for actually working with them are, in my experience, vastly superior, and I find products built around them much, much more manageable. Oracle is also a great product, and I'm a bit out of date with their licensing, but to my knowledge there probably isn't a licensing arrangement that would slot into a business case built around MySQL.
     
  8. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    Which is why Oracle bought MySQL.
     
  9. Binary

    Emotionally Jaded

    Reputation:
    388
    Joined:
    Oct 21, 2009
    Messages:
    4,079
    I'm trying to figure out the most efficient way to duplicate a MySQL database using remote mysql commands.

    I'm regularly standing up and tearing down test instances for my website, and since the database + file directories total nearly 10GB, this can take some time, and I'd like to just be able to run a script and walk away. I have the script built, but the DB copy is slooow.

    Previously, I've been using phpMyAdmin, running on the web server, to duplicate the database. I have no idea what query it uses to do the duplication, but it seems to be adequately fast. When I run this command from the web server:
    mysqldump -h db01 -u root -pPASSWORD source_database | mysql -h db01 -u root -pPASSWORD destination_database

    It works, but it's far slower and much harder on the website than whatever phpMyAdmin does.

    Thoughts?
     
  10. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    Look at using binary logs, especially if you want it to be current. It's an incremental backup process designed to keep standby DBs up to date with as little overhead as possible.

    http://dev.mysql.com/doc/refman/5.0/en/ ... overy.html

    Basically, do the following (rough shell sketch after the list):

    • take a data-file backup of your source DB
    • copy the big files to your target and recover the DB there. This is done once a week/month/whatever depending on your DB's rate of change, time to copy, and other elements that are specific to your environment and use case. You have to figure out the ROI and adjust accordingly.
    • enable binary logs on the source DB (before the base backup is taken, so no changes fall in the gap); they record every change made since that base
    • when you want to refresh the copy, just copy those binary logs to your target and apply them there
    • redo the whole process every so often, when the binary logs get too big to copy quickly
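
    As a rough shell sketch of that cycle (host, database, and log-file names are made up; log-bin has to be enabled in my.cnf first):

    # 1. base backup, rotating to a fresh binary log at the moment of the dump
    mysqldump -h db01 -u root -p --single-transaction --flush-logs \
        --master-data=2 source_database > base.sql

    # 2. restore the base on the target (the slow step; done once per cycle)
    mysql -h test01 -u root -p destination_database < base.sql

    # 3. refreshes: replay only the binary logs written since the base
    mysqlbinlog --database=source_database db01-bin.000042 db01-bin.000043 \
        | mysql -h test01 -u root -p destination_database

    # 4. when the logs pile up, take a new base and purge the old ones
    #    (from a mysql session: PURGE BINARY LOGS TO 'db01-bin.000044';)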
     
  11. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    To further expand on this, you could do the following:

    Have a local/test/target DB that is a duplicate (logically) of PROD, and is kept up to date with reasonable frequency using the binary log approach.

    When you need to take a copy, treat that replica as the source, not your PROD (assuming PROD is off over the internet and your replica is on a local drive).

    When you need a new TEST db to play with, make a local data-file copy of the 10GB DB, which is way, way faster than doing mysqldump/restore (which basically just executes a shit-ton of sql statements, and does index maintenance after each one unless you specify "no indexes" when you're dumping it).

    You'll just have to figure out the DB admin stuff (getting the DB to be usable under a new instance name), which is rather fast compared to running 10GB worth of SQL statements.
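
    File-level, it looks something like this (a sketch - paths, port, and names are invented; with InnoDB's shared tablespace you copy the whole datadir and start a second mysqld against it, rather than copying one database's directory):

    mysqladmin -u root -p shutdown                   # cold copy: stop the local replica
    cp -a /var/lib/mysql /var/lib/mysql_test         # raw data files, no SQL replayed
    mysqld_safe &                                    # bring the replica back up
    mysqld_safe --datadir=/var/lib/mysql_test \
        --port=3307 --socket=/tmp/mysql_test.sock &  # the new TEST instance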


    The only real issue is the added management on the PROD db: you then have to worry about cleaning up / managing the binary logs, taking a new base every so often, etc.

    Would still be way, way faster.
     
  12. Binary

    Emotionally Jaded

    Reputation:
    388
    Joined:
    Oct 21, 2009
    Messages:
    4,079
    I had thought about using binary logs but to be honest, the servers are kind of on the hairy edge of being acceptably performant as it is and the company that owns them is basically unwilling to do upgrades. While binary logging isn't an enormous hit, I'm a little leery of doing anything to upset the balance, especially to solve a relatively minor convenience problem.

    The DB itself is only ~4GB; the rest is filesystem items that need to be duplicated, and obviously that's the easy part. I'm just trying to figure out why I can do a database duplication in phpMyAdmin in about 10 minutes with seemingly little impact on the website, whereas the SQL dump command basically brings the website down to molasses and sits there and spins for 30 minutes. My assumption is that it's because the mysqldump command I'm using transfers the DB twice (dump from DB > web server where the piped command is running > back to the new DB). I'm speculating that PMA avoids that redundant transfer and keeps the data local to the DB server... I just don't actually know how, since PMA is hosted remotely as well and has no access to the filesystem of the DB server.
     
  13. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    It's the mysql command... whenever you replay raw SQL statements to rebuild the database, especially without disabling keys, it's expensive as hell. Odds are phpMyAdmin is using the --disable-keys option (on the dump part) to make the inserts much faster, which you're not doing.

    If they're running commercial MySQL, then do a hot backup and copy the raw files rather than processing the SQL statements.
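
    So the same pipeline with the insert-side options spelled out would look like this (a sketch; recent mysqldump versions turn --disable-keys on as part of --opt, but being explicit costs nothing, and note it only speeds up non-unique indexes on MyISAM tables):

    mysqldump -h db01 -u root -p --opt --disable-keys --extended-insert \
        source_database | mysql -h db01 -u root -p destination_database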
     
  14. Binary

    Emotionally Jaded

    Reputation:
    388
    Joined:
    Oct 21, 2009
    Messages:
    4,079
    Didn't think about disabling keys - that's a good idea, thanks.

    They're not running a hot backup because it's just a vanilla install of free MySQL.
     
  15. $100T2

    Emotionally Jaded

    Reputation:
    108
    Joined:
    Oct 19, 2009
    Messages:
    1,966
    Here's a really stupid question:

    I'm attempting to build a user database where you log in with a username (no password necessary) and it returns user-specific options, which will mostly be HTML/jQuery. I'm currently doing everything on my laptop using Dreamweaver and WampServer.

    I'm assuming I will do this with PHP and MySQL. The problem is, I don't know if I need to learn the MySQL stuff first, the PHP stuff first, learn them together, or just fuck around with it 'til it works. I've tried the last option and I'm getting frustrated.

    Should I focus on one thing first?
     
  16. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    Focus on simple "hello world" PHP code that does basic CRUD actions (create, read, update, delete) on a single table. There are tons of tutorials out there that do exactly that.

    Here's the first of thousands that popped up in Google: http://www.startutorial.com/articles/vi ... ial-part-1

    Work your way through that, and then see where you need to go from there to get it to work for your specific use case.

    Then start learning about blobs, clobs, exception handling, transactions, SQL injection, etc., and wonder what the fuck you've gotten yourself into.

    Always better to start with a simple Proof Of Concept and then slowly morph it into what you need.
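
    The database half of that "hello world" is four statements. A sketch against an invented users table (the PHP layer just issues the same statements through mysqli or PDO):

    mysql -u root -p userdb <<'SQL'
    CREATE TABLE IF NOT EXISTS users (
      id       INT AUTO_INCREMENT PRIMARY KEY,
      username VARCHAR(32) NOT NULL UNIQUE,
      options  TEXT                              -- the per-user html/jquery settings
    );

    INSERT INTO users (username, options)
      VALUES ('alice', '{"theme":"dark"}');                                  -- Create
    SELECT options FROM users WHERE username = 'alice';                      -- Read
    UPDATE users SET options = '{"theme":"light"}' WHERE username = 'alice'; -- Update
    DELETE FROM users WHERE username = 'alice';                              -- Delete
    SQL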
     
  17. $100T2

    Emotionally Jaded

    Reputation:
    108
    Joined:
    Oct 19, 2009
    Messages:
    1,966
  18. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    Depending on where the scope of the project and your database requirements lead you, you may want to consider using a NoSQL database like Mongo for something like this. It's designed to be a simple key/value data store... no transactions, no joins, etc. Very fast, very simple... as long as you stay within the scope of what it does.

    http://php.net/manual/en/mongo.tutorial.php
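
    At that scale the whole API is basically put/get by key. A sketch via the mongo shell, with invented names (the PHP driver's calls mirror these one-for-one):

    mongo userprefs <<'JS'
    db.prefs.insert({ _id: "alice", theme: "dark", widgets: ["clock", "mail"] });
    db.prefs.find({ _id: "alice" });                       // fetch by key - no joins
    db.prefs.update({ _id: "alice" }, { $set: { theme: "light" } });
    JS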

    Be careful, though. I'm in the middle of converting 2 years of development with Mongo to PostgreSQL due to the fact that they started off using Mongo for something very lightweight and perfect for what it does, only to have the application morph into something that uses joins, transactions, and other things that Mongo doesn't do. It's hell.

    But we're still doing all of our BI instrumentation and metric collection using Mongo.

    It's about using the right tool for the right job.
     
  19. scootah

    New mod

    Reputation:
    12
    Joined:
    Oct 21, 2009
    Messages:
    1,750
    I'm a big fan of the "No" in NoSQL standing for Not Only SQL. Using a NoSQL solution like Mongo in parallel with Postgres or some other real database can give you really great performance options, but very, very few real-world development projects survive their full lifecycle in pure NoSQL. A well-demonstrated, solid understanding of when to use which product, and how to implement both in useful ways, is also worth a lot of money in the job market.
     
  20. Nettdata

    Mr. Toast

    Reputation:
    2,869
    Joined:
    Feb 14, 2006
    Messages:
    25,784
    Tell me about it. I knew the problems they were having 30 seconds into the interview when they told me their technology stack.

    They looked at me like I was some kind of psychic.