mysql

All posts tagged mysql

A Visual Guide to SQL Joins

Your SQL database can be huge.  So many tables, so much data, combining it can be hard.  Even once you’ve sorted out what data you want it can be a real task to visualize what joins you need so that you only grab the data you want.  The more efficient the join the better.  I found this nifty image on SQL joins that I think does a great job at showing how the various joins work and does a great job explaining how to use them.

SQL Joins

I hope this helps someone else as I know I am always having a hard time figuring out which joins to use and when.

Django works wonderfully as a web framework but sufferes from sub-par data migration.  Luckily for all of us one of their developers has created an application called South that fixes this very problem.  South is meant to bring data migrations to Django and make it easy to make schema changes that are quickly applied to your database without much effort on your part.  Django does have a few built in features for dropping tables, creating tables, and will give you the code to enter to create the tables but it doesn’t help much when you are working on a production server that you don’t want to simply drop and readd tables all the time.

To get started just go through the installation tutorial.  Once you’ve done that you’re ready to migrate your data!  This is best done from the very creation of the project so South can help you from the beginning.  If you are starting a new project go here and follow along.  I’m going to focus a bit more on converting an already created project and making updates and changes to your projects.

Now that South is installed on your already created project you’ve been using for a while you’ll need to convert your app so South can interact with it.  In your console just enter the commands:

Done!  Now your app is ready to go with South.  South keeps track of the migrations using a few tables in your database which is why you need to syncdb before converting any of your apps.  After you have converted your apps you’ll notice you have some new .py files that will need to be uploaded to your repositories.  When South tries to do a migration it makes sure that you have all of the .py files that it has listed in its database for that app.  After the initial conversion, and you’ve uploaded the files, all other machines with the installation will need to run the command ./manage.py migrate myapp 0001 –fake to ensure everyone starts on the same page.  The –fake command at the end tells South to take note of what is happening but don’t actually do it because it’s already been done.

When you make changes to your models, from now on, you’ll need to use South to do the syncdb instead of Django.  Let’s say you add a few columns to a table, remove some others, rename even more, and add a bunch of new tables.  To have South take care of all of that just use the commands:

After running the first command South will try to document all of the changes, display them to the screen, and if all goes well it will tell you to run the second. When running the second command it will actually apply the changes to the database. That’s all there is to it. Occasionally, something will go wrong. If a problem occurs at the first command it’s most likely because you have a problem in your code that is making it fail to compile. If it fails in the second command there are more issues. You could have a partial migration which they will give you code to try and clean up but I’ve found it’s best to do it yourself as their solution is usually to drop a lot of tables. It’s kind of a headache when that happens and has turned off a few of us from South but for the most part we have had a lot of success with it. If you do make all of the changes by hand just be sure and tack on the –fake at the end of the second command to let South know that the changes were already made and that they shouldn’t try and do them again. Be sure and always commit your .py South files because even if I am in a different app trying to use South I will need all of the South .py files from all of the apps before the sync can happen.

Hope this helps a few people and explains a few things about the nuances of South. It really is a great tool and they are even trying to work it into the default installation of Django in the near future so that will be exciting. If you have any tips or tricks when using South post them in the comments below.

A big request we had in designing our new system was an easier way for users to get data from the database.  With the large amounts of data that we use across dozens of tables it can be slow going to use the Django methods.  Instead they have given the ability to use and write raw SQL and use that result how you please.  You can read about using cursors here, I may do a post in the future about them but today I want to focus on the SQL we wrote and the problems we encountered.

The requested query was to get a list of customers, how much business we did with them, and break that up by year.  The year could be inputted by the user so it needed to be dynamic enough to scale.  The first way we tried was to create a temporary table for every year needed.  Each table was created from a SELECT statement that was specified for the office as well as the year.  Below is the SQL command for one of the offices and years but you can imagine we copied this code, almost verbatim, but changed the SELECT statement to get the correct data.

Once all of these tables were created we had to do a ton of LEFT JOINs against the main office to get the split of the total between the other offices.  Needless to say this ran very slowly.  Using Python to change the %s to a year a typical loop through creating the tables, gathering the data, and returning took roughly 16 seconds.  Yeah, not good.  The good part was that it automatically put everything into one line for each client and one column for every office total and annual total.

Obviously we had to cut down on the time.  By changing how we did our whole process we can cut the time down significantly without using JOINs.  Now this may not work in perfectly raw SQL because we are saving the output to a variable and then combining them.  Below is the new SQL code again changing the first one for each office.  The difference here is that instead of doing a table and SELECT for every year we are doing one SELECT for all the years and creating a row for each annual total.  So requesting data 5 years back could give a possible five rows for each customer, one for every year.

After every select we save the output to a variable named for the office.  In Python these are saved as lists of dictionaries.  In the end we combine the lists and then create an output of the combined totals.  I wrote an algorithm to loop through the giant list and adding the data to an output dictionary that had only one entry for each client.

By changing the way that we create the tables we were able to cut that time to ~5 seconds.  It’s a lot of data so it takes some time but 1/3 the last time is a big improvement. Obviously this code won’t work in just SQL but it is useful for Django cursors.  If there is a way to accomplish what we are doing, in a very fast query I would totally be up for hearing about it in the comments.  I have altered our actual MySQL statements to be more readable.  In the end we just wanted a list of clients and the total work orders done for a year as well as the total for each office for that year.  Our solution works and we are fairly happy with it.  Always room for improvement though.

Anytime I need to move data or alter our database I get a little bit nervous that something is going to go wrong and crash the database losing all of our data, dropping tables, causing chaos, and so many phone calls.  Our database automatically backs up every so often but right before any major changes I make a quick local copy of the data.  This came in handy just the other day when a table was inadvertently dropped and I had a backup on hand.  Here’s how to do it using console commands.

Making a Backup of a Database

Restoring a Database from a Backup

Make sure the database is created before restoring.

Making a Backup of a Table in a Database

Restoring a Table from a Backup

Make sure the table has been created already and MySQL will know where to put the data.

MySQL error 1449: The user specified as a definer does not exist

To fix this problem use the following command when restoring your database.

If you rename columns you will get some weird results so make sure that the tables are the same before and after the backup. Let me know about experiences or tips you have with backing up and restoring MySQL databases.

MySQL Collation Matters

When creating databases and tables with MySQL there seems to be one setting that gets overlooked, collation and character set.  For most smaller sites, and maybe those purely in English, this may not be such a huge deal but for a larger application site and one that will be holding characters from various languages this is essential.

UTF-8 is the collation and character set we want and the MySQL command to create a database with these settings is CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;

Bringing this back to Django, when Django creates new tables within a database it will use whatever the default character set and collation is for that database.  This is why it is important to get this right at the beginning with the command above.  If you do not, however, there are ways to make future tables created have the right encoding.  Simply add these lines at the bottom of your database connection settings in your settings.py file:

I put mine right below the PORT variable in my default database connection.  This will ensure that all new tables created will have the correct collation and character set.  But what about those tables already created that need their settings changed to accommodate these special characters?  You can find out what collation the columns are using by entering this command for MySQL: SHOW FULL COLUMNS FROM my_table.  One of the columns shown is the Collation column.  Make sure this is some utf8 encoding and you’re good.  If it isn’t simply use the command ALTER TABLE my_table MODIFY column_name varchar(30) CHARACTER SET utf8; to change the column to the appropriate character set.  Make sure that the varchar(30) in the example is set appropriately to your column whether it be a text field, varchar(255), etc. make sure it is the same column it was before just with the new character set.  Run that command for every column that needs changing.  Run the SHOW FULL COLUMNS line again and make sure everything is right and that’s it!

Hope this helps someone, I know it has come up a lot in the project we are working on now and has become kind of a headache because we didn’t start the tables out correctly.

Custom Django Admin Filter with Foreign Keys

During our massive redesign of our internal tools I wanted to make the Django admin page more friendly for us, the software engineers and admins.  Currently when they have problems or need things changed the easiest way to do that is through raw SQL and altering the database directly.  This is inconvenient and can lead to bad data.  Django, by design, abstracts that interaction and creates a great user-interface to interact with including filters and search options.

One of the strengths of this is you can define your own custom filters that will appear as normal filters on the UI.  Defining them is quite simple and you can check out the tutorial directly from Django(Search for DecadeBornListFilter).  I wanted to take the query a step further and have it look up items based on a characteristic of a parent object stored as a foreign key.  Using the double underscore (__) Django allows you to access data from that foriegn key in queries.  Following the tutorial I was able to quickyl make my query.

It worked great, except when the query was set to All.  When set to All it should display every object for that queryset because there are no parameters but since I was using foreign key fields I guess it didn’t like that and would display nothing.  When you selected specific filters the query worked but on the default nothing appeared.  I searched all over the Internet and found nothing about it so I made up my own solution.

My solution is to check whether or not there is a value.  On All there is no value and should return the queryset unchanged.  Before it would try to filter and return nothing.  This simple test made it so my admin page has a lot more functionality and was a simple fix that I couldn’t find written anywhere.  Hopefully this helps someone else through the odd behavior that can happen sometimes.

Django Dynamic ModelForm Using Closure

At my job we are currently undergoing a total rewrite of their old web-based data entry system.  It’s written in Django and works pretty-alright we are just going to make it better.  When it was first written the company was small and very few tests were being done.  This caused the initial design to lack foresight into the growth of the project.  Another problem with this current system is that it was an almost direct port of their old Microsoft Access version they had been running before.  The company is international and a web based system is much better and easier for the employees to use but wasn’t designed with growth in mind.

With a better understanding of how the systemis being used and what the users wanted it to do we have been in the beginning phases of development for about three weeks now.  Parts of the site are ready to go live within a week but the major parts are left.

One obstacle that was brought up by one of the software engineers today was the idea of creating a custom generic form that could be reused in various ways throughout the site.  The trick was going to be with saving to a dynamic model.  Django has few great built-in ways of creating a form based on a model on our MySQL database.  The problem is there isn’t an easy way to pass it a model to base the form off of.  Theoretically, you should create a model form for each of your models regardless.  We felt like we could get more out of writing our own custom generic form though.

To accomplish this hurdle I came across closures.  I’d never heard of closures before and if you are in the mood the Wikipedia article does a fine job explaining it.  Essentially we were going to use a closure around our generic model form class to pass it any model we pleased.

Following this example from StackOverflow (another example) we came up with something like the following:

This worked out great.  We were able to instantly see the forms generated to our templates using this new way.  Unfortunately, we spent so long on this and just thinking about how to generate and save all of these forms that we went completely back to the drawing board with all of the models.  I’d rather have simple, easy-to-understand code than overly-generic, complex code.  I just found the whole idea of closures, which I believe are similar to lambda expressions which I have used before, very fascinating and we were able to use them, if even just for a moment, today in our daily tasks and creations.

-Eric