Cron configuration
Magento’s cron is triggered by your operating system’s cron. The generic cron dispatcher that comes with Magento checks the configuration to decide which specific tasks need to be executed in Magento. These tasks come with a cron-like configuration as well. But defining a task in Magento is not enough: without the operating system’s cron nothing will ever execute it.
Granularity
Magento’s cron script should be called every five minutes (or even every minute). As a rule of thumb it needs to be called at least as often as the most frequent task in Magento. Otherwise those tasks pile up and all run at once the next time the scheduler is triggered.
If your hosting provider doesn’t allow you to run custom cron jobs with this frequency it might not be a good choice for a hosting provider anyway :)
Scheduling configuration
In System > Configuration > System > Cron (Scheduled Tasks) you can configure the scheduling behavior. Magento creates a schedule on a regular basis and stores records in the cron_schedule table (with status “pending”). The cron dispatcher then processes the pending tasks from that table. Make sure your “Schedule Ahead for” value is bigger than “Generate Schedules Every” to avoid gaps while processing the tasks and to reduce the risk of a task not being scheduled at all.
cron.php vs. cron.sh
In Magento’s root folder there’s a cron.php file and a cron.sh file. Briefly explained, cron.sh internally calls cron.php and takes care of not executing more than one process in parallel. That’s a good thing, right? Well, sometimes… If you have long-running tasks (e.g. a product import implemented as a task), these tasks will prevent anything else from being executed until they have finished. You might have some other important tasks that shouldn’t wait or be skipped just because another task is running.
Check the “Missed if Not Run Within” setting to configure the maximum delay a task can have to still be executed.
On the other hand, calling cron.php directly starts a new process every time it is called, which could easily result in performance problems or race conditions for tasks operating on the same data.
This is why cron.php should not be called directly. Find out more about cron groups later in this blog post for a nice solution on how to handle this situation.
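To make the “only one process at a time” idea more concrete, here is a minimal sketch of such a guard. This is not Magento’s actual cron.sh; it only illustrates the principle with a lock directory, and the function name run_exclusive and the lock path are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Magento): run a command only if no
# other instance currently holds the lock. "mkdir" is atomic, so only
# one caller can succeed in creating the lock directory.
run_exclusive() {
    lock_dir="$1"
    shift
    if mkdir "$lock_dir" 2>/dev/null; then
        "$@"                 # run the actual command
        status=$?
        rmdir "$lock_dir"    # release the lock again
        return "$status"
    fi
    echo "skipped: already running"
    return 0
}
```

It could be called like `run_exclusive /tmp/magento_cron.lock php /var/www/magento/cron.php`. Note that a process that gets killed leaves a stale lock behind, so a real script needs some cleanup strategy on top of this.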
So this is how to configure your crontab. Make sure it is run as the web server user!
$ sudo crontab -e -u www-data
* * * * * /bin/sh /var/www/magento/cron.sh
If you’re running your cron scheduler on the same machine your frontend runs on you might want to give the cron process a lower priority:
* * * * * nice -n 10 /bin/sh /var/www/magento/cron.sh
In order to avoid problems while deploying a new package or while doing maintenance you might want to check for the maintenance.flag before triggering cron.sh:
* * * * * ! test -e /var/www/magento/maintenance.flag && nice -n 10 /bin/sh /var/www/magento/cron.sh
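If the crontab line is getting too long for your taste, the same checks could live in a small wrapper script. This is only a sketch assuming the paths used above; the names cron_allowed, run_cron and MAGENTO_ROOT are made up for illustration and are not part of Magento.

```shell
#!/bin/sh
# Hypothetical wrapper around cron.sh (not shipped with Magento).
MAGENTO_ROOT="${MAGENTO_ROOT:-/var/www/magento}"

# Succeeds only when no maintenance.flag exists in the given Magento root.
cron_allowed() {
    ! test -e "$1/maintenance.flag"
}

# Run cron.sh with lowered priority unless the shop is in maintenance mode.
run_cron() {
    if cron_allowed "$MAGENTO_ROOT"; then
        nice -n 10 /bin/sh "$MAGENTO_ROOT/cron.sh"
    fi
}
```

Add a `run_cron` call at the end of the script and point your crontab entry at the script instead of cron.sh.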
“Always” tasks
So if you have already dug into the Magento cron stuff you might have noticed that Magento CE 1.8 and EE 1.13 introduced a new scheduling mode called “always” (instead of the cron syntax…). As the name says these tasks will unconditionally be executed every time cron is triggered and don’t need an explicitly defined schedule.
In the Enterprise Edition this is used to trigger the new changelog-based indexing. The Community Edition currently doesn’t seem to use this new feature. However, this feature is part of Mage_Cron and thus can be used for custom tasks in CE as well.
Looking at cron.php you’ll find the little mess that has been added to make this happen (also check this post). Basically, cron.php called without any parameters uses shell_exec to start two cron.sh processes, each with a different parameter (“default” or “always”). cron.sh in turn passes this parameter back to cron.php, which then executes the cron. Internally Magento uses its event infrastructure to process the two modes by dispatching events with the vacuous names “default” and “always”. Mage_Cron implements two observer methods to do the actual magic: Mage_Cron_Model_Observer->dispatch() and Mage_Cron_Model_Observer->dispatchAlways()
Keeping this in mind I suggest simplifying the process and configuring cron like this instead (add “nice” and maintenance.flag check if required…)
* * * * * /bin/sh /var/www/magento/cron.sh -malways 1
* * * * * /bin/sh /var/www/magento/cron.sh -mdefault 1
Protect cron.php from outside access
The cron.php file is a PHP file that is intended to be run from the command line but could also be triggered from the browser. Some sources even recommend triggering the cron scheduler by calling this script over HTTP on a regular basis (again, if this is a workaround for your hosting not allowing you to run cron jobs, your hoster is probably not a good fit in the first place).
Cron tasks can potentially run much longer than your maximum execution time or put some extra load on the server. Also, maybe you have a dedicated worker server to process background tasks. This is why cron.php should be blocked from outside access and cron tasks should not run in your webserver’s context.
Create your own task
Creating your own cron task is simple. Add the following snippet to your module’s config.xml file:
<config>
    [...]
    <crontab>
        <jobs>
            <yourtaskname>
                <schedule>
                    <cron_expr>*/5 * * * *</cron_expr>
                </schedule>
                <run>
                    <model>your_module/model::method</model>
                </run>
            </yourtaskname>
        </jobs>
    </crontab>
    [...]
</config>
This is the simplest way of adding a cron job. Mage_Cron will pick this up from the XML configuration and start scheduling it according to your cron expression. The model method you specified in the run->model node will be executed by Magento. The only parameter is the current instance of the schedule object (Mage_Cron_Model_Schedule). Now it’s your turn to implement whatever you want to do with this task.
Although it’s easy to hardcode the cron expression I recommend always sticking to the second option Magento offers. Instead of having a cron_expr node you should add a config_path node within the schedule node. This one points to a value stored in the system configuration allowing you to define a default configuration and having this value configured individually:
<config>
    [...]
    <crontab>
        <jobs>
            <yourtaskname>
                <schedule>
                    <config_path>your_module/your_section/cron_expr</config_path>
                </schedule>
                <run>
                    <model>your_module/model::method</model>
                </run>
            </yourtaskname>
        </jobs>
    </crontab>
    <default>
        <your_module>
            <your_section>
                <cron_expr>*/5 * * * *</cron_expr>
            </your_section>
        </your_module>
    </default>
    [...]
</config>
In your system.xml you could add a simple text field to have the cron expression configured directly. For a fancier interface with custom drop down fields use “adminhtml/system_config_source_cron_frequency”. (Check out Mage_Backup for an example on how to implement this)
Aoe_Scheduler
Magento’s built-in cron scheduler is pretty simple and comes with some serious limitations if you’re trying to get things done efficiently or trying to find out what’s going on in the background. Check the Aoe_Scheduler (blog post/documentation: http://www.fabrizio-branca.de/magento-cron-scheduler.html, GitHub: https://github.com/fbrnc/Aoe_Scheduler) module for a lot of improvements:
- Backend, cli and web service access to all tasks
- Visual timeline
- Disabling tasks
- Better error, exception and return value handling.
- Events that allow custom workflows and dependencies between tasks
- Email notifications (on success or error)
- Heartbeat
- Process management (check out the development branch for this experimental feature)
- Cron groups
- …and many more features
Cron groups
One of Aoe_Scheduler’s features is support for cron groups. That means you can run multiple cron.sh commands in parallel (on the same server, or as a strategy to balance background processes across multiple servers). Having a closer look at cron.sh you’ll see that it accepts an optional parameter that defaults to “cron.php”. This parameter tells cron.sh which PHP script to execute, and cron.sh’s check whether a process is already running takes the script name into account. This way you can run a defined number of tasks in parallel while making sure that tasks that should not overlap won’t.
Let’s say we want to execute three processes in parallel:
- cron_always.php will process the “always” tasks
- cron_import.php will process your custom import tasks that might take some time
- cron_default.php will process all other tasks.
* * * * * /bin/sh /var/www/magento/cron.sh cron_always.php -malways 1
* * * * * /usr/bin/env SCHEDULER_WHITELIST='xx_import_products, xx_import_categories' /bin/sh /var/www/magento/cron.sh cron_import.php -mdefault 1
* * * * * /usr/bin/env SCHEDULER_BLACKLIST='xx_import_products, xx_import_categories' /bin/sh /var/www/magento/cron.sh cron_default.php -mdefault
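The effect of the two environment variables in the lines above can be illustrated with a small sketch. This is not Aoe_Scheduler’s code, just the filter logic as I’d describe it: a whitelist restricts a process to the listed tasks, a blacklist makes it skip them. The function names in_list and job_allowed are made up for this example.

```shell
#!/bin/sh
# Hypothetical illustration of whitelist/blacklist semantics
# (not taken from Aoe_Scheduler).

# Succeeds if job code $1 appears in the comma-separated list $2.
in_list() {
    list=$(printf '%s' "$2" | tr -d ' ')   # ignore spaces after commas
    case ",$list," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if job $1 may run given whitelist $2 and blacklist $3.
# An empty list means "no restriction".
job_allowed() {
    job="$1"; whitelist="$2"; blacklist="$3"
    if [ -n "$whitelist" ] && ! in_list "$job" "$whitelist"; then
        return 1
    fi
    if [ -n "$blacklist" ] && in_list "$job" "$blacklist"; then
        return 1
    fi
    return 0
}
```

With the lists from the crontab above, `job_allowed xx_import_products 'xx_import_products, xx_import_categories' ''` succeeds (the import process picks it up), while the same task checked against the blacklist fails (the default process leaves it alone).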
The task execution now will look more like this (only showing the non-always processes…)