
Daily newspaper download with curl

Today I was reading about the election of Stefano Zacchiroli as Project Leader of the Debian project. Since his surname sounded pretty Italian, I did some research and happily found that he really is Italian and that he also studied at the University of Bologna, just like me (though he graduated a few years earlier). I have never met him in person, even though he was a researcher there in the same years I was attending. I’m very happy for him and proud that an Italian has reached such a position.

Reading his blog, I then found something I should have written about months ago, after the newspaper Il Fatto was first published. Just like Stefano, I wrote a script to download the newspaper on a daily basis. I’m making it public so that anyone who has subscribed to Il Fatto can use it. You can download it from here.

Create a file called .ilfattorc in your home directory with your credentials:

username="USERNAME"
password="PASSWORD"

Substitute USERNAME and PASSWORD with your own, of course.
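
For illustration only, a file in this key="value" format could be read from Ruby as sketched below. The original script may well just source the file from Bash; the helper names parse_rc and load_credentials are hypothetical, and skipping blank lines and # comments is an assumption.

```ruby
# Parse text made of lines like: username="USERNAME"
# Blank lines and lines starting with '#' are ignored (an assumption;
# the real script may treat the file differently).
def parse_rc(text)
  settings = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    # key, optional opening quote, value, matching closing quote
    match = line.match(/\A(\w+)=(["']?)(.*)\2\z/)
    settings[match[1]] = match[3] if match
  end
  settings
end

# Read username and password from ~/.ilfattorc (or another path).
# Raises KeyError if either entry is missing.
def load_credentials(path = File.join(Dir.home, ".ilfattorc"))
  settings = parse_rc(File.read(path))
  [settings.fetch("username"), settings.fetch("password")]
end
```

Calling load_credentials with no argument returns the two values as a pair, for example username, password = load_credentials.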

The script is made of two files, one written in Bash, the other in Ruby. Save both in the folder where you want the PDFs to be downloaded. The script uses the curl tool for the HTTP requests. The Ruby part calculates the list of dates from a given date up to the current one and prints them in the required format. In short, the script downloads the PDF of the newspaper for every day from the day of the last downloaded PDF up to the current date.
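
The date calculation can be sketched in a few lines of Ruby. This is not the original code: the function names are made up, the %Y%m%d output format is only a placeholder for whatever format the site actually requires, and including the start date in the list is an assumption.

```ruby
require "date"

# Every date from first to last, both included.
# Returns an empty list when first is later than last.
def dates_between(first, last = Date.today)
  (first..last).to_a
end

# Turn dates into strings. "%Y%m%d" is a placeholder format,
# not necessarily the one the newspaper site expects.
def format_dates(dates, format = "%Y%m%d")
  dates.map { |date| date.strftime(format) }
end

# Example: print one date per line, starting from 12 April 2010.
#   puts format_dates(dates_between(Date.new(2010, 4, 12)))
```

Printing one date per line makes it easy for a Bash loop to read the output and request one PDF per date.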

The general steps to authenticate against the web server with curl are the following:

  1. get the login page and save cookies
  2. use the saved cookies to submit username and password along with other login parameters

Once you are authenticated, that is, once you have all the necessary cookies, you simply send a request to the download URL and save the response body.
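
As a rough sketch of these steps, the Ruby snippet below builds the curl command lines for the login and the download. Every URL and form field name here is a placeholder rather than the real Il Fatto endpoint, and the function names are hypothetical; the real login form may need additional parameters.

```ruby
# Build the two curl invocations for the login.
# Form field names "username" and "password" are assumptions.
def login_commands(login_url:, submit_url:, username:, password:, cookie_jar:)
  [
    # Step 1: get the login page and save the cookies it sets.
    ["curl", "--silent", "--location",
     "--cookie-jar", cookie_jar,
     "--output", "/dev/null", login_url],
    # Step 2: send the saved cookies back together with the credentials
    # (a POST, because of --data-urlencode) and keep any new cookies.
    ["curl", "--silent", "--location",
     "--cookie", cookie_jar, "--cookie-jar", cookie_jar,
     "--data-urlencode", "username=#{username}",
     "--data-urlencode", "password=#{password}",
     "--output", "/dev/null", submit_url]
  ]
end

# Build the curl invocation that downloads one file with the saved cookies.
# --fail makes curl exit with an error on HTTP error responses.
def download_command(url:, cookie_jar:, output:)
  ["curl", "--silent", "--location", "--fail",
   "--cookie", cookie_jar, "--output", output, url]
end

# Run the commands in order, stopping at the first failure.
# Passing each argument separately avoids shell quoting problems.
def run_all(commands)
  commands.all? { |argv| system(*argv) }
end
```

A caller would pass the result of login_commands followed by one download_command per date to run_all, reusing the same cookie jar file throughout.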

How these steps are implemented is very specific to each site, so I suggest reading the source code to understand how they work for Il Fatto. If you’re trying to do something similar for another service, I suggest you first work out how the whole procedure works, paying particular attention to cookies, redirects and the parameters submitted in POST requests. For this I would use the Firebug and Firecookie plugins for Firefox.

If you are as lazy as me and want the newspaper to be downloaded every day automatically, then configure Anacron.
Edit your user crontab (with crontab -e) and enter this content (adjust paths according to your environment):

# m h  dom mon dow   command
25 * * * * /usr/sbin/anacron -t /home/fabio/.anacrontab -S /home/fabio/.anacronspool

This will run anacron at the 25th minute of every hour.
Then create the .anacrontab file and the .anacronspool directory in your home folder. The content of .anacrontab will be something like this (adjust paths according to your environment):

1   0   ilfatto.daily   /home/fabio/Desktop/ilfatto/download.sh

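This entry ensures that the download script is run at most once per day: the first field is the period in days, the second is the delay in minutes before the job starts, and the third is the job identifier that anacron uses to remember when the job last ran.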

Enjoy your reading, and good luck to Stefano Zacchiroli.

Update

The scripts have been updated to work with the new site of Il Fatto Quotidiano.

Sudo-like tool for Alfresco – security aspects

In my first post on this blog I proposed a way to execute JavaScript code with admin privileges from within Alfresco (web)scripts.
As Peter Monks pointed out in his comment, there are some security risks you should be aware of if you intend to use this extension in your projects.
As Peter noted, if users can author their own scripts, then they can potentially submit code that runs with administrator privileges, which is an obvious security flaw.
Care is also needed if the eval statement is used inside the function passed to sudo: avoid this practice whenever the argument of eval depends on a webscript input parameter, since this could lead to code injection. So how can we cope with these problems?
My solution is to create a “sudoers” group (as in Unix operating systems) so that only users who belong to this group can execute the sudo function. Here is how I would change the Sudo bean:

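package eu.fabiostrozzi.sudo.ws.js;

import java.util.Set;

import org.alfresco.repo.jscript.BaseScopableProcessorExtension;
import org.alfresco.repo.security.authentication.AuthenticationUtil;
import org.alfresco.repo.security.authentication.AuthenticationUtil.RunAsWork;
import org.alfresco.service.cmr.security.AuthorityService;
import org.alfresco.service.cmr.security.AuthorityType;
import org.mozilla.javascript.Context;
import org.mozilla.javascript.Function;
import org.mozilla.javascript.Scriptable;

public class Sudo extends BaseScopableProcessorExtension {
    private AuthorityService authorityService;

    // Setter used by Spring to inject the authorityService property.
    public void setAuthorityService(AuthorityService authorityService) {
        this.authorityService = authorityService;
    }

    public void sudo(final Function func) throws Exception {
        final Context cx = Context.getCurrentContext();
        final Scriptable scope = getScope();
        String user = AuthenticationUtil.getRunAsUser();

        // Only members of the sudoers group (direct or indirect) may proceed.
        Set<String> groups = authorityService.getContainingAuthorities(AuthorityType.GROUP, user, false);
        if (!groups.contains("GROUP_SUDOERS"))
            throw new Exception("User '" + user + "' cannot use sudo");

        RunAsWork<Object> raw = new RunAsWork<Object>() {
            public Object doWork() throws Exception {
                func.call(cx, scope, scope, new Object[] {});
                return null;
            }
        };

        // Execute the script function as the admin user.
        AuthenticationUtil.runAs(raw, AuthenticationUtil.getAdminUserName());
    }
}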

We used the authorityService service to get the set of groups the current user belongs to and then we checked that the SUDOERS group is one of those. If you use this version of the Sudo bean, remember to update the Spring bean definition (file sudo-script-services-context.xml):

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
    <bean id="Sudo" parent="baseJavaScriptExtension" class="eu.fabiostrozzi.sudo.ws.js.Sudo">
        <property name="extensionName">
            <value>sudoUtils</value>
        </property>
        <property name="authorityService">
            <ref bean="AuthorityService" />
        </property>
    </bean>
</beans>

This is by no means a full-fledged solution, but it does reduce the risk as long as, for instance, users who can author scripts are not added to the SUDOERS group.