Scraping data from Magento without privileged access or trust
There appear to be a number of ways of scraping product data from a Magento site, but all seem to have their upsides and downsides.
We deal with sites that have little to no technical resource, but that have given us permission to scrape their product catalog. There appear to be three different ways of doing this, none of which really works:
- Manual web scraping - developer intensive, requires updating when the theme changes.
- Magento Web API - requires setting up an API user, too technical for many users.
- Magento Plugin - too technical for many users, exposes sensitive business data so many companies won't do this.
Are we missing something? Is there a better alternative, or are there ways of changing any of the above 3 to be better for scraping?
For example, is it possible to provide a link to a 'one-click setup' style process for API access? Shopify does this in a nice way using OAuth and permission scopes, so we can give our partners a link that grants us read-only access to just their product catalog, in a way that non-technical users can follow.
api extensions
asked Aug 17 '15 at 17:20
danpalmer
1 Answer
I'm not sure why a Magento plugin would, in and of itself, be too technical, especially if clients are instructed to install it via Magento Connect.
Such a plugin could build an accessible XML feed for you, so you could retrieve the feed via HTTP without worrying about a changing theme layer.
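As a rough illustration of the consuming side, here is a minimal sketch of reading such a feed. The feed layout (`products`, `product`, `sku`, `name`, `price`, `url`) and the shop URL are assumptions made up for this example; a real plugin would define its own schema.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical feed layout; a real plugin would define its own schema.
SAMPLE_FEED = """<products>
  <product sku="ABC-1">
    <name>Blue Shirt</name>
    <price currency="USD">19.99</price>
    <url>https://shop.example.com/blue-shirt.html</url>
  </product>
  <product sku="ABC-2">
    <name>Red Hat</name>
    <price currency="USD">9.50</price>
    <url>https://shop.example.com/red-hat.html</url>
  </product>
</products>"""


def parse_feed(xml_text):
    """Turn the feed XML into a list of plain dicts, one per product."""
    root = ET.fromstring(xml_text)
    products = []
    for node in root.iter("product"):
        price = node.find("price")
        has_price = price is not None and price.text
        products.append({
            "sku": node.get("sku"),
            "name": node.findtext("name", default="").strip(),
            "price": float(price.text) if has_price else None,
            "currency": price.get("currency") if price is not None else None,
            "url": node.findtext("url", default="").strip(),
        })
    return products


def fetch_feed(feed_url, timeout=30):
    """Download the feed over HTTP(S) and parse it."""
    with urlopen(feed_url, timeout=timeout) as response:
        return parse_feed(response.read())


if __name__ == "__main__":
    for product in parse_feed(SAMPLE_FEED):
        print(product["sku"], product["name"], product["price"])
```

Because the parser only depends on the feed's element names, a theme change on the shop would not affect it.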
I don't think this is the one-click answer you're looking for, but an alternative could be to have clients upload a custom script that you provide.
That script could be run via cron and would perform periodic dumps of specified DB tables (i.e. none of the tables that contain 'sensitive business data').
Each dump could be retrieved via SSH/SFTP if you have that access, or via a public-facing folder or email if not. Setting up a cron task via cPanel would be pretty easy for the average user.
That would give you the most complete dataset, although not without its glaring downsides.
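A minimal sketch of what the dump script might do, assuming MySQL with `mysqldump` available on the host and a default Magento schema with no table prefix. The allow-list, the file paths, and the helper names are illustrative assumptions, not part of any Magento tooling.

```python
import shlex

# Allow-list of catalog tables to export. These names assume a default
# Magento schema with no table prefix; adjust per installation.
ALLOWED_TABLES = (
    "catalog_product_entity",
    "catalog_product_entity_varchar",
    "catalog_product_entity_decimal",
    "catalog_category_entity",
    "catalog_category_product",
)


def build_dump_command(database, tables, defaults_file):
    """Return a mysqldump argv restricted to allow-listed tables."""
    rejected = [t for t in tables if t not in ALLOWED_TABLES]
    if rejected:
        raise ValueError("tables not on the allow-list: " + ", ".join(rejected))
    # --defaults-extra-file keeps credentials off the command line and must
    # be the first option; --single-transaction avoids locking InnoDB tables.
    return [
        "mysqldump",
        "--defaults-extra-file=" + defaults_file,
        "--single-transaction",
        database,
    ] + list(tables)


def crontab_line(command, output_path, schedule="0 3 * * *"):
    """Format a crontab entry; the default schedule runs daily at 03:00."""
    quoted = " ".join(shlex.quote(arg) for arg in command)
    return "%s %s > %s" % (schedule, quoted, shlex.quote(output_path))


if __name__ == "__main__":
    # Hypothetical database name and paths, for illustration only.
    cmd = build_dump_command("magento", ALLOWED_TABLES, "/home/shop/.my.cnf")
    print(crontab_line(cmd, "/home/shop/dumps/catalog.sql"))
```

The printed line is what the client would paste into the cron section of their control panel. Rejecting any table outside the allow-list is the part that addresses the 'sensitive business data' concern, although the client still has to trust that the script does only that.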
As a side note, an XPath parser is an elegant tool for web scraping, and could be implemented in a way that is mostly theme-agnostic if it comes to that.
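One way to get closer to theme-agnostic extraction is to key on semantic attributes rather than layout. The sketch below assumes the theme emits schema.org microdata (`itemprop` attributes), which is not guaranteed for every Magento theme. It uses Python's standard `html.parser` instead of an XPath engine to stay dependency-free; the equivalent XPath would be along the lines of `//*[@itemprop="name"]`. The two sample themes are invented for the example.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect schema.org itemprop values regardless of surrounding layout."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}
        self._current = None  # itemprop whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop not in self.wanted or prop in self.found:
            return
        if attrs.get("content") is not None:
            # Markup such as <meta itemprop="price" content="19.99">
            self.found[prop] = attrs["content"].strip()
        else:
            self._current = prop
            self.found[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: stop collecting at the first closing tag seen
        # after the itemprop element opened.
        if self._current is not None:
            self.found[self._current] = self.found[self._current].strip()
            self._current = None


def extract(html, wanted=("name", "price")):
    """Return a dict of the wanted itemprop values found in the page."""
    parser = ItempropExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.found


# Two invented themes with different layouts but the same microdata.
THEME_A = (
    '<div class="product-view"><h1 itemprop="name">Blue Shirt</h1>'
    '<span class="price" itemprop="price">19.99</span></div>'
)
THEME_B = (
    '<section><meta itemprop="price" content="19.99">'
    '<p class="title"><strong itemprop="name"> Blue Shirt </strong></p></section>'
)

if __name__ == "__main__":
    print(extract(THEME_A))
    print(extract(THEME_B))
```

Both themes yield the same name and price even though their markup differs. Themes without microdata would still need per-site selectors.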
Thanks for your reply! Unfortunately, Magento Connect looks too complicated for some of our partners; they often use contractors to set up Magento and aren't able to do things like this themselves. It also doesn't solve the permissions issue, which is that plugins can read anything they want. We already use XPath, sitemaps, and lots of other ways to scrape data from the pages, but Magento themes differ enough across the sites we already do this for that we can't share much, if any, scraping code between them.
– danpalmer
Aug 18 '15 at 8:37
edited Nov 21 '17 at 13:07
Teja Bhagavan Kollepara
answered Aug 17 '15 at 18:20
baku